Functions to read and write the VCF (Variant Call Format) format and BCF (its binary equivalent). See https://samtools.github.io/hts-specs/ for the detail VCF/BCF specifications.

(ns cljam.io.vcf
  (:refer-clojure :exclude [indexed?])
  (:require [clojure.java.io :as cio]
            [cljam.util :as util]
            [cljam.io.protocols :as protocols]
            [cljam.io.util :as io-util]
            [cljam.io.vcf.reader :as vcf-reader]
            [cljam.io.vcf.writer :as vcf-writer]
            [cljam.io.bcf.reader :as bcf-reader]
            [cljam.io.bcf.writer :as bcf-writer]
            [cljam.io.util.bgzf :as bgzf]
            [cljam.io.tabix :as tabix]
            [cljam.io.csi :as csi])
  (:import java.io.Closeable
           java.io.FileNotFoundException
           cljam.io.vcf.reader.VCFReader
           cljam.io.vcf.writer.VCFWriter
           cljam.io.bcf.reader.BCFReader
           cljam.io.bcf.writer.BCFWriter))

Reading

Returns an open cljam.io.vcf.reader.VCFReader of f. Should be used inside with-open to ensure the reader is properly closed.

(defn ^VCFReader vcf-reader
  [f]
  (let [meta-info (with-open [r (cio/reader (util/compressor-input-stream f))]
                    (vcf-reader/load-meta-info r))
        header (with-open [r (cio/reader (util/compressor-input-stream f))]
                 (vcf-reader/load-header r))]
    (VCFReader. (util/as-url f) meta-info header
                (if (bgzf/bgzip? f)
                  (bgzf/bgzf-input-stream f)
                  (cio/reader (util/compressor-input-stream f)))
                (delay (try (csi/read-index (str f ".csi"))
                            (catch FileNotFoundException _
                              (tabix/read-index (str f ".tbi"))))))))

Returns an open cljam.io.bcf.reader.BCFReader of f. Should be used inside with-open to ensure the reader is properly closed. Throws IOException if failed to parse BCF file format.

(defn ^BCFReader bcf-reader
  [f]
  (bcf-reader/reader f))

Clones vcf reader sharing persistent objects.

(defn ^VCFReader clone-vcf-reader
  [^VCFReader rdr]
  (let [url (.url rdr)
        input-stream (if (bgzf/bgzip? url)
                       (bgzf/bgzf-input-stream url)
                       (cio/reader (util/compressor-input-stream url)))]
    (VCFReader. url (.meta-info rdr) (.header rdr)
                input-stream
                (.index-delay rdr))))

Clones bcf reader sharing persistent objects.

(defn ^BCFReader clone-bcf-reader
  [^BCFReader rdr]
  (let [url (.url rdr)
        input-stream (bgzf/bgzf-input-stream url)]
    (BCFReader. (.url rdr) (.meta-info rdr) (.header rdr)
                input-stream (.start-pos rdr) (.index-delay rdr))))

Clones vcf/bcf reader sharing persistent objects.

(defn ^Closeable clone-reader
  [rdr]
  (cond
    (io-util/vcf-reader? rdr) (clone-vcf-reader rdr)
    (io-util/bcf-reader? rdr) (clone-bcf-reader rdr)
    :else (throw (IllegalArgumentException. "Invalid file type"))))

Selects suitable reader from f's extension, returning the open reader. This function supports VCF and BCF formats.

(defn ^Closeable reader
  [f]
  (if (or (io-util/vcf-reader? f)
          (io-util/bcf-reader? f))
    (clone-reader f)
    (case (try (io-util/file-type f)
               (catch IllegalArgumentException _
                 (io-util/file-type-from-contents f)))
      :vcf (vcf-reader f)
      :bcf (bcf-reader f)
      (throw (IllegalArgumentException. "Invalid file type")))))

Returns meta-info section of VCF/BCF file as a map.

(defn meta-info
  [rdr]
  (protocols/meta-info rdr))

Returns header of VCF/BCF file as a sequence of strings.

(defn header
  [rdr]
  (protocols/header rdr))

Returns true if the reader can be randomly accessed, false if not. Note this function immediately realizes a delayed index.

(defn indexed?
  [rdr]
  (protocols/indexed? rdr))

Reads variants of the VCF/BCF file, returning them as a lazy sequence. rdr must implement cljam.io.protocols/IVariantReader. Can take an option :depth to specify parsing level, default :deep. <:deep|:vcf|:bcf|:shallow|:raw> :deep - Fully parsed variant map. FORMAT, FILTER, INFO and samples columns are parsed. :vcf - VCF-style map. FORMAT, FILTER, INFO and samples columns are strings. :bcf - BCF-style map. CHROM, FILTER, INFO and :genotype contains indices to meta-info. :shallow - Only CHROM, POS and ref-length are parsed. :raw - Raw map of ByteBufers.

(defn read-variants
  ([rdr] (protocols/read-variants rdr))
  ([rdr option] (protocols/read-variants rdr option)))

Reads variants of the VCF/BCF file randomly, returning them as a lazy sequence.

(defn read-variants-randomly
  ([rdr span-option depth-option]
   (protocols/read-variants-randomly
    rdr
    span-option
    depth-option)))

Reads offsets {:file-beg :file-end :beg :end :chr } from VCF/BCF file.

(defn read-file-offsets
  [rdr] (protocols/read-file-offsets rdr))

Writing

Returns an open cljam.io.vcf.writer.VCFWriter of f. Meta-information lines and a header line will be written in this function. Should be used inside with-open to ensure the writer is properly closed. e.g.

(with-open [wtr (vcf-writer "out.vcf"
                            {:file-date "20090805", :source "myImpu..." ...}
                            ["CHROM" "POS" "ID" "REF" "ALT" ...])]
  (WRITING-VCF))
(defn ^VCFWriter vcf-writer
  [f meta-info header]
  (doto (VCFWriter. (util/as-url f)
                    (cio/writer (util/compressor-output-stream f))
                    meta-info
                    header)
    (vcf-writer/write-meta-info meta-info)
    (vcf-writer/write-header header)))

Returns an open cljam.io.bcf.writer.BCFWriter of f. Meta-information lines and a header line will be written in this function. Should be used inside with-open to ensure the writer is properly closed. e.g.

 (with-open [wtr (bcf-writer "out.bcf"
                             {:file-date "20090805", :source "myImpu..." ...}
                             ["CHROM" "POS" "ID" "REF" "ALT" ...])]
   (WRITING-BCF))
(defn ^BCFWriter bcf-writer
  [f meta-info header]
  (bcf-writer/writer f meta-info header))

Selects suitable writer from f's extension, returning the open writer. This function supports VCF and BCF formats.

(defn ^Closeable writer
  [f meta-info header]
  (case (io-util/file-type f)
    :vcf (vcf-writer f meta-info header)
    :bcf (bcf-writer f meta-info header)
    (throw (IllegalArgumentException. "Invalid file type"))))

Writes variants to the VCF/BCF file. wtr must implement cljam.io.protocols/IVariantWriter. variants must be a sequence of parsed or VCF-style maps. e.g.

(write-variants [{:chr "19", :pos 111, :id nil, :ref "A",
                  :alt ["C"], :qual 9.6, :filter [:PASS], :info {:DP 4},
                  :FORMAT [:GT :HQ] ...} ...])
(defn write-variants
  [wtr variants]
  (protocols/write-variants wtr variants))