Functions to read and write the VCF (Variant Call Format) format and BCF (its binary equivalent). See https://samtools.github.io/hts-specs/ for the detail VCF/BCF specifications. | (ns cljam.io.vcf (:refer-clojure :exclude [indexed?]) (:require [clojure.java.io :as cio] [cljam.util :as util] [cljam.io.protocols :as protocols] [cljam.io.util :as io-util] [cljam.io.vcf.reader :as vcf-reader] [cljam.io.vcf.writer :as vcf-writer] [cljam.io.bcf.reader :as bcf-reader] [cljam.io.bcf.writer :as bcf-writer] [cljam.io.util.bgzf :as bgzf] [cljam.io.tabix :as tabix] [cljam.io.csi :as csi]) (:import java.io.Closeable java.io.FileNotFoundException cljam.io.vcf.reader.VCFReader cljam.io.vcf.writer.VCFWriter cljam.io.bcf.reader.BCFReader cljam.io.bcf.writer.BCFWriter)) |
Reading | |
Returns an open cljam.io.vcf.reader.VCFReader of f. Should be used inside with-open to ensure the reader is properly closed. | (defn vcf-reader ^VCFReader [f] (let [meta-info (with-open [r (cio/reader (util/compressor-input-stream f))] (vcf-reader/load-meta-info r)) header (with-open [r (cio/reader (util/compressor-input-stream f))] (vcf-reader/load-header r))] (VCFReader. (util/as-url f) meta-info header (if (bgzf/bgzip? f) (bgzf/bgzf-input-stream f) (cio/reader (util/compressor-input-stream f))) (delay (try (csi/read-index (str f ".csi")) (catch FileNotFoundException _ (tabix/read-index (str f ".tbi")))))))) |
Returns an open cljam.io.bcf.reader.BCFReader of f. Should be used inside with-open to ensure the reader is properly closed. Throws IOException if failed to parse BCF file format. | (defn bcf-reader ^BCFReader [f] (bcf-reader/reader f)) |
Clones vcf reader sharing persistent objects. | (defn clone-vcf-reader ^VCFReader [^VCFReader rdr] (let [url (.url rdr) input-stream (if (bgzf/bgzip? url) (bgzf/bgzf-input-stream url) (cio/reader (util/compressor-input-stream url)))] (VCFReader. url (.meta-info rdr) (.header rdr) input-stream (.index-delay rdr)))) |
Clones bcf reader sharing persistent objects. | (defn clone-bcf-reader ^BCFReader [^BCFReader rdr] (let [url (.url rdr) input-stream (bgzf/bgzf-input-stream url)] (BCFReader. (.url rdr) (.meta-info rdr) (.header rdr) input-stream (.start-pos rdr) (.index-delay rdr)))) |
Clones vcf/bcf reader sharing persistent objects. | (defn clone-reader ^Closeable [rdr] (cond (io-util/vcf-reader? rdr) (clone-vcf-reader rdr) (io-util/bcf-reader? rdr) (clone-bcf-reader rdr) :else (throw (IllegalArgumentException. "Invalid file type")))) |
Selects suitable reader from f's extension, returning the open reader. This function supports VCF and BCF formats. | (defn reader ^Closeable [f] (if (or (io-util/vcf-reader? f) (io-util/bcf-reader? f)) (clone-reader f) (case (try (io-util/file-type f) (catch IllegalArgumentException _ (io-util/file-type-from-contents f))) :vcf (vcf-reader f) :bcf (bcf-reader f) (throw (IllegalArgumentException. "Invalid file type"))))) |
Returns meta-info section of VCF/BCF file as a map. | (defn meta-info [rdr] (protocols/meta-info rdr)) |
Returns header of VCF/BCF file as a sequence of strings. | (defn header [rdr] (protocols/header rdr)) |
Returns true if the reader can be randomly accessed, false if not. Note this function immediately realizes a delayed index. | (defn indexed? [rdr] (protocols/indexed? rdr)) |
Reads variants of the VCF/BCF file, returning them as a lazy sequence. rdr must implement cljam.io.protocols/IVariantReader. Can take an option :depth to specify parsing level, default :deep. <:deep|:vcf|:bcf|:shallow|:raw> :deep - Fully parsed variant map. FORMAT, FILTER, INFO and samples columns are parsed. :vcf - VCF-style map. FORMAT, FILTER, INFO and samples columns are strings. :bcf - BCF-style map. CHROM, FILTER, INFO and :genotype contains indices to meta-info. :shallow - Only CHROM, POS and ref-length are parsed. :raw - Raw map of ByteBufers. | (defn read-variants ([rdr] (protocols/read-variants rdr)) ([rdr option] (protocols/read-variants rdr option))) |
Reads variants of the VCF/BCF file randomly, returning them as a lazy sequence. | (defn read-variants-randomly ([rdr span-option depth-option] (protocols/read-variants-randomly rdr span-option depth-option))) |
Reads offsets {:file-beg :file-end :beg :end :chr } from VCF/BCF file. | (defn read-file-offsets [rdr] (protocols/read-file-offsets rdr)) |
Writing | |
Returns an open cljam.io.vcf.writer.VCFWriter of f. Meta-information lines and a header line will be written in this function. Should be used inside with-open to ensure the writer is properly closed. e.g. (with-open [wtr (vcf-writer "out.vcf" {:file-date "20090805", :source "myImpu..." ...} ["CHROM" "POS" "ID" "REF" "ALT" ...])] (WRITING-VCF)) | (defn vcf-writer ^VCFWriter [f meta-info' header'] (doto (VCFWriter. (util/as-url f) (cio/writer (util/compressor-output-stream f)) meta-info' header') (vcf-writer/write-meta-info meta-info') (vcf-writer/write-header header'))) |
Returns an open cljam.io.bcf.writer.BCFWriter of f. Meta-information lines and a header line will be written in this function. Should be used inside with-open to ensure the writer is properly closed. e.g. (with-open [wtr (bcf-writer "out.bcf" {:file-date "20090805", :source "myImpu..." ...} ["CHROM" "POS" "ID" "REF" "ALT" ...])] (WRITING-BCF)) | (defn bcf-writer ^BCFWriter [f meta-info' header'] (bcf-writer/writer f meta-info' header')) |
Selects suitable writer from f's extension, returning the open writer. This function supports VCF and BCF formats. | (defn writer ^Closeable [f meta-info' header'] (case (io-util/file-type f) :vcf (vcf-writer f meta-info' header') :bcf (bcf-writer f meta-info' header') (throw (IllegalArgumentException. "Invalid file type")))) |
Writes variants to the VCF/BCF file. wtr must implement cljam.io.protocols/IVariantWriter. variants must be a sequence of parsed or VCF-style maps. e.g. (write-variants [{:chr "19", :pos 111, :id nil, :ref "A", :alt ["C"], :qual 9.6, :filter [:PASS], :info {:DP 4}, :FORMAT [:GT :HQ] ...} ...]) | (defn write-variants [wtr variants] (protocols/write-variants wtr variants)) |