Functions to normalize the SAM/BAM format. | (ns cljam.algo.normal (:require [cljam.io.sam :as sam] [cljam.io.util :as io-util] [cljam.util.chromosome :refer [normalize-chromosome-key]])) |
(def ^:private chunk-size 1500000) | |
(defn- normalize-header [hdr] (update hdr :SQ (fn [xs] (mapv #(update % :SN normalize-chromosome-key) xs)))) | |
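To illustrate what the header normalization above does to the `:SQ` entries, here is a minimal, self-contained sketch. `normalize-key-sketch` is a hypothetical stand-in for `normalize-chromosome-key` (the real function's rules may differ), and `normalize-header-sketch` and `example-header` are names invented for this example only.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for cljam's normalize-chromosome-key.
;; Assumption: it merely adds a "chr" prefix when one is missing.
(defn normalize-key-sketch [s]
  (if (str/starts-with? s "chr") s (str "chr" s)))

;; Same shape as normalize-header: rewrite :SN in every :SQ entry
;; and leave all other header fields untouched.
(defn normalize-header-sketch [hdr]
  (update hdr :SQ (fn [xs] (mapv #(update % :SN normalize-key-sketch) xs))))

;; Made-up header data for illustration.
(def example-header
  {:HD {:VN "1.4"}
   :SQ [{:SN "1" :LN 100} {:SN "chr2" :LN 200}]})

(normalize-header-sketch example-header)
;; => {:HD {:VN "1.4"}, :SQ [{:SN "chr1", :LN 100} {:SN "chr2", :LN 200}]}
```

Using `mapv` keeps `:SQ` a vector, so the order of the reference sequences is preserved.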
TODO: copy the rest of the stream directly for performance (without reading, parsing, and rewriting it). | (defn- transfer-blocks [rdr wtr] (doseq [blks (partition-all chunk-size (sam/read-blocks rdr))] (sam/write-blocks wtr blks))) |
(defn- transfer-alignments [rdr wtr hdr] (doseq [alns (->> (sam/read-alignments rdr) (map #(update % :rname normalize-chromosome-key)) (partition-all chunk-size))] (sam/write-alignments wtr alns hdr))) | |
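The chunked transfer pattern used by `transfer-alignments` can be tried without any SAM/BAM files. The following is a rough sketch under stated assumptions: `transfer-sketch`, `chunk-size-sketch`, and `written` are hypothetical names, the `"chr"` prefixing stands in for `normalize-chromosome-key`, and an atom stands in for the writer.

```clojure
;; Tiny chunk size so the chunking is visible; the namespace uses 1500000.
(def chunk-size-sketch 2)

;; Lazily rewrite :rname on each record, then hand the records to
;; write-chunk! one chunk at a time, as transfer-alignments does
;; with sam/write-alignments.
(defn transfer-sketch [alns write-chunk!]
  (doseq [chunk (->> alns
                     (map #(update % :rname (fn [r] (str "chr" r))))
                     (partition-all chunk-size-sketch))]
    (write-chunk! chunk)))

;; An atom collects the chunks in place of a real writer.
(def written (atom []))

(transfer-sketch [{:rname "1"} {:rname "2"} {:rname "3"}]
                 #(swap! written conj (vec %)))

@written
;; => [[{:rname "chr1"} {:rname "chr2"}] [{:rname "chr3"}]]
```

Because `map` and `partition-all` are lazy, each chunk is built only when `doseq` asks for it, so the whole input does not have to be materialized at once.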
Normalizes the reference names in SAM/BAM data. Note that performance may degrade if either rdr or wtr (or both) handles the SAM format, because alignments are then parsed and rewritten rather than transferred as raw blocks. | (defn normalize [rdr wtr] (let [hdr (normalize-header (sam/read-header rdr))] (sam/write-header wtr hdr) (sam/write-refs wtr hdr) (if (and (io-util/bam-reader? rdr) (io-util/bam-writer? wtr)) (transfer-blocks rdr wtr) (transfer-alignments rdr wtr hdr)))) |
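A possible way to call `normalize` is sketched below. It assumes that the cljam version in use provides `cljam.io.sam/reader` and `cljam.io.sam/writer` and that both return closeable objects. The file names are placeholders, not files that ship with the library.

```clojure
(require '[cljam.io.sam :as sam]
         '[cljam.algo.normal :as normal])

;; "input.bam" and "normalized.bam" are hypothetical paths.
;; Both ends are BAM here, so normalize should take the
;; block-transfer branch rather than parsing each alignment.
(with-open [rdr (sam/reader "input.bam")
            wtr (sam/writer "normalized.bam")]
  (normal/normalize rdr wtr))
```

If either path pointed to a SAM file instead, the same call would fall back to the slower alignment-by-alignment branch.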