Functions to normalize the SAM/BAM format.

(ns cljam.algo.normal
  (:require [cljam.io.sam :as sam]
            [cljam.io.util :as io-util]
            [cljam.util.chromosome :refer [normalize-chromosome-key]]))
(def ^:private chunk-size 1500000)
(defn- normalize-header
  [hdr]
  (update hdr :SQ (fn [xs]
                    (mapv #(update % :SN normalize-chromosome-key) xs))))

TODO: copy all rest of stream for performance. (do not read, parse and write)

(defn- transfer-blocks
  [rdr wtr]
  (doseq [blks (partition-all chunk-size (sam/read-blocks rdr))]
    (sam/write-blocks wtr blks)))
(defn- transfer-alignments
  [rdr wtr hdr]
  (doseq [alns (->> (sam/read-alignments rdr)
                    (map #(update % :rname normalize-chromosome-key))
                    (partition-all chunk-size))]
    (sam/write-alignments wtr alns hdr)))

Normalizes references of the SAM/BAM format. Be noted that performance may be degraded if either or both of rdr and wtr is one about the SAM format.

(defn normalize
  [rdr wtr]
  (let [hdr (normalize-header (sam/read-header rdr))]
    (sam/write-header wtr hdr)
    (sam/write-refs wtr hdr)
    (if (and (io-util/bam-reader? rdr) (io-util/bam-writer? wtr))
      (transfer-blocks rdr wtr)
      (transfer-alignments rdr wtr hdr))))