Functions to read and write sequence formats such as FASTA and TwoBit.

(ns cljam.io.sequence
  (:refer-clojure :exclude [indexed?])
  (:require [cljam.io.fasta.core :as fa-core]
            [cljam.io.fasta.writer :as fa-writer]
            [cljam.io.protocols :as protocols]
            [cljam.io.twobit.reader :as tb-reader]
            [cljam.io.twobit.writer :as tb-writer]
            [cljam.io.util :as io-util])
  (:import java.io.Closeable
           cljam.io.fasta.reader.FASTAReader
           cljam.io.fasta.writer.FASTAWriter
           cljam.io.twobit.reader.TwoBitReader
           cljam.io.twobit.writer.TwoBitWriter))

Reading

Returns an open cljam.io.fasta.reader.FASTAReader of f. Should be used inside with-open to ensure the reader is properly closed.

(defn fasta-reader
  ^FASTAReader
  [f]
  (fa-core/reader f))

Returns an open cljam.io.twobit.reader.TwoBitReader of f. Should be used inside with-open to ensure the reader is properly closed.

(defn twobit-reader
  ^TwoBitReader
  [f]
  (tb-reader/reader f))

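Selects a suitable reader from f's extension, falling back to inspecting the file contents when the extension is not recognized, and returns the open reader. Opens a new reader if the argument represents a file, such as a String path, java.io.File, or java.net.URL. If a reader is given, clones it. Supports the FASTA and TwoBit formats; any other file type throws an IllegalArgumentException.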

(defn reader
  ^Closeable
  [f]
  (cond
    (io-util/fasta-reader? f) (fa-core/clone-reader f)
    (io-util/twobit-reader? f) (tb-reader/clone-reader f)
    :else (case (try
                  (io-util/file-type f)
                  (catch IllegalArgumentException _
                    (io-util/file-type-from-contents f)))
            :fasta (fasta-reader f)
            :2bit (twobit-reader f)
            (throw (IllegalArgumentException. "Invalid file type")))))
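
As a rough usage sketch of the dispatch described above, the helper below accepts either a file-like argument or an already open reader. The helper name, the example path, and the region keys (:chr, :start, :end) are assumptions made for illustration; they are not defined in this namespace.

```clojure
(require '[cljam.io.sequence :as cseq])

;; Hypothetical helper, not part of cljam.
;; `src` may be a String path, java.io.File, java.net.URL, or an open reader.
;; When `src` is a reader, cseq/reader returns a clone, so with-open closes
;; only the clone and the caller's reader should stay usable.
(defn read-region
  [src region]
  (with-open [rdr (cseq/reader src)]
    (cseq/read-sequence rdr region)))

(comment
  ;; "genome.2bit" is a placeholder path.
  (read-region "genome.2bit" {:chr "chr1" :start 1 :end 10}))
```

Cloning inside the helper keeps resource ownership local: whoever opens a reader is the one who closes it.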

Reads the sequence in region from a FASTA/TwoBit file.

(defn read-sequence
  ([rdr region] (protocols/read-sequence rdr region))
  ([rdr region option] (protocols/read-sequence rdr region option)))
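
A minimal sketch of both arities, assuming that a region is a map with :chr, :start and :end (1-based, inclusive) and that the option map accepts :mask? to keep soft-masked (lowercase) bases. Neither assumption is stated in this namespace, so check them against the reader implementations. The function name is hypothetical.

```clojure
(require '[cljam.io.sequence :as cseq])

;; Hypothetical helper: reads the same region twice from an open reader,
;; once with default options and once asking to keep soft-masking.
;; The :mask? key is an assumption about the option map.
(defn read-both-ways
  [rdr region]
  {:plain  (cseq/read-sequence rdr region)
   :masked (cseq/read-sequence rdr region {:mask? true})})
```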

Reads all sequences of a FASTA/TwoBit file.

(defn read-all-sequences
  ([rdr] (protocols/read-all-sequences rdr))
  ([rdr option] (protocols/read-all-sequences rdr option)))

Returns summaries of the sequences in a FASTA/TwoBit file as a vector of maps containing :name and :len.

(defn read-seq-summaries
  [rdr]
  (protocols/read-seq-summaries rdr))

Reads metadata of indexed sequences. Returns a vector of maps containing :name, :len and other format-specific keys. Forces loading all indices.

(defn read-indices
  [rdr]
  (protocols/read-indices rdr))

Returns true if the reader can be randomly accessed, false if not. Note this function immediately realizes a delayed index.

(defn indexed?
  [rdr]
  (protocols/indexed? rdr))
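
One way indexed? might be used is to choose between random access and a linear scan. The sketch below assumes that each element returned by read-all-sequences is a map with :name and a string :sequence, and that regions are 1-based and inclusive. The helper name is hypothetical and bounds checking is kept minimal.

```clojure
(require '[cljam.io.sequence :as cseq])

;; Hypothetical helper: uses random access when the reader is indexed,
;; otherwise scans all sequences for the requested name.
(defn fetch-region
  [rdr {:keys [chr start end] :as region}]
  (if (cseq/indexed? rdr)
    (cseq/read-sequence rdr region)
    (some (fn [{seq-name :name bases :sequence}]
            (when (= seq-name chr)
              ;; Convert 1-based inclusive coordinates to subs indices and
              ;; clamp the end to the sequence length.
              (subs bases (dec start) (min end (count bases)))))
          (cseq/read-all-sequences rdr))))
```

The linear scan reads every sequence up to the matching one, so it is only a reasonable fallback for small files.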

Writing

Returns an open cljam.io.fasta.writer.FASTAWriter of f with options:

  :cols - Maximum number of characters written in one row.
  :create-index? - If true, .fai will be created simultaneously.

Should be used inside with-open to ensure the writer is properly closed.

(defn fasta-writer
  (^FASTAWriter [f]
   (fasta-writer f {}))
  (^FASTAWriter [f options]
   (fa-writer/writer f options)))

Returns an open cljam.io.twobit.writer.TwoBitWriter of f with options:

  :index - Metadata of indexed sequences. Memory usage can be reduced if an index is supplied.

Should be used inside with-open to ensure the writer is properly closed.

(defn twobit-writer
  (^TwoBitWriter [f]
   (twobit-writer f {}))
  (^TwoBitWriter [f options]
   (tb-writer/writer f options)))

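Selects a suitable writer from f's extension and returns the open writer. An optional options map is passed through to fasta-writer or twobit-writer. Supports the FASTA and TwoBit formats; any other file type throws an IllegalArgumentException.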

(defn writer
  ^Closeable
  [f & options]
  (case (io-util/file-type f)
    :fasta (apply fasta-writer f options)
    :2bit (apply twobit-writer f options)
    (throw (IllegalArgumentException. "Invalid file type"))))
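
Because writer applies its rest arguments to fasta-writer or twobit-writer, options are given as a single map after the path. The sketch below shows this for FASTA output. The helper name is hypothetical, and the shape of each element of seqs (a map with :name and :sequence) is an assumption about the writer's input.

```clojure
(require '[cljam.io.sequence :as cseq])

;; Hypothetical helper: writes `seqs` as FASTA wrapped at 60 columns and
;; requests a .fai index alongside it. `path` needs an extension that
;; cseq/writer recognizes as FASTA.
(defn write-wrapped-fasta
  [path seqs]
  ;; The options map is one positional argument following the path.
  (with-open [wtr (cseq/writer path {:cols 60 :create-index? true})]
    (cseq/write-sequences wtr seqs)))
```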

Writes all sequences to a FASTA/TwoBit file.

(defn write-sequences
  [wtr seqs]
  (protocols/write-sequences wtr seqs))
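
Taken together, reader, read-all-sequences, writer and write-sequences allow a format conversion in a few lines. This is a sketch under the assumption that the output of read-all-sequences is accepted as input by write-sequences; the function name is hypothetical.

```clojure
(require '[cljam.io.sequence :as cseq])

;; Hypothetical helper: copies every sequence from `in-path` to `out-path`.
;; The formats are chosen from the file extensions, for example a FASTA
;; input and a 2bit output.
(defn convert-sequences
  [in-path out-path]
  (with-open [rdr (cseq/reader in-path)
              wtr (cseq/writer out-path)]
    ;; Writing happens inside with-open, while the reader is still open.
    (cseq/write-sequences wtr (cseq/read-all-sequences rdr))))
```

With default read options, soft-masking may not survive the round trip; passing a read option for it would be a further assumption about the reader.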