A DNA Sequence Alignment/Map (SAM) library for Clojure

Next-generation sequencing (NGS) determines DNA bases, and the resulting alignments are generally stored in the Sequence Alignment/Map (SAM) format or its compressed binary counterpart, BAM. These files are huge, ranging from several GB to several hundred GB. cljam is a Clojure library for easily manipulating SAM/BAM files.

As huge volumes of NGS data accumulate, simple parallel programs that run in cloud and PC cluster environments are required. Clojure is a lightweight programming language well suited to large-scale data analysis with parallel processing: it provides convenient list processing functions and immutable data structures. Building on these features, cljam handles SAM/BAM files simply and quickly.
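To give a feel for the list processing and immutable data structures mentioned above, here is a minimal sketch in plain Clojure. It does not use cljam's API: `parse-alignment`, `count-by-reference`, and `sample-lines` are hypothetical names introduced only for this illustration, and the three records are made-up data. It parses SAM-style tab-separated lines in parallel with `pmap` and counts alignments per reference sequence.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of cljam): parse the first five mandatory
;; SAM fields of one alignment line into an immutable map.
(defn parse-alignment [line]
  (let [[qname flag rname pos mapq] (str/split line #"\t")]
    {:qname qname
     :flag  (Integer/parseInt flag)
     :rname rname
     :pos   (Integer/parseInt pos)
     :mapq  (Integer/parseInt mapq)}))

;; Made-up alignment records, for illustration only.
(def sample-lines
  ["r001\t99\tchr1\t7\t30\t8M\t=\t37\t39\tTTAGATAA\t*"
   "r002\t0\tchr1\t9\t30\t6M\t*\t0\t0\tAAAAGA\t*"
   "r003\t0\tchr2\t9\t30\t5M\t*\t0\t0\tGCCTA\t*"])

;; pmap parses the lines in parallel; the remaining steps are ordinary
;; sequence functions over immutable maps.
(defn count-by-reference [lines]
  (->> lines
       (pmap parse-alignment)
       (map :rname)
       frequencies))

(count-by-reference sample-lines)
;; => {"chr1" 2, "chr2" 1}
```

Because each parsed record is an immutable map, the parallel step needs no locking. With real data, per-line parsing is probably too cheap for `pmap` to pay off, so processing records in larger chunks would likely be the better choice.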

With parallel processing, cljam performs almost as well as similar SAM/BAM manipulation tools, even though it runs on the JVM. Its codebase is also much shorter and simpler than theirs.