varity.ref-gene

Handles refGene.txt(.gz) and ncbiRefSeq.txt(.gz) content.

cds->genomic-pos

(cds->genomic-pos cds-pos rg)
(cds->genomic-pos cds-pos region {:keys [strand cds-start cds-end exon-ranges]})

cds-coord

(cds-coord pos rg)
Converts the genomic position into the coding DNA coordinate. The return
value is a clj-hgvs.coordinate/CodingDNACoordinate record.
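
The exonic core of this conversion can be sketched without the library. The code below is an illustration only, not varity's implementation: `exonic-bases-before` and `sketch-cds-pos` are hypothetical names, coordinates are assumed to be 1-based and inclusive, `exon-ranges` is assumed to be sorted by genomic start, and `strand` is assumed to be `:forward` or `:reverse`. It returns a plain integer rather than a CodingDNACoordinate record and does not handle intronic or UTR positions.

```clojure
;; Hypothetical sketch: genomic position -> 1-based position within the CDS.
;; Assumes 1-based inclusive coordinates and non-overlapping exon ranges.

(defn exonic-bases-before
  "Counts exonic bases at genomic coordinates strictly lower than `pos`."
  [pos exon-ranges]
  (reduce + (for [[s e] exon-ranges
                  :when (< s pos)]
              ;; Length of the part of [s, e] that lies below pos.
              (inc (- (min e (dec pos)) s)))))

(defn sketch-cds-pos
  "Returns the CDS position of an exonic `pos` inside the coding region,
  or nil when `pos` is intronic or outside the CDS."
  [pos {:keys [strand cds-start cds-end exon-ranges]}]
  (when (and (<= cds-start pos cds-end)
             (some (fn [[s e]] (<= s pos e)) exon-ranges))
    (case strand
      ;; Forward strand: count exonic bases from cds-start up to pos.
      :forward (inc (- (exonic-bases-before pos exon-ranges)
                       (exonic-bases-before cds-start exon-ranges)))
      ;; Reverse strand: count exonic bases from cds-end down to pos.
      :reverse (inc (- (exonic-bases-before (inc cds-end) exon-ranges)
                       (exonic-bases-before (inc pos) exon-ranges))))))
```

On the reverse strand the count starts at `cds-end`, because the transcript is read in the direction of decreasing genomic coordinates.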

cds-coord->genomic-pos

(cds-coord->genomic-pos coord {:keys [strand], :as rg})
Converts the coding DNA coordinate into the genomic position. coord must be
a clj-hgvs.coordinate/CodingDNACoordinate record.

cds-pos

(cds-pos pos {:keys [strand cds-start cds-end exon-ranges]})

cds-region

(cds-region {:keys [chr cds-start cds-end strand]})
Returns the genomic region of the coding sequence of the given gene. Returns
nil if the gene is a non-coding RNA.

cds-seq

(cds-seq {:keys [cds-start cds-end], :as ref-gene-record})
Returns a lazy sequence of exons included in a coding region of a
`ref-gene-record`. Note that exons outside of the CDS are removed and
partially overlapping ones are cropped in the result. Returns nil if the record
is a non-coding RNA.
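
The removal and cropping described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `sketch-cds-exon-ranges` is a hypothetical name, coordinates are assumed to be 1-based and inclusive, a non-coding record is assumed to be one whose `cds-start` is greater than its `cds-end`, and the result is a sequence of plain `[start end]` vectors rather than whatever region maps `cds-seq` actually returns.

```clojure
;; Hypothetical sketch: keep only the exon parts that fall inside the CDS.
(defn sketch-cds-exon-ranges
  "Returns cropped [start end] exon ranges inside the CDS, or nil when the
  record has an empty CDS (assumed here to mean cds-start > cds-end)."
  [{:keys [cds-start cds-end exon-ranges]}]
  (when (<= cds-start cds-end)
    (for [[s e] exon-ranges
          ;; Drop exons that lie entirely outside the CDS.
          :when (and (<= s cds-end) (<= cds-start e))]
      ;; Crop exons that only partially overlap the CDS.
      [(max s cds-start) (min e cds-end)])))
```

For exons `[[10 20] [30 40] [50 60] [70 80]]` and a CDS spanning 35 to 55, the first and last exons are dropped and the middle two are cropped to `[35 40]` and `[50 55]`.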

exon-ranges->intron-ranges

(exon-ranges->intron-ranges exon-ranges)
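
This function carries no docstring. One plausible reading of its name is that introns are the gaps between consecutive exons, which can be sketched as below. The name `sketch-intron-ranges` is hypothetical, and the sketch assumes 1-based inclusive `[start end]` ranges sorted by genomic start; the library's actual behavior may differ.

```clojure
;; Hypothetical sketch: introns as the gaps between adjacent exons.
(defn sketch-intron-ranges
  "Returns [start end] ranges for the gaps between consecutive exon ranges."
  [exon-ranges]
  (for [[[_ prev-end] [next-start _]] (partition 2 1 exon-ranges)]
    ;; The intron starts right after one exon and ends right before the next.
    [(inc prev-end) (dec next-start)]))
```

A record with a single exon yields an empty sequence, since there is no pair of adjacent exons.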

exon-seq

(exon-seq {:keys [chr strand exon-ranges]})
Returns a lazy sequence of regions corresponding to each exon in a gene. The
exons are ordered by their index, so they appear in reverse genomic order
if the refGene record is on the reverse strand.
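
The ordering rule can be sketched as follows. This is a minimal illustration, not the library's implementation: `sketch-exon-seq` is a hypothetical name, the keys of the returned maps (`:exon-index`, `:chr`, `:start`, `:end`, `:strand`) are assumptions, `strand` is assumed to be `:forward` or `:reverse`, and `exon-ranges` is assumed to be sorted by genomic start.

```clojure
;; Hypothetical sketch: number exons in transcript order.
(defn sketch-exon-seq
  "Returns one map per exon, with exon 1 being the first exon transcribed."
  [{:keys [chr strand exon-ranges]}]
  (map-indexed
   (fn [i [s e]]
     {:exon-index (inc i) :chr chr :start s :end e :strand strand})
   ;; On the reverse strand, the exon with the highest coordinates comes first.
   (cond-> exon-ranges
     (= strand :reverse) reverse)))
```

With exons `[[10 20] [30 40] [50 60]]` on the reverse strand, exon 1 is the range starting at 50 and exon 3 is the range starting at 10.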

find-ens-g-id

find-ens-t-id

in-any-exon?

(in-any-exon? chr pos rgidx)
Returns true if chr:pos is located in any ref-gene exon, else false.

in-cds?

(in-cds? pos {:keys [cds-start cds-end]})
Returns true if pos is in the coding region, false otherwise.

in-exon?

(in-exon? pos {:keys [exon-ranges]})
Returns true if pos is in the exon region, false otherwise.

index

(index rgs)
Creates a refGene index for searching.

load-gencode

(load-gencode f parse-line & {:keys [chunk-size], :or {chunk-size 10000}})

load-gff3

(load-gff3 f & {:keys [chunk-size], :or {chunk-size 10000}, :as opts})

load-gtf

(load-gtf f & {:keys [chunk-size], :or {chunk-size 10000}, :as opts})

load-ref-genes

deprecated in 0.8.0

(load-ref-genes f & {:keys [filter-fns], :or {filter-fns [identity]}})
DEPRECATED: Loads f (e.g. refGene.txt(.gz)), returning all of its contents as a sequence.

load-ref-seqs

(load-ref-seqs f & {:keys [filter-fns], :or {filter-fns [identity]}})
Loads f (e.g. ncbiRefSeq.txt(.gz)), returning all of its contents as a sequence.

max-tx-margin

read-coding-sequence

(read-coding-sequence seq-rdr ref-gene-record)
Reads the coding sequence of `ref-gene-record` from `seq-rdr`. Returns nil if
the gene is a non-coding RNA.

read-exon-sequence

(read-exon-sequence seq-rdr {:keys [strand], :as exon})
Reads a base sequence of an `exon` from `seq-rdr`.

read-transcript-sequence

(read-transcript-sequence seq-rdr ref-gene-record)
Reads a DNA base sequence of a `ref-gene-record` from `seq-rdr`. The sequence
contains 5'-UTR, CDS and 3'-UTR.

ref-genes

(ref-genes s rgidx)
(ref-genes chr pos rgidx)
(ref-genes chr pos rgidx tx-margin)
Searches refGene entries by ref-seq, gene, or (chr, pos) using the index,
returning the results as a sequence. See also varity.ref-gene/index.

rna-accession?

(rna-accession? s)

seek-gene-region

(seek-gene-region chr pos rgidx)
(seek-gene-region chr pos rgidx name)
Seeks chr:pos through exon entries in refGene and returns those indices.

tx-region

(tx-region {:keys [chr tx-start tx-end strand]})
Returns a genomic region of the given gene.