Allows for reading in RefSeq information

Category ROD Codecs


Parses a sorted UCSC RefSeq file (see below) into relevant features: the gene name, the unique gene name (if multiple transcripts get separate entries), exons, gene start/stop, coding start/stop, and strandedness of transcription.

Instructions for generating a RefSeq file for use with the RefSeq codec can be found in the documentation guide here.


The RefSeq ROD can be bound like any other ROD and is specified by REFSEQ, for example:
 -refSeqBinding:REFSEQ /path/to/refSeq.txt
The binding name ("refSeqBinding" above) differs between walkers, so consult the documentation of the individual walker you are running.

File format example

If you want to define your own file, the format is tab-delimited with the following fields, in order: bin, name, chrom, strand, transcription start, transcription end, coding start, coding end, num exons, exon starts, exon ends, id, alt. name, coding start status (complete/incomplete), coding end status (complete/incomplete), and exon frames. For example:
 76 NM_001011874 1 - 3204562 3661579 3206102 3661429 3 3204562,3411782,3660632, 3207049,3411982,3661579, 0 Xkr4 cmpl cmpl 1,2,0,
For more information, see here.
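
The field layout described above can be checked with a short script before handing a hand-built file to the codec. The following is a minimal Python sketch, not the codec's own implementation: the `RefSeqRecord` class, its field names, and `parse_refseq_line` are hypothetical names chosen for illustration. It assumes each line has exactly 16 tab-separated fields and that the comma-separated lists may end with a trailing comma, as in the example line.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RefSeqRecord:
    """One line of a RefSeq file (field names are illustrative only)."""
    bin_id: int
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]
    record_id: str
    alt_name: str
    cds_start_status: str
    cds_end_status: str
    exon_frames: List[int]


def _int_list(field: str) -> List[int]:
    # Lists such as "1,2,0," carry a trailing comma; skip the empty piece.
    return [int(item) for item in field.split(",") if item]


def parse_refseq_line(line: str) -> RefSeqRecord:
    """Split one tab-delimited line into a RefSeqRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 16:
        raise ValueError(f"expected 16 tab-separated fields, got {len(cols)}")
    record = RefSeqRecord(
        bin_id=int(cols[0]),
        name=cols[1],
        chrom=cols[2],
        strand=cols[3],
        tx_start=int(cols[4]),
        tx_end=int(cols[5]),
        cds_start=int(cols[6]),
        cds_end=int(cols[7]),
        exon_count=int(cols[8]),
        exon_starts=_int_list(cols[9]),
        exon_ends=_int_list(cols[10]),
        record_id=cols[11],
        alt_name=cols[12],
        cds_start_status=cols[13],
        cds_end_status=cols[14],
        exon_frames=_int_list(cols[15]),
    )
    # Sanity check: the three lists should each have exon_count entries.
    for values in (record.exon_starts, record.exon_ends, record.exon_frames):
        if len(values) != record.exon_count:
            raise ValueError("exon list length does not match num exons")
    return record


# The example line from this page, rebuilt with explicit tab separators.
EXAMPLE_LINE = "\t".join([
    "76", "NM_001011874", "1", "-", "3204562", "3661579", "3206102",
    "3661429", "3", "3204562,3411782,3660632,", "3207049,3411982,3661579,",
    "0", "Xkr4", "cmpl", "cmpl", "1,2,0,",
])

if __name__ == "__main__":
    print(parse_refseq_line(EXAMPLE_LINE))
```

The length check on the exon lists is a design choice of this sketch; it catches the most common hand-editing mistake (a missing or extra exon coordinate) early.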


GATK version 3.3-0-g37228af built at 2014/10/24 14:40:51. GTD: NA