
RefSeqCodec

Allows for reading in RefSeq information

Category ROD Codecs


Overview

Parses a sorted UCSC RefSeq file (see below) into its relevant features: the gene name, the unique gene name (if multiple transcripts get separate entries), the exons, the gene start/stop, the coding start/stop, and the strandedness of transcription.

Instructions for generating a RefSeq file for use with the RefSeq codec can be found in the documentation guide: http://www.broadinstitute.org/gatk/guide/article?id=1329

Usage

The RefSeq ROD can be bound like any other ROD and is specified by the type REFSEQ, for example:
 -refSeqBinding:REFSEQ /path/to/refSeq.txt
 
You will need to consult the documentation of the individual walker for its binding name ("refSeqBinding" in the example above).

File format example

If you want to define your own file, the format is tab-delimited with the following columns: bin, name, chrom, strand, transcription start, transcription end, coding start, coding end, num exons, exon starts, exon ends, id, alt. name, coding start status (complete/incomplete), coding end status (complete/incomplete), and exon frames. For example:
 76 NM_001011874 1 - 3204562 3661579 3206102 3661429 3 3204562,3411782,3660632, 3207049,3411982,3661579, 0 Xkr4 cmpl cmpl 1,2,0,
 
for more information see here
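
To make the column layout concrete, here is a minimal Python sketch that splits one line of the format described above into named fields. It is not the codec's own implementation (the codec is part of GATK), and the names RefSeqRecord and parse_refseq_line are hypothetical, chosen only for this illustration. The sketch assumes exactly 16 tab-delimited columns in the order listed above, and that the exon lists may end with a trailing comma, as in the example line.

```python
from typing import List, NamedTuple, Tuple

EXPECTED_FIELDS = 16  # number of columns listed in the format description


class RefSeqRecord(NamedTuple):
    """One line of the RefSeq file (hypothetical container for this sketch)."""
    bin: int
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exons: List[Tuple[int, int]]  # (start, end) pairs built from the two exon columns
    gene_id: str                  # the "id" column, kept as text
    alt_name: str
    cds_start_status: str
    cds_end_status: str
    exon_frames: List[int]


def _parse_int_list(text: str) -> List[int]:
    """Parse a comma-separated list such as '1,2,0,' (trailing comma allowed)."""
    return [int(item) for item in text.split(",") if item]


def parse_refseq_line(line: str) -> RefSeqRecord:
    """Split one tab-delimited line into a RefSeqRecord, with basic sanity checks."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != EXPECTED_FIELDS:
        raise ValueError(
            f"expected {EXPECTED_FIELDS} tab-delimited fields, got {len(fields)}"
        )
    exon_count = int(fields[8])
    starts = _parse_int_list(fields[9])
    ends = _parse_int_list(fields[10])
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exon start/end lists do not match the exon count")
    return RefSeqRecord(
        bin=int(fields[0]),
        name=fields[1],
        chrom=fields[2],
        strand=fields[3],
        tx_start=int(fields[4]),
        tx_end=int(fields[5]),
        cds_start=int(fields[6]),
        cds_end=int(fields[7]),
        exon_count=exon_count,
        exons=list(zip(starts, ends)),
        gene_id=fields[11],
        alt_name=fields[12],
        cds_start_status=fields[13],
        cds_end_status=fields[14],
        exon_frames=_parse_int_list(fields[15]),
    )


# The example line from this page, rebuilt with explicit tab separators.
EXAMPLE_LINE = "\t".join([
    "76", "NM_001011874", "1", "-", "3204562", "3661579", "3206102", "3661429",
    "3", "3204562,3411782,3660632,", "3207049,3411982,3661579,",
    "0", "Xkr4", "cmpl", "cmpl", "1,2,0,",
])

record = parse_refseq_line(EXAMPLE_LINE)
print(record.alt_name, record.strand, record.exons)
```

Running a hand-made file through a check like this before binding it can catch a line that was saved with spaces instead of tabs, or one whose exon lists do not agree with the exon count.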



GATK version 3.1-1-g07a4bf8 built at 2014/03/18 07:00:36. GTD: NA