Genomic Annotator Data Tables

Tables are available:

  • internally on a shared drive: /humgen/gsa-hpprojects/GATK/data/Annotations
  • externally via FTP: ftp://gatk-ftp:PH5UH7Pa@ftp.broadinstitute.org

The refGene track has been created and is ready for use with the GenomicAnnotator.

RefGene

"The RefSeq Genes track shows known human protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq)." (Source: UCSC RefGene Table Description)


Data Files:

For use with the NCBI b36 reference:

/humgen/gsa-hpprojects/GATK/data/Annotations/refseq/refGene-big-table-b36.txt

For use with the UCSC hg18 reference:

/humgen/gsa-hpprojects/GATK/data/Annotations/refseq/refGene-big-table-hg18.txt

For use with the NCBI b37 reference:

/humgen/gsa-hpprojects/GATK/data/Annotations/refseq/refGene-big-table-b37.txt

For use with the UCSC hg19 reference:

/humgen/gsa-hpprojects/GATK/data/Annotations/refseq/refGene-big-table-hg19.txt
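
The build-to-file mapping above can be captured in a small lookup so that a script picks the table matching its reference. This is only a sketch: the names REFGENE_DIR, REFGENE_TABLES and refgene_table_path are hypothetical and not part of the GATK, and the paths are the internal shared-drive locations listed above.

```python
# Hypothetical helper (not part of the GATK): maps a reference build name
# to the refGene table path listed on this page.
REFGENE_DIR = "/humgen/gsa-hpprojects/GATK/data/Annotations/refseq"

REFGENE_TABLES = {
    build: f"{REFGENE_DIR}/refGene-big-table-{build}.txt"
    for build in ("b36", "hg18", "b37", "hg19")
}


def refgene_table_path(build):
    """Return the refGene table path for a reference build such as 'b37'."""
    try:
        return REFGENE_TABLES[build]
    except KeyError:
        known = ", ".join(sorted(REFGENE_TABLES))
        raise ValueError(f"no refGene table for build {build!r}; known builds: {known}") from None
```

Failing loudly on an unknown build is a deliberate choice here, since annotating against a table built for a different reference would silently give wrong coordinates.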


WARNING: The following data was excluded from the RefSeq table:

  • RefSeq transcripts where CDS sequence length isn't a multiple of 3
  • Genomic positions where the reference base wasn't one of A, T, C, or G (for example: 'N')
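
The two exclusion rules above amount to simple checks, sketched below. This is an illustration, not the code that built the tables: the function names are hypothetical, and upper-casing the base before the comparison is an assumption about how soft-masked (lowercase) reference bases would be treated.

```python
# Illustrative checks for the two exclusion rules; names are hypothetical.
VALID_BASES = frozenset("ACGT")


def cds_length_ok(cds_length):
    """A transcript is kept only if its CDS length is a multiple of 3."""
    return cds_length % 3 == 0


def reference_base_ok(base):
    """A position is kept only if the reference base is A, C, G or T.

    Assumption: lowercase bases are accepted after upper-casing.
    """
    return base.upper() in VALID_BASES
```

For example, a transcript with a 300-base CDS passes the first check while one with a 301-base CDS does not, and a position whose reference base is 'N' fails the second check.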


How these files were created: Genomic Annotator - refGene Table - How It Was Created


Column Descriptions

Name | Example | Description
-----|---------|------------
chr | chr3 | Genomic chromosome/contig.
start | 12345 | Genomic position (1-based inclusive).
end | 12349 | Genomic position (1-based inclusive).
haplotypeReference | G | Special column. GenomicAnnotator will match it against the reference base of the variant being annotated. If it doesn't match, the variant won't be annotated with this refGene record. If the haplotypeReference value is *, it matches any reference base. Note that haplotypeReference is always on the + strand of the genomic reference (like haplotypeAlternate).
haplotypeAlternate | A | Special column. GenomicAnnotator will match it against the alternate allele of the variant being annotated. If it doesn't match, the variant won't be annotated with this refGene record. If the haplotypeAlternate value is *, it matches any alternate allele. Note that haplotypeAlternate is always on the + strand of the genomic reference (like haplotypeReference).
name | NM_021649 | Name of gene (usually transcript_id from GTF).
name2 | TICAM2 | Alternate gene name (e.g. gene_id from GTF).
transcriptStrand | + | Strand of the transcript.
positionType | utr5, CDS, utr3, intron, non_coding_exon, non_coding_intron | Describes the position.
frame | 0 | The reading frame: 0, 1, or 2. This field is only populated for positions within CDS.
mrnaCoord | 125 | Counts bases from 5' to 3' across the transcript, starting with 1. Only bases within exons are counted. This field is left blank for non-coding transcripts, and for bases that are within introns and further than 10bp from a splice junction.
codonCoord | 78 | Counts amino acids from 5' to 3'.
spliceDist | -53 | Counts bases to the nearest splice junction. Negative for positions upstream of the splice junction, positive for positions downstream. This field is left blank for bases that are within introns and further than 10bp from a splice junction (to reduce refGene table disk usage).
referenceCodon | GAA | The reference codon at this position, based on the current reading frame.
referenceAA | Glu, Thr, Val, Met, Stop, etc. | 3-letter name of the amino acid coded by the reference codon.
variantCodon | TTC | The variant codon at this position, based on the current reading frame and the haplotypeAlternate base.
variantAA | Phe, Thr, Val, Met, Stop, etc. | 3-letter name of the amino acid coded by the variant codon.
changesAA | false | Whether the haplotypeAlternate base changes the amino acid.
functionalClass | missense, nonsense, silent | Classifies the mutation.
codingCoordStr | c.10G>T | Coding coordinate string in publication format. This field is only populated for positions within the 5' UTR, CDS, and 3' UTR.
proteinCoordStr | p.E4* | Protein coordinate string in publication format.
inCodingRegion | true | Whether this base is within CDS.
spliceInfo | splice-acceptor 8 | Whether this position is a splice acceptor or a splice donor, followed by the distance from the splice junction. This field is only populated for positions within 10bp of a splice junction.
uorfChange | +1, -1 | +1 means this variant creates a new ATG codon within the 5' UTR. -1 means it disrupts an existing ATG codon.
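
The matching rule described for the haplotypeReference and haplotypeAlternate columns can be sketched as follows. This is a minimal illustration and not the GenomicAnnotator's actual implementation: the function names are hypothetical, a record is assumed to be a dict keyed by the column names above, and the comparison is assumed to be an exact, case-sensitive string match.

```python
# Minimal sketch of the haplotype matching rule; names are hypothetical.
WILDCARD = "*"


def column_matches(table_value, variant_value):
    """A table value matches if it is the wildcard or equals the variant's value."""
    return table_value == WILDCARD or table_value == variant_value


def record_matches(record, ref_base, alt_allele):
    """Return True if a refGene record should annotate the given variant.

    record is assumed to be a dict keyed by column name; ref_base and
    alt_allele are given on the + strand of the genomic reference.
    """
    return (column_matches(record["haplotypeReference"], ref_base)
            and column_matches(record["haplotypeAlternate"], alt_allele))
```

With the example values from the table (haplotypeReference G, haplotypeAlternate A), a G>A variant would be annotated by the record and a G>T variant would not, while a record holding * in both columns would annotate either.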