Genomic Annotator Data Tables
Tables are available:
- internally on a shared drive: /humgen/gsa-hpprojects/GATK/data/Annotations
- externally via FTP: ftp://gatk-ftp:PH5UH7Pa@ftp.broadinstitute.org
The refGene track has been created and is ready for use with the GenomicAnnotator.
RefGene
"The RefSeq Genes track shows known human protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq)." (Source: UCSC RefGene Table Description)
Data Files:
For use with the NCBI b36 reference:
/humgen/gsa-hpprojects/GATK/data/Annotations/refseq/refGene-big-table-b36.txt
For use with the UCSC hg18 reference:
/humgen/gsa-hpprojects/GATK/data/Annotations/refseq/refGene-big-table-hg18.txt
For use with the NCBI b37 reference:
/humgen/gsa-hpprojects/GATK/data/Annotations/refseq/refGene-big-table-b37.txt
For use with the UCSC hg19 reference:
/humgen/gsa-hpprojects/GATK/data/Annotations/refseq/refGene-big-table-hg19.txt
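The table must match the reference the variants were called against, since the NCBI and UCSC builds name contigs differently (for example `1` versus `chr1`). The Python sketch below maps a build label to the corresponding path listed above; the function name and the use of the build labels as keys are choices made for this example, not part of any GATK interface.

```python
# Directory and file-name pattern copied from the Data Files list above.
REFSEQ_DIR = "/humgen/gsa-hpprojects/GATK/data/Annotations/refseq"
# The four builds for which a refGene table is listed.
SUPPORTED_BUILDS = ("b36", "hg18", "b37", "hg19")


def refgene_table_path(build: str) -> str:
    """Return the refGene table path for a reference build label (hypothetical helper)."""
    if build not in SUPPORTED_BUILDS:
        raise ValueError(
            f"no refGene table for {build!r}; expected one of {SUPPORTED_BUILDS}"
        )
    return f"{REFSEQ_DIR}/refGene-big-table-{build}.txt"
```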
WARNING: The following data were excluded from the RefSeq table:
- RefSeq transcripts whose CDS length isn't a multiple of 3
- Genomic positions where the reference base wasn't one of A, C, G, or T (for example, 'N')
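The two exclusion rules can be written as simple predicates. This is a minimal Python sketch, assuming CDS intervals are given as 1-based inclusive (start, end) pairs; it illustrates the rules and is not the script that actually built the table. Both function names are hypothetical.

```python
from typing import Iterable, Tuple

# Reference bases that are kept; anything else (e.g. 'N') is excluded.
VALID_BASES = frozenset("ACGT")


def cds_length_ok(cds_intervals: Iterable[Tuple[int, int]]) -> bool:
    """True if the summed length of 1-based inclusive CDS intervals is a multiple of 3."""
    total = sum(end - start + 1 for start, end in cds_intervals)
    return total % 3 == 0


def reference_base_ok(base: str) -> bool:
    """True if the reference base is one of A, C, G, T (case-sensitive in this sketch)."""
    return base in VALID_BASES
```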
How these files were created: Genomic Annotator - refGene Table - How It Was Created
Column Descriptions
| Name | Example | Description |
|---|---|---|
| chr | chr3 | Genomic chromosome/contig |
| start | 12345 | Genomic position (1-based inclusive) |
| end | 12349 | Genomic position (1-based inclusive) |
| haplotypeReference | G | Special column. GenomicAnnotator matches it against the reference base of the variant being annotated; if they don't match, the variant is not annotated with this refGene record. A value of * matches any reference base. Note that haplotypeReference, like haplotypeAlternate, is always on the + strand of the genomic reference. |
| haplotypeAlternate | A | Special column. GenomicAnnotator matches it against the alternate allele of the variant being annotated; if they don't match, the variant is not annotated with this refGene record. A value of * matches any alternate allele. Note that haplotypeAlternate, like haplotypeReference, is always on the + strand of the genomic reference. |
| name | NM_021649 | Name of gene (usually transcript_id from GTF) |
| name2 | TICAM2 | Alternate gene name (e.g. gene_id from GTF) |
| transcriptStrand | + | Strand of the transcript. |
| positionType | utr5, CDS, utr3, intron, non_coding_exon, non_coding_intron | Type of region the position falls in (one of the values in the Example column). |
| frame | 0 | The reading frame: 0, 1, or 2. Populated only for positions within the CDS. |
| mrnaCoord | 125 | Counts bases from 5' to 3' along the transcript, starting at 1; only exonic bases are counted. Left blank for non-coding transcripts and for intronic bases more than 10 bp from a splice junction. |
| codonCoord | 78 | Counts amino acids from 5' to 3'. |
| spliceDist | -53 | Number of bases to the nearest splice junction: negative for positions upstream of the junction, positive for positions downstream. Left blank for intronic bases more than 10 bp from a splice junction (to reduce refGene table disk usage). |
| referenceCodon | GAA | The reference codon at this position, based on the current reading frame. |
| referenceAA | Glu,Thr,Val,Met,Stop,etc. | 3-letter name of the amino acid coded by the reference codon. |
| variantCodon | TTC | The variant codon at this position, based on the current reading frame and the haplotypeAlternate base. |
| variantAA | Phe,Thr,Val,Met,Stop,etc. | 3-letter name of the amino acid coded by the variant codon. |
| changesAA | false | Whether the haplotypeAlternate base changes the amino acid. |
| functionalClass | missense, nonsense, silent | Classifies the mutation. |
| codingCoordStr | c.10G>T | Coding coordinate string in publication format. This field is only populated for positions within the 5' UTR, CDS, and 3' UTR. |
| proteinCoordStr | p.E4* | Protein coordinate string in publication format. |
| inCodingRegion | true | Whether this base is within CDS. |
| spliceInfo | splice-acceptor 8 | Whether this position is a splice acceptor or a splice donor, followed by its distance from the splice junction. Populated only for positions within 10 bp of a splice junction. |
| uorfChange | +1, -1 | +1 means this variant creates a new ATG codon within the 5' UTR. -1 means it disrupts an existing ATG codon. |
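Because `start` and `end` are both 1-based and inclusive, a record covers position `pos` when `start <= pos <= end`. The Python sketch below assumes the table is a tab-separated text file whose first line is a header naming the columns above; that layout is an assumption, so check it against the actual file before relying on it. `records_at` is a hypothetical helper, not a GenomicAnnotator API.

```python
import csv
from typing import Dict, Iterator, TextIO


def records_at(handle: TextIO, chrom: str, pos: int) -> Iterator[Dict[str, str]]:
    """Yield rows whose [start, end] interval (1-based, inclusive) contains pos.

    Assumes a tab-separated file with a header row using the column names above.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row["chr"] == chrom and int(row["start"]) <= pos <= int(row["end"]):
            yield row
```

A linear scan like this is fine for spot checks; annotating many variants against the full table would call for an indexed lookup instead (for example, rows sorted by contig and start).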
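The haplotypeReference/haplotypeAlternate rule amounts to an equality test with `*` as a wildcard. A minimal Python sketch, assuming the variant's alleles are already on the + strand (as the table's columns are, so no reverse complementing is needed even for `-` strand transcripts); `matches` is a hypothetical name, and the comparison is case-sensitive as written.

```python
from typing import Mapping


def matches(record: Mapping[str, str], ref_base: str, alt_allele: str) -> bool:
    """Apply the haplotypeReference/haplotypeAlternate matching rule to one record."""

    def field_matches(expected: str, observed: str) -> bool:
        # "*" in the table matches any allele; otherwise require an exact match.
        return expected == "*" or expected == observed

    return (field_matches(record["haplotypeReference"], ref_base)
            and field_matches(record["haplotypeAlternate"], alt_allele))
```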
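For exonic bases, mrnaCoord can be derived from a transcript's exon coordinates: walk the exons in 5'-to-3' order (descending genomic order for `-` strand transcripts) and count exonic bases. This Python sketch assumes 1-based inclusive exon intervals and returns `None` for bases outside exons; it does not reproduce how the table treats intronic bases within 10 bp of a junction, since that convention isn't specified on this page. `mrna_coord` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def mrna_coord(exons: List[Tuple[int, int]], strand: str, pos: int) -> Optional[int]:
    """1-based position of genomic base `pos` along the spliced transcript, 5' to 3'.

    exons: 1-based inclusive (start, end) genomic intervals, in any order.
    Returns None when `pos` is not inside any exon.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # 5' to 3' order: ascending genomic order on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    counted = 0  # exonic bases already passed before reaching pos
    for start, end in ordered:
        if start <= pos <= end:
            offset = pos - start if strand == "+" else end - pos
            return counted + offset + 1
        counted += end - start + 1
    return None
```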
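The referenceCodon, variantCodon, referenceAA, variantAA, changesAA and functionalClass columns follow from the standard genetic code. The Python sketch below assumes `frame` is the 0-based offset of the variant base within its codon, read in transcript orientation, so for `-` strand transcripts the + strand alternate base is complemented first. It labels a change `nonsense` when the variant codon is a stop, `silent` when the amino acid is unchanged, and `missense` otherwise; changes that remove a stop codon fall into `missense` here, which is this sketch's choice rather than documented table behavior. All names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
_BASES = "TCAG"
_AA1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA1 = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA1)}

# Three-letter names as in referenceAA/variantAA; "*" is reported as "Stop".
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def variant_codon(ref_codon: str, frame: int, alt_plus_strand: str, strand: str) -> str:
    """Substitute the alternate base into the reference codon (transcript orientation)."""
    alt = alt_plus_strand if strand == "+" else alt_plus_strand.translate(_COMPLEMENT)
    return ref_codon[:frame] + alt + ref_codon[frame + 1:]


def three_letter(codon: str) -> str:
    return AA3[CODON_TO_AA1[codon]]


def changes_aa(ref_codon: str, var_codon: str) -> bool:
    return CODON_TO_AA1[ref_codon] != CODON_TO_AA1[var_codon]


def functional_class(ref_codon: str, var_codon: str) -> str:
    if not changes_aa(ref_codon, var_codon):
        return "silent"
    if CODON_TO_AA1[var_codon] == "*":
        return "nonsense"
    return "missense"  # in this sketch, stop-loss changes also land here
```

For example, `variant_codon("GAA", 0, "T", "+")` gives `TAA`, turning Glu into Stop (`nonsense`), which lines up with the `c.10G>T` and `p.E4*` examples in the table.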
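For a substitution inside the CDS, codonCoord, codingCoordStr and proteinCoordStr follow from the base's 1-based position in the CDS. The Python sketch below assumes `frame` is `(cds_pos - 1) % 3`, i.e. the 0-based offset within the codon (this page doesn't confirm that convention), and uses one-letter amino acid codes with `*` for stop, matching the `p.E4*` example. It only handles CDS positions; HGVS writes 5' and 3' UTR positions as `c.-N` and `c.*N`, which this sketch does not attempt. `cds_annotation` is a hypothetical helper.

```python
from typing import Dict, Union


def cds_annotation(cds_pos: int, ref_base: str, alt_base: str,
                   ref_aa: str, alt_aa: str) -> Dict[str, Union[int, str]]:
    """Derive codon position and publication-style strings for a CDS substitution.

    cds_pos: 1-based position of the base within the CDS (the number after "c.").
    ref_base/alt_base: bases in transcript orientation.
    ref_aa/alt_aa: one-letter amino acid codes, "*" for stop.
    """
    if cds_pos < 1:
        raise ValueError("cds_pos must be >= 1 for positions within the CDS")
    codon_coord = (cds_pos - 1) // 3 + 1  # codons counted from 1, 5' to 3'
    frame = (cds_pos - 1) % 3             # assumed 0-based offset within the codon
    return {
        "codonCoord": codon_coord,
        "frame": frame,
        "codingCoordStr": f"c.{cds_pos}{ref_base}>{alt_base}",
        "proteinCoordStr": f"p.{ref_aa}{codon_coord}{alt_aa}",
    }
```

`cds_annotation(10, "G", "T", "E", "*")` returns codonCoord 4, frame 0, `c.10G>T` and `p.E4*`, reproducing the example values above.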
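One way to compute uorfChange is to compare the ATG triplets that overlap the variant base before and after the substitution. This Python sketch assumes the 5' UTR sequence is given in transcript orientation with a 0-based variant offset, and only counts triplets lying wholly inside that sequence. It returns a list because a single substitution can both destroy one ATG and create an overlapping one. The function name and return convention are illustrative, not the table's.

```python
from typing import List, Set


def uorf_change(utr5: str, index: int, alt: str) -> List[int]:
    """Return [+1] if an ATG is created, [-1] if one is disrupted, both, or [].

    utr5: 5' UTR sequence in transcript orientation.
    index: 0-based offset of the variant base within utr5.
    alt: alternate base in transcript orientation.
    """

    def atg_starts(seq: str) -> Set[int]:
        # Start offsets of triplets that contain `index` and fit inside seq.
        lo = max(0, index - 2)
        hi = min(len(seq) - 3, index)
        return {i for i in range(lo, hi + 1) if seq[i:i + 3] == "ATG"}

    mutated = utr5[:index] + alt + utr5[index + 1:]
    before, after = atg_starts(utr5), atg_starts(mutated)
    changes = []
    if after - before:
        changes.append(+1)  # a new ATG overlaps the variant
    if before - after:
        changes.append(-1)  # an existing ATG was disrupted
    return changes
```

`uorf_change("ATGTG", 2, "A")` returns `[1, -1]`: the original ATG at offset 0 is lost and a new one appears at offset 2.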