Overview


Oncotator is a tool to annotate point mutations and indels with functional data relevant to cancer researchers. Annotations include gene names, functional consequence (e.g. Missense), PolyPhen-2 predictions, and cancer-specific annotations from resources such as COSMIC, Tumorscape, and published MutSig results.


GENCODE Reference Set

To determine mutation consequence, Oncotator utilizes GENCODE (Version 19 - July 2013 freeze, GRCh37 - Ensembl 74) as a reference transcript set. More information about GENCODE can be found here.

For transcript annotations, Oncotator will first give priority to transcripts listed here, and then use the GENCODE BEST_EFFECT approach to decide which transcript is used for transcript annotations.


Latest stable Oncotator version and resources used:

  • Oncotator v1.5.3.0
  • Flat File Reference hg19
  • GENCODE v19 EFFECT
  • UniProt_AAxform 2014_12
  • CCLE_By_GP 09292010
  • UniProt_AA 2014_12
  • dbNSFP v2.4
  • ESP 6500SI-V2
  • dbSNP build 142
  • 1000Genome phase3 20130502
  • ClinVar 12.03.20
  • COSMIC v62_291112
  • ORegAnno UCSC Track
  • UniProt 2014_12
  • HumanDNARepairGenes 20110905
  • TUMORScape 20100104
  • CCLE_By_Gene 09292010
  • MutSig Published Results 20110905
  • COSMIC_Tissue 291112
  • CGC full_2012-03-15
  • COSMIC_FusionGenes v62_291112
  • HGNC Sept172014
  • ACHILLES_Lineage_Results 110303
  • TCGAScape 110405
  • Ensembl ICGC MUCOPA
  • Familial_Cancer_Genes 20110905
  • gencode_xref_refseq metadata_v19

Input Format


The input is a tab-delimited file with five required fields. A header line is optional, but if supplied, the five headers must be provided exactly as listed below:

  1. 'chr' - Chromosome; the 'chr' prefix is optional (e.g. 'chr10' and '10' are both valid).
  2. 'start' - 1-based start coordinate of reference allele.
  3. 'end' - 1-based end coordinate of reference allele.
  4. 'ref_allele' - Positive strand reference allele at positions given above.
  5. 'alt_allele' - Observed alternate allele.
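
As an illustration of the five-field layout above, the following Python sketch reads such a file, skips the optional header, and normalises the optional 'chr' prefix. The function name `read_variants` and the validation rules are assumptions made for this example; they are not part of Oncotator itself.

```python
import csv
import io

# The five required fields, in the order listed above.
REQUIRED = ["chr", "start", "end", "ref_allele", "alt_allele"]


def read_variants(handle):
    """Yield one dict per variant row from a five-column tab-delimited input.

    Hypothetical helper for illustration only.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row == REQUIRED:
            continue  # skip blank lines and the optional header line
        if len(row) != len(REQUIRED):
            raise ValueError(f"expected 5 fields, got {len(row)}: {row}")
        record = dict(zip(REQUIRED, row))
        # The 'chr' prefix is optional, so strip it to get one canonical form.
        if record["chr"].startswith("chr"):
            record["chr"] = record["chr"][3:]
        # Coordinates are 1-based integers.
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        if record["start"] > record["end"]:
            raise ValueError(f"start is greater than end: {row}")
        yield record


example = (
    "chr\tstart\tend\tref_allele\talt_allele\n"
    "chr4\t150\t150\tA\tT\n"
    "10\t200\t200\tG\tC\n"
)
variants = list(read_variants(io.StringIO(example)))
```

Stripping the prefix is only one possible normalisation; the input itself accepts either form.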

Representing different types of alterations


Single nucleotide variants

chr4 150 150 A T

Insertions

Use '-' in the ref_allele field. The start and end coordinates must indicate the two adjacent bases between which the insertion occurs.

chr4 150 151 - T

Deletions

Use '-' in the alt_allele field to denote deletion of the given reference allele.

chr4 150 150 A -

VCF-style formatting

Indels can also be represented with VCF-style formatting. For example, an insertion and a deletion can be represented as follows:

chr4 150 150 A AT
chr4 150 151 AG A
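
The conventions above can be summarised in a small Python sketch that labels a row as an SNV, insertion, or deletion from its two allele fields, accepting both the '-' style and the VCF style. The function `classify` and the "other" fallback label are assumptions for illustration; Oncotator's own classification logic is not shown in this document.

```python
def classify(ref_allele, alt_allele):
    """Label a variant from its alleles (illustrative helper, not Oncotator code)."""
    if ref_allele == "-" and alt_allele == "-":
        raise ValueError("ref_allele and alt_allele cannot both be '-'")
    # Dash style: '-' marks the missing allele.
    if ref_allele == "-":
        return "insertion"
    if alt_allele == "-":
        return "deletion"
    if len(ref_allele) == 1 and len(alt_allele) == 1:
        return "SNV"
    # VCF style: the shorter allele is a leading anchor shared with the longer one.
    if len(alt_allele) > len(ref_allele) and alt_allele.startswith(ref_allele):
        return "insertion"
    if len(ref_allele) > len(alt_allele) and ref_allele.startswith(alt_allele):
        return "deletion"
    # Anything else (e.g. multi-base substitutions) is outside this sketch.
    return "other"


labels = [
    classify("A", "T"),    # single nucleotide variant
    classify("-", "T"),    # dash-style insertion
    classify("A", "-"),    # dash-style deletion
    classify("A", "AT"),   # VCF-style insertion
    classify("AG", "A"),   # VCF-style deletion
]
```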

Output Format


Oncotator outputs annotated mutations in tab-delimited Mutation Annotation Format (MAF).

  • First line begins with '##' and provides Oncotator and resource version information.
  • If multiple values exist within a field, a pipe “|” character will be used as a delimiter.
  • Column indices 1-32 are dictated by TCGA MAF specification 2.2. Details for column indices 33-76 are provided in the table below.
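
A minimal Python sketch of reading this output is shown below, assuming only what the bullets above state: lines starting with '##' carry version information, the remaining lines are tab-delimited with a header row, and multi-valued fields use '|' as a delimiter. The helper names `read_maf` and `split_values` and the two-column example text are hypothetical; a real MAF file has many more columns.

```python
import csv
import io


def read_maf(handle):
    """Return (version_lines, rows) from an Oncotator-style MAF stream.

    Hypothetical helper for illustration only.
    """
    version_lines = []
    data_lines = []
    for line in handle:
        if line.startswith("##"):
            version_lines.append(line.rstrip("\n"))
        elif line.strip():
            data_lines.append(line)
    # The first remaining line is the column header row.
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return version_lines, list(reader)


def split_values(field):
    """Split a pipe-delimited field into a list of trimmed values."""
    return [value.strip() for value in field.split("|") if value.strip()]


# Shortened, made-up example with two columns only.
maf_text = (
    "## Oncotator v1.5.3.0\n"
    "Hugo_Symbol\tCOSMIC_overlapping_mutations\n"
    "EGFR\tp.L858R(1489)| p.L858Q(1)| p.L858K(1)\n"
)
versions, rows = read_maf(io.StringIO(maf_text))
mutations = split_values(rows[0]["COSMIC_overlapping_mutations"])
```

Quoting is disabled in the reader because annotation text can contain literal double-quote characters.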

Column Header Descriptions

Index MAF Column Header Description of Values Example
35 Genome_Change String describing '+' strand genomic coordinates and alleles. g.chr7:55259515T>G
36 Annotation_Transcript Ensembl transcript ID of transcript used for annotation. ENST00000275493.2
37 Transcript_Strand Strand orientation of the above transcript. +
38 Transcript_Exon Indicates the exon affected by the mutation. 21
39 Transcript_Position Describes absolute start and end coordinates (separated by an underscore character) with respect to the reference transcript used in the Annotation_Transcript column. Note these coordinates will differ from the coding region coordinates used in the cDNA_Change and Codon_Change columns. Only one number will be provided if the start and end coordinates are the same. 2750
40 cDNA_Change Coding position and alleles. Coordinates are coding sequence coordinates. c.2573T>G
41 Codon_Change String describing transcript coordinates and alleles in context of codon sequences involved. c.(2572-2574)cTg>cGg
42 Protein_Change Protein position and alleles involved. p.L858R
43 Other_Transcripts HUGO symbol, Ensembl transcript ID, variant classification and protein change of other transcripts overlapping with mutation. EGFR_ENST00000442591.1_Intron| EGFR_ENST00000454757.2_Missense_Mutation_p.L805R| EGFR_ENST00000455089.1_Missense_Mutation_p.L813R
44 Refseq_mRNA_Id RefSeq transcript ID. NM_005228.3
45 Refseq_prot_Id Refseq protein ID. NP_005219.2
46 SwissProt_acc_Id UniProt accession ID. P00533
47 SwissProt_entry_Id UniProt entry name ID. EGFR_HUMAN
48 Description If available, description text for transcript. epidermal growth factor receptor
49 UniProt_AApos UniProt protein position used to derive position-specific annotations. This can differ from the protein position listed in the 'Protein_Change' field if the UCSC and UniProt protein sequences differ. 858
50 UniProt_Region Overlapping UniProt regions of interest (e.g. functional domain or repeat region). Cytoplasmic (Potential).| Protein kinase.
51 UniProt_Site Overlapping UniProt single amino acid sites of interest (e.g. cleavage or inhibitory sites for proteases). ATP (By similarity).
52 UniProt_Natural_Variations Overlapping UniProt variants of interest (e.g. polymorphisms or disease-associated mutations). L -> M (found in a lung cancer sample).| L -> R (found in a lung cancer sample; somatic mutation; constitutively activated enzyme with strongly increased kinase activity).
53 UniProt_Experimental_Info Overlapping UniProt sites with experimental data (e.g. mutagenesis data leading to protein activity inhibition). D -> A: Loss of kinase activity.
54 GO_Biological_Process Gene Ontology terms describing pathways and processes UniProt protein is involved in. axon guidance| cell proliferation| cell-cell adhesion
55 GO_Cellular_Component Gene Ontology terms describing localization of given UniProt protein. endoplasmic reticulum membrane| endosome| extracellular space
56 GO_Molecular_Function Gene Ontology terms describing molecular activity of given UniProt protein. ATP binding| MAP/ERK kinase kinase activity| protein heterodimerization activity
57 COSMIC_overlapping_mutations Protein changes of overlapping alterations. Number of samples in COSMIC with said mutation is in parentheses. p.L858R(1489)| p.L858Q(1)| p.L858K(1)
58 COSMIC_fusion_genes Gene symbols of fusion events involving gene in COSMIC. Number of samples in COSMIC with said mutation is in parentheses. PCM1/JAK2(30)| PAX5/JAK2(18)| ETV6/JAK2(11)
59 COSMIC_tissue_types_affected Tissue type summary of tumor samples involving gene in COSMIC. Number of samples in COSMIC is in parentheses. lung(13575)| oesophagus(21)| ovary(38)
60 COSMIC_total_alterations_in_gene Total numbers of records for gene in COSMIC 14110
61 Tumorscape_Amplification_Peaks Overlapping significant GISTIC amplification focal peaks from Tumorscape. (Number of genes in peak and q-value of peaks is in parentheses). Only peak regions with a q-value <= 0.20 are reported. all_cancers(1;1.57e-46)| all_epithelial(1;5.62e-37)| Lung NSC(1;9.29e-25)
62 Tumorscape_Deletion_Peaks Overlapping significant GISTIC deletion focal peaks from Tumorscape. (Number of genes in peak and q-value of peaks is in parentheses). Only peak regions with a q-value <= 0.20 are reported. Lung NSC(174;0.0841)| all_lung(145;0.106)| all_neural(114;0.107)
63 TCGAscape_Amplification_Peaks Overlapping significant GISTIC amplification focal peaks from TCGAscape (Number of genes in peak and q-value of peaks is in parentheses). Only peak regions with a q-value <= 0.20 are reported. GBM - Glioblastoma multiforme(1;0)| all cancers(1;2.19e-314)| LUSC - Lung squamous cell carcinoma(13;0.000168)
64 TCGAscape_Deletion_Peaks Overlapping significant GISTIC deletion focal peaks from TCGAscape. (Number of genes in peak and q-value of peaks is in parentheses). Only peak regions with a q-value <= 0.20 are reported. all cancers(201;9.73e-05)| GBM - Glioblastoma multiforme(135;0.0845)
65 DrugBank Listing of compounds from DrugBank known to interact with genes (DrugBank compound ID in parentheses). Cetuximab(DB00002)| Erlotinib(DB00530)| Gefitinib(DB00317)| Lapatinib(DB01259)| Panitumumab(DB01269)
66 ref_context Genomic sequence at variant locus with additional 10 bp of flanking sequence on either side. GATTTTGGGCTGGCCAAACTG
67 gc_content Fraction of G or C bases in flanking 100 bp of variant locus 0.537
68 CCLE_ONCOMAP_overlapping_mutations Protein change of overlapping mutations in CCLE Oncomap dataset. Cell line name and lineage are provided in parentheses. L858R(NCIH1975_LUNG)
69 CCLE_ONCOMAP_total_mutations_in_gene Total number of mutations in CCLE Oncomap data for this gene. 31
70 CGC_Mutation_Type Type of mutations reported for this gene in the Cancer Gene Census. A, O, Mis
71 CGC_Translocation_Partner Known translocation partner gene as reported in Cancer Gene Census ALK
72 CGC_Tumor_Types_Somatic Tumor types with somatic alterations in this gene as reported in the Cancer Gene Census. glioma, NSCLC
73 CGC_Tumor_Types_Germline Tumor types with germline alterations in this gene as reported in the Cancer Gene Census. NSCLC
74 CGC_Other_Diseases Other diseases/syndromes with alterations in this gene as reported in Cancer Gene Census. Noonan Syndrome
75 DNARepairGenes_Role Known DNA repair roles for this gene as reported in Wood et al. NER| Involved_in_tolerance_or_repair_of_DNA_crosslinks
76 FamilialCancerDatabase_Syndromes Familial cancer syndromes with alteration in this gene as reported in the Familial Cancer Database. Wiskott-Aldrich_syndrome
77 MUTSIG_Published_Results Published MutSig analyses with gene in significant results. Gene rank and q-value are provided in parentheses. TCGA GBM(3;<1E-08)| TSP Lung(4;<1E-08)
78 OREGANNO_ID ID for ORegAnno regulatory regions, transcription factor binding sites, and regulatory polymorphisms as reported in the UCSC Genome Browser. OREG0003809
79 OREGANNO_Values Key-value pairs from ORegAnno UCSC track describing regulatory element. More information can be found here. type=REGULATORY REGION| Gene=GCC1| Dataset=Stanford ENCODE Dataset| EvidenceSubtype=Transient transfection luciferase assay
80 1000Genome_AA 1000 Genomes annotation, Ancestral Allele C
81 1000Genome_AC 1000 Genomes annotation, Alternate Allele Count 2
82 1000Genome_AF 1000 Genomes annotation, Global Allele Frequency based on AC/AN 0.0009
83 1000Genome_AFR_AF 1000 Genomes annotation, Allele Frequency for samples from AFR based on AC/AN 0.02
84 1000Genome_AMR_AF 1000 Genomes annotation, Allele Frequency for samples from AMR based on AC/AN 0.16
85 1000Genome_AN 1000 Genomes annotation, Total Allele Count 2184
86 1000Genome_ASN_AF 1000 Genomes annotation, Allele Frequency for samples from ASN based on AC/AN 0.0035
87 1000Genome_AVGPOST 1000 Genomes annotation, Average posterior probability from MaCH/Thunder 1
88 1000Genome_CIEND 1000 Genomes annotation, Confidence interval around END for imprecise variants -17,33
89 1000Genome_CIPOS 1000 Genomes annotation, Confidence interval around POS for imprecise variants -18,19
90 1000Genome_END 1000 Genomes annotation, End position of the variant described in this record 5645443
91 1000Genome_ERATE 1000 Genomes annotation, Per-marker Mutation rate from MaCH/Thunder 0.0003
92 1000Genome_EUR_AF 1000 Genomes annotation, Allele Frequency for samples from EUR based on AC/AN 0.23
93 1000Genome_HOMLEN 1000 Genomes annotation, Length of base pair identical micro-homology at event breakpoints 30
94 1000Genome_HOMSEQ 1000 Genomes annotation, Sequence of base pair identical micro-homology at event breakpoints GAGAATCACTTGAACCCG
95 1000Genome_LDAF 1000 Genomes annotation, MLE Allele Frequency Accounting for LD 0.0009
96 1000Genome_RSQ 1000 Genomes annotation, Genotype imputation quality from MaCH/Thunder 1
97 1000Genome_SNPSOURCE 1000 Genomes annotation, indicates if a snp was called when analysing the low coverage or exome alignment data LOWCOV,EXOME
98 1000Genome_SVLEN 1000 Genomes annotation, Difference in length between REF and ALT alleles -16365
99 1000Genome_SVTYPE 1000 Genomes annotation, Type of structural variant DEL
100 1000Genome_THETA 1000 Genomes annotation, Per-marker Transition rate from MaCH/Thunder 0.0007
101 1000Genome_VT 1000 Genomes annotation, indicates what type of variant the line represents SNP
102 ACHILLES_Lineage_Results_Top_Genes Lineages in ACHILLES dataset with gene in top 200 scoring genes. Rank score of gene followed by individual hairpin ranks for given gene are provided in parentheses. Colon(6;3 56 14213 18255)
103 CGC_Cancer Germline Mut Cancer Gene Census annotation, "yes" if variant is in a gene that is mutated in the germline predisposing to cancer. yes
104 CGC_Cancer Molecular Genetics Cancer Gene Census annotation, Indicates whether variants in mutated gene are dominant or recessive. Dom
105 CGC_Cancer Somatic Mut Cancer Gene Census annotation, "yes" if variant is in a gene that is somatically mutated in cancer. yes
106 CGC_Cancer Syndrome Cancer related syndromes with alterations in this gene as reported in Cancer Gene Census. Familial lung cancer
107 CGC_Chr Cancer Gene Census annotation, Chromosome. 7
108 CGC_Chr Band Cancer Gene Census annotation, Chromosome band. 7p12.3-p12.1
109 CGC_GeneID Cancer Gene Census annotation, Entrez gene ID. 1956
110 CGC_Name Cancer Gene Census annotation, Full gene name. epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)
111 CGC_Other Germline Mut Cancer Gene Census annotation, "yes" if variant is in a gene that is germline mutated in other diseases/syndromes. yes
112 CGC_Tissue Type Cancer Gene Census annotation, Tissue types with mutations in this gene. E, O
113 COSMIC_n_overlapping_mutations Total number of COSMIC mutations at variant site. 1491
114 COSMIC_overlapping_mutation_descriptions COSMIC mutation descriptions at variant site. Number of samples in COSMIC is in parentheses. Substitution - Missense(1491)
115 COSMIC_overlapping_primary_sites Primary site summary of tumor samples with COSMIC mutations at variant site. Number of samples in COSMIC is in parentheses. lung(1475)| upper_aerodigestive_tract(5)| thyroid(4)| large_intestine(2)| peritoneum(1)| stomach(1)| thymus(1)| breast(1)| ovary(1)
116 ClinVar_ASSEMBLY ClinVar annotation, Assembly GRCh37
117 ClinVar_HGMD_ID ClinVar annotation, HGMD ID CM971523
118 ClinVar_SYM ClinVar annotation, Gene symbol TSC1
119 ClinVar_TYPE ClinVar annotation, Type M
120 ClinVar_rs ClinVar annotation, dbSNP ID rs118203682
121 ESP_AA ESP annotation, chimpAllele A
122 ESP_AAC ESP annotation, aminoAcidChange GLN/PRO
123 ESP_AA_AC ESP annotation, African American Allele Count in the order of AltAlleles,RefAllele. For INDELs, A1, A2, or An refers to the N-th alternate allele while R refers to the reference allele. 24,404
124 ESP_AA_AGE ESP annotation, Estimated Variant Age in kilo years for the African American Population 4.2+/-10.8
125 ESP_AA_GTC ESP annotation, African American Genotype Counts in the order of listed GTS 0,2,2201
126 ESP_AvgAAsampleReadDepth ESP annotation, Mean read depth at variant position in African American ESP cohort. 105
127 ESP_AvgEAsampleReadDepth ESP annotation, Mean read depth at variant position in European American ESP cohort. 98
128 ESP_AvgSampleReadDepth ESP annotation, Mean read depth at variant position in all ESP samples. 101
129 ESP_CA ESP annotation, clinicalAssociation http://www.ncbi.nlm.nih.gov/sites/varvu?gene=8890&%3Brs=113994031| http://omim.org/entry/603896
130 ESP_CDP ESP annotation, cDNAPosition 6693
131 ESP_CG ESP annotation, consScoreGERP -2.6
132 ESP_CP ESP annotation, scorePhastCons 0.7
133 ESP_Chromosome ESP annotation, Chromosome 7
134 ESP_DBSNP ESP annotation, dbSNP version which established the rs_id dbSNP_134
135 ESP_DP ESP annotation, Average Sample Read Depth 351
136 ESP_EA_AC ESP annotation, European American Allele Count in the order of AltAlleles,RefAllele. For INDELs, A1, A2, or An refers to the N-th alternate allele while R refers to the reference allele. 18,599
137 ESP_EA_AGE ESP annotation, Estimated Variant Age in kilo years for the European American Population 1.2+/-3.3
138 ESP_EA_GTC ESP annotation, European American Genotype Counts in the order of listed GTS 0,1,4299
139 ESP_EXOME_CHIP ESP annotation, Whether a SNP is on the Illumina HumanExome Chip yes
140 ESP_FG ESP annotation, functionGVS coding-synonymous
141 ESP_GL ESP annotation, geneList MUC17
142 ESP_GM ESP annotation, accession NM_001040105.1
143 ESP_GS ESP annotation, granthamScore 76
144 ESP_GTC ESP annotation, Total Genotype Counts in the order of listed GTS 0,2,6501
145 ESP_GTS ESP annotation, Observed Genotypes. For INDELs, A1, A2, or An refers to the N-th alternate allele while R refers to the reference allele. AA,AC,CC
146 ESP_GWAS_PUBMED ESP annotation, PubMed records for GWAS hits http://www.ncbi.nlm.nih.gov/pubmed?term=23104006
147 ESP_MAF ESP annotation, Minor Allele Frequency in percent in the order of EA,AA,All 0.0233,0.0,0.0154
148 ESP_PH ESP annotation, polyPhen benign
149 ESP_PP ESP annotation, proteinPosition 1124/1178
150 ESP_Position ESP annotation, Genomic position 55259515
151 ESP_TAC ESP annotation, Total Allele Count in the order of AltAlleles,RefAllele. For INDELs, A1, A2, or An refers to the N-th alternate allele while R refers to the reference allele. 213,004
152 ESP_TotalAAsamplesCovered ESP annotation, Total African American samples with read coverage at variant site. 2203
153 ESP_TotalEAsamplesCovered ESP annotation, Total European American samples with read coverage at variant site. 4300
154 ESP_TotalSamplesCovered ESP annotation, Total ESP samples with read coverage at variant site. 6503
155 Ensembl_so_accession Ensembl Sequence ontology accession SO:0001583
156 Ensembl_so_term Ensembl Sequence ontology term missense
157 Familial_Cancer_Genes_Reference Familial cancer database reference used. Familial Cancer Database
158 Familial_Cancer_Genes_Synonym Familial cancer syndrome synonyms with alteration in this gene as reported in the Familial Cancer Database. Hereditary Lung cancer, Hereditary Non-Small Cell Lung cancer
159 HGNC_Ensembl Gene ID HGNC annotation, This column contains a manually curated Ensembl Gene ID. See the HGNC help page site for more information. ENSG00000146648
160 HGNC_HGNC ID HGNC annotation, A unique ID provided by the HGNC. See the HGNC help page site for more information. 3236
161 HGNC_RefSeq IDs HGNC annotation, The Reference Sequence (RefSeq) identifier for that entry, provided by the NCBI. See the HGNC help page site for more information. NM_005228
162 HGNC_Status HGNC annotation, Indicates whether the gene is classified as "Approved", "Entry withdrawn", or "Symbol withdrawn". See the HGNC help page site for more information. Approved
163 HGNC_UCSC ID(supplied by UCSC) HGNC annotation, The UCSC ID is derived from the current build of the UCSC database. See the HGNC help page site for more information. uc003tqk.3
164 HGVS_coding_DNA_change HGVS compliant string describing coding position and alleles. ENST00000275493.2:c.2573T>G
165 HGVS_genomic_change HGVS compliant string describing '+' strand genomic coordinates and alleles. 7.37:g.55259515T>G
166 HGVS_protein_change HGVS compliant string describing protein position and alleles involved. ENSP00000275493:p.Leu858Arg
167 ORegAnno_bin UCSC Genome Browser bin for ORegAnno entry. 1555
168 UniProt_alt_uniprot_accessions Alternative UniProt accession IDs O00688| O00732| P06268| Q14225| Q68GS5
169 build User-supplied build value 37
170 ccds_id Consensus CDS project ID CCDS5514.1
171 dbNSFP_1000Gp1_AC dbNSFP annotation, Alternative allele counts in the whole 1000 genomes phase 1 (1000Gp1) data. 8
172 dbNSFP_1000Gp1_AF dbNSFP annotation, Alternative allele frequency in the whole 1000Gp1 data. 0.003663004
173 dbNSFP_1000Gp1_AFR_AC dbNSFP annotation, Alternative allele counts in the 1000Gp1 African descendent samples. 0
174 dbNSFP_1000Gp1_AFR_AF dbNSFP annotation, Alternative allele frequency in the 1000Gp1 African descendent samples. 0
175 dbNSFP_1000Gp1_AMR_AC dbNSFP annotation, Alternative allele counts in the 1000Gp1 American descendent samples. 0
176 dbNSFP_1000Gp1_AMR_AF dbNSFP annotation, Alternative allele frequency in the 1000Gp1 American descendent samples. 0
177 dbNSFP_1000Gp1_ASN_AC dbNSFP annotation, Alternative allele counts in the 1000Gp1 Asian descendent samples. 8
178 dbNSFP_1000Gp1_ASN_AF dbNSFP annotation, Alternative allele frequency in the 1000Gp1 Asian descendent samples. 0.013986014
179 dbNSFP_1000Gp1_EUR_AC dbNSFP annotation, Alternative allele counts in the 1000Gp1 European descendent samples. 0
180 dbNSFP_1000Gp1_EUR_AF dbNSFP annotation, Alternative allele frequency in the 1000Gp1 European descendent samples. 0
181 dbNSFP_Ancestral_allele dbNSFP annotation, Ancestral allele (based on 1000 genomes reference data). The following comes from its original README file T
182 dbNSFP_CADD_phred dbNSFP annotation, CADD phred-like score. This is phred-like rank score based on whole genome CADD raw scores. Please refer to Kircher et al. (2014) Nature Genetics 46(3) 25.1
183 dbNSFP_CADD_raw dbNSFP annotation, CADD raw score for functional prediction of a SNP. Please refer to Kircher et al. (2014) Nature Genetics 46(3) 4.601026
184 dbNSFP_CADD_raw_rankscore dbNSFP annotation, CADD raw scores were ranked among all CADD raw scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of CADD raw scores in dbNSFP. Please note the following copyright statement for CADD 0.87055
185 dbNSFP_ESP6500_AA_AF dbNSFP annotation, Alternative allele frequency in the African American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set). 2.27E-04
186 dbNSFP_ESP6500_EA_AF dbNSFP annotation, Alternative allele frequency in the European American samples of the NHLBI GO Exome Sequencing Project (ESP6500 data set). 1.16E-04
187 dbNSFP_Ensembl_geneid dbNSFP annotation, Ensembl gene id ENSG00000146648
188 dbNSFP_Ensembl_transcriptid dbNSFP annotation, Ensembl transcript ids (separated by ";") ENST00000455089; ENST00000395504; ENST00000275493; ENST00000454757
189 dbNSFP_FATHMM_pred dbNSFP annotation, If a FATHMMori score is <=-1.5 (or rankscore <=0.81415) the corresponding NS is predicted as "D(AMAGING)"; otherwise it is predicted as "T(OLERATED)". Multiple predictions separated by ";" D;D;D
190 dbNSFP_FATHMM_rankscore dbNSFP annotation, FATHMMori scores were ranked among all FATHMMori scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of FATHMMori scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0 to 1. 0.91351
191 dbNSFP_FATHMM_score dbNSFP annotation, FATHMM default score (weighted for human inherited-disease mutations with Disease Ontology) (FATHMMori). Scores range from -18.09 to 11.0. Multiple scores separated by ";". Please refer to Shihab et al. (2013) Human Mutation 34(1) -2.83;-2.83;-2.83
192 dbNSFP_GERP++_NR dbNSFP annotation, GERP++ neutral rate 5.71
193 dbNSFP_GERP++_RS dbNSFP annotation, GERP++ RS score, the larger the score, the more conserved the site. 5.71
194 dbNSFP_GERP++_RS_rankscore dbNSFP annotation, GERP++ RS scores were ranked among all GERP++ RS scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of GERP++ RS scores in dbNSFP. 0.89125
195 dbNSFP_Interpro_domain dbNSFP annotation, domain or conserved site on which the variant locates. Domain annotations come from Interpro database. The number in the brackets following a specific domain is the count of times Interpro assigns the variant position to that domain, typically coming from different predicting databases. Multiple entries separated by ";". Serine-threonine/tyrosine-protein kinase (1);Protein kinase-like domain (1);Tyrosine-protein kinase, catalytic domain (1);Protein kinase, catalytic domain (1);
196 dbNSFP_LRT_Omega dbNSFP annotation, estimated nonsynonymous-to-synonymous-rate ratio (Omega, reported by LRT) 0.137592
197 dbNSFP_LRT_converted_rankscore dbNSFP annotation, LRTori scores were first converted as LRTnew=1-LRTori*0.5 if Omega<1, or LRTnew=LRTori*0.5 if Omega>=1. Then LRTnew scores were ranked among all LRTnew scores in dbNSFP. The rankscore is the ratio of the rank over the total number of the scores in dbNSFP. The scores range from 0.00166 to 0.85682. 0.50627
198 dbNSFP_LRT_pred dbNSFP annotation, LRT prediction, D(eleterious), N(eutral) or U(nknown), which is not solely determined by the score. D
199 dbNSFP_LRT_score dbNSFP annotation, The original LRT two-sided p-value (LRTori), ranges from 0 to 1. 0.000117
200 dbNSFP_LR_pred dbNSFP annotation, Prediction of our LR based ensemble prediction score,"T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0.5. The rankscore cutoff between "D" and "T" is 0.82268. D
201 dbNSFP_LR_rankscore dbNSFP annotation, LR scores were ranked among all LR scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of LR scores in dbNSFP. The scores range from 0 to 1. 0.9634
202 dbNSFP_LR_score dbNSFP annotation, Our logistic regression (LR) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from 0 to 1. 0.8806
203 dbNSFP_MutationAssessor_pred dbNSFP annotation, MutationAssessor's functional impact of a variant M
204 dbNSFP_MutationAssessor_rankscore dbNSFP annotation, MAori scores were ranked among all MAori scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MAori scores in dbNSFP. The scores range from 0 to 1. 0.92555
205 dbNSFP_MutationAssessor_score dbNSFP annotation, MutationAssessor functional impact combined score (MAori). The score ranges from -5.545 to 5.975 in dbNSFP. Please refer to Reva et al. (2011) Nucl. Acids Res. 39(17) 3.32
206 dbNSFP_MutationTaster_converted_rankscore dbNSFP annotation, The MTori scores were first converted 0.58432
207 dbNSFP_MutationTaster_pred dbNSFP annotation, MutationTaster prediction, "A" ("disease_causing_automatic"), "D" ("disease_causing"), "N" ("polymorphism") or "P" ("polymorphism_automatic"). The score cutoff between "D" and "N" is 0.5 for MTori and 0.328 for the rankscore. D
208 dbNSFP_MutationTaster_score dbNSFP annotation, MutationTaster p-value (MTori), ranges from 0 to 1. 0.999999
209 dbNSFP_Polyphen2_HDIV_pred dbNSFP annotation, Polyphen2 prediction based on HumDiv, "D" ("probably damaging", HDIV score in [0.957,1] or rankscore in [0.52996,0.89917]), "P" ("possibly damaging", HDIV score in [0.453,0.956] or rankscore in [0.34412,0.52842]) and "B" ("benign", HDIV score in [0,0.452] or rankscore in [0.02656,0.34399]). Score cutoff for binary classification is 0.5 for HDIV score or 0.35411 for rankscore, i.e. the prediction is "neutral" if the HDIV score is smaller than 0.5 (rankscore is smaller than 0.35411), and "deleterious" if the HDIV score is larger than 0.5 (rankscore is larger than 0.35411). Multiple entries are separated by ";". D;D
210 dbNSFP_Polyphen2_HDIV_rankscore dbNSFP annotation, Polyphen2 HDIV scores were first ranked among all HDIV scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.02656 to 0.89917. 0.89917
211 dbNSFP_Polyphen2_HDIV_score dbNSFP annotation, Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1. Multiple entries separated by ";". 0.999;1.0
212 dbNSFP_Polyphen2_HVAR_pred dbNSFP annotation, Polyphen2 prediction based on HumVar, "D" ("probably damaging", HVAR score in [0.909,1] or rankscore in [0.62955,0.9711]), "P" ("possibly damaging", HVAR in [0.447,0.908] or rankscore in [0.44359,0.62885]) and "B" ("benign", HVAR score in [0,0.446] or rankscore in [0.01281,0.44315]). Score cutoff for binary classification is 0.5 for HVAR score or 0.45998 for rankscore, i.e. the prediction is "neutral" if the HVAR score is smaller than 0.5 (rankscore is smaller than 0.45998), and "deleterious" if the HVAR score is larger than 0.5 (rankscore is larger than 0.45998). Multiple entries are separated by ";". D;D
213 dbNSFP_Polyphen2_HVAR_rankscore dbNSFP annotation, Polyphen2 HVAR scores were first ranked among all HVAR scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.01281 to 0.9711. 0.91635
214 dbNSFP_Polyphen2_HVAR_score dbNSFP annotation, Polyphen2 score based on HumVar, i.e. hvar_prob. The score ranges from 0 to 1. Multiple entries separated by ";". 0.956;0.999
215 dbNSFP_RadialSVM_pred dbNSFP annotation, Prediction of our SVM based ensemble prediction score,"T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0. The rankscore cutoff between "D" and "T" is 0.83357. D
216 dbNSFP_RadialSVM_rankscore dbNSFP annotation, RadialSVM scores were ranked among all RadialSVM scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of RadialSVM scores in dbNSFP. The scores range from 0 to 1. 0.97213
217 dbNSFP_RadialSVM_score dbNSFP annotation, Our support vector machine (SVM) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from -2 to 3 in dbNSFP. 0.9872
218 dbNSFP_Reliability_index dbNSFP annotation, Number of observed component scores (except the maximum frequency in the 1000 genomes populations) for RadialSVM and LR. Ranges from 1 to 10. As RadialSVM and LR scores are calculated based on imputed data, the fewer missing component scores, the higher the reliability of the scores and predictions. 10
219 dbNSFP_SIFT_converted_rankscore dbNSFP annotation, SIFTori scores were first converted to SIFTnew=1-SIFTori, then ranked among all SIFTnew scores in dbNSFP. The rankscore is the ratio of the rank of the SIFTnew score over the total number of SIFTnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The rankscores range from 0.02654 to 0.87932. 0.87932
220 dbNSFP_SIFT_pred dbNSFP annotation, If SIFTori is smaller than 0.05 (rankscore>0.55) the corresponding NS is predicted as "D(amaging)"; otherwise it is predicted as "T(olerated)". Multiple predictions separated by ";" D
221 dbNSFP_SIFT_score dbNSFP annotation, SIFT score (SIFTori). Scores range from 0 to 1. The smaller the score the more likely the SNP has damaging effect. Multiple scores separated by ";". 0
222 dbNSFP_SLR_test_statistic dbNSFP annotation, SLR test statistic for testing natural selection on codons. A negative value indicates negative selection, and a positive value indicates positive selection. Larger magnitude of the value suggests stronger evidence. -56.2754
223 dbNSFP_SiPhy_29way_logOdds dbNSFP annotation, SiPhy score based on 29 mammals genomes. The larger the score, the more conserved the site. 14.8112
224 dbNSFP_SiPhy_29way_logOdds_rankscore dbNSFP annotation, SiPhy_29way_logOdds scores were ranked among all SiPhy_29way_logOdds scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of SiPhy_29way_logOdds scores in dbNSFP. 0.69996
225 dbNSFP_SiPhy_29way_pi dbNSFP annotation, The estimated stationary distribution of A, C, G and T at the site, using SiPhy algorithm based on 29 mammals genomes. 0.0:0.0:0.0:1.0
226 dbNSFP_UniSNP_ids dbNSFP annotation, rs numbers from UniSNP, which is a cleaned version of dbSNP build 129, in format" .
227 dbNSFP_Uniprot_aapos dbNSFP annotation, amino acid position according to Uniprot. Multiple entries separated by ";". 813;858
228 dbNSFP_Uniprot_acc dbNSFP annotation, Uniprot accession number. Multiple entries separated by ";". Q504U8;P00533
229 dbNSFP_Uniprot_id dbNSFP annotation, Uniprot ID number. Multiple entries separated by ";". EGFR_HUMAN
230 dbNSFP_aaalt dbNSFP annotation, alternative amino acid. "." if the variant is a splicing site SNP (2bp on each end of an intron) R
231 dbNSFP_aapos dbNSFP annotation, amino acid position in the protein. "-1" if the variant is a splicing site SNP (2bp on each end of an intron) 813;728;858;805
232 dbNSFP_aapos_FATHMM dbNSFP annotation, ENSP id and amino acid positions corresponding to FATHMM scores. Multiple entries separated by ";" ENSP00000415559:L813R; ENSP00000275493:L858R; ENSP00000395243:L805R
233 dbNSFP_aapos_SIFT dbNSFP annotation, ENSP id and amino acid positions corresponding to SIFT scores. Multiple entries separated by ";" ENSP00000275493:L858R
234 dbNSFP_aaref dbNSFP annotation, reference amino acid. "." if the variant is a splicing site SNP (2bp on each end of an intron) L
235 dbNSFP_cds_strand dbNSFP annotation, coding sequence (CDS) strand (+ or -) +
236 dbNSFP_codonpos dbNSFP annotation, position on the codon (1, 2 or 3) 2
237 dbNSFP_fold-degenerate dbNSFP annotation, degenerate type (0, 2 or 3) 0
238 dbNSFP_genename dbNSFP annotation, gene name; if the NS can be assigned to multiple genes, gene names are separated by ";" EGFR
239 dbNSFP_hg18_pos(1-coor) dbNSFP annotation, physical position on the chromosome in hg18 (1-based coordinate) 55227009
240 dbNSFP_phastCons100way_vertebrate dbNSFP annotation, phastCons conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site. 1
241 dbNSFP_phastCons100way_vertebrate_rankscore dbNSFP annotation, phastCons100way_vertebrate scores were ranked among all phastCons100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons100way_vertebrate scores in dbNSFP. 0.71417
242 dbNSFP_phastCons46way_placental dbNSFP annotation, phastCons conservation score based on the multiple alignments of 33 placental mammal genomes (including human). The larger the score, the more conserved the site. 1
243 dbNSFP_phastCons46way_placental_rankscore dbNSFP annotation, phastCons46way_placental scores were ranked among all phastCons46way_placental scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons46way_placental scores in dbNSFP. 0.80357
244 dbNSFP_phastCons46way_primate dbNSFP annotation, phastCons conservation score based on the multiple alignments of 10 primate genomes (including human). The larger the score, the more conserved the site. 0.962
245 dbNSFP_phastCons46way_primate_rankscore dbNSFP annotation, phastCons46way_primate scores were ranked among all phastCons46way_primate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons46way_primate scores in dbNSFP. 0.63368
246 dbNSFP_phyloP100way_vertebrate dbNSFP annotation, phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site. 7.89
247 dbNSFP_phyloP100way_vertebrate_rankscore dbNSFP annotation, phyloP100way_vertebrate scores were ranked among all phyloP100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP100way_vertebrate scores in dbNSFP. 0.87313
248 dbNSFP_phyloP46way_placental dbNSFP annotation, phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 33 placental mammal genomes (including human). The larger the score, the more conserved the site. 2.176
249 dbNSFP_phyloP46way_placental_rankscore dbNSFP annotation, phyloP46way_placental scores were ranked among all phyloP46way_placental scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP46way_placental scores in dbNSFP. 0.68965
250 dbNSFP_phyloP46way_primate dbNSFP annotation, phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 10 primate genomes (including human). The larger the score, the more conserved the site. 0.528
251 dbNSFP_phyloP46way_primate_rankscore dbNSFP annotation, phyloP46way_primate scores were ranked among all phyloP46way_primate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP46way_primate scores in dbNSFP. 0.53228
252 dbNSFP_refcodon dbNSFP annotation, reference codon CTG
253 gencode_transcript_name Gencode transcript name EGFR-001
254 gencode_transcript_status Gencode transcript status KNOWN
255 gencode_transcript_tags Gencode transcript tags basic| appris_principal| CCDS
256 gencode_transcript_type Gencode transcript type protein_coding
257 gene_id Internal gene ID 0
258 gene_type Type of gene used for variant annotation protein_coding
259 havana_transcript HAVANA transcript ID OTTHUMT00000251456.2
260 secondary_variant_classification Oncotator secondary variant classification Intron
261 strand Strand orientation of variant genomic coordinates +
262 transcript_id Transcript ID used for variant annotation ENST00000275493.2
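
Several of the dbNSFP fields listed above hold multiple entries separated by ";". The following is a minimal Python sketch of how such values might be split after reading an annotated file. The function names are hypothetical and not part of Oncotator, and treating "." as "no value" is an assumption based on the example values shown in the table.

```python
def split_multi(value, sep=";"):
    """Split a multi-valued field such as '813;858' into a list of strings.

    Assumption: a bare '.' means no value is available, so it is dropped.
    """
    items = [item.strip() for item in value.split(sep)]
    return [item for item in items if item and item != "."]


def parse_aapos_entries(value):
    """Parse entries like 'ENSP00000275493:L858R' into (protein_id, change) tuples."""
    pairs = []
    for entry in split_multi(value):
        protein_id, _, change = entry.partition(":")
        pairs.append((protein_id, change))
    return pairs


def sift_prediction(score):
    """Apply the rule described for dbNSFP_SIFT_pred: below 0.05 is 'D', otherwise 'T'."""
    return "D" if score < 0.05 else "T"
```

For example, split_multi applied to the dbNSFP_Uniprot_acc value "Q504U8;P00533" yields the two accessions as separate strings.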

Web service API


A REST-like interface is available for obtaining detailed annotations in JSON format for genes, transcripts, and mutations.


Example API Queries


specific gene annotations
http://www.broadinstitute.org/oncotator/gene/EGFR/

gene annotations across a given genomic range (hg19 coordinates)

Provide "chr", "start", and "end" parameters delimited by an underscore character ("_").

http://www.broadinstitute.org/oncotator/genes/chr4_50164411_60164411/

specific transcript annotations
http://www.broadinstitute.org/oncotator/transcript/ENST00000257290.5/

transcript annotations across a given genomic range (hg19 coordinates)

Provide "chr", "start", and "end" parameters delimited by an underscore character ("_").

http://www.broadinstitute.org/oncotator/transcripts/chr4_50164411_60164411/
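
The two range queries above differ only in the path segment ("genes" or "transcripts"). A minimal Python sketch of building such URLs follows. The helper name range_url and its checks are illustrative assumptions, and the base URL is simply the one used in the examples above.

```python
# Base URL taken from the example queries in this section.
BASE_URL = "http://www.broadinstitute.org/oncotator"


def range_url(kind, chrom, start, end):
    """Build a genomic-range query URL (hg19 coordinates).

    kind is 'genes' or 'transcripts'; chr, start, and end are joined
    with underscores, as in the example queries.
    """
    if kind not in ("genes", "transcripts"):
        raise ValueError("kind must be 'genes' or 'transcripts'")
    if start > end:
        raise ValueError("start must not be greater than end")
    return f"{BASE_URL}/{kind}/{chrom}_{start}_{end}/"
```

Calling range_url("genes", "chr4", 50164411, 60164411) reproduces the gene range example shown earlier.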

specific mutation annotations

Provide "chr", "start", "end", "reference_allele", and "observed_allele" parameters delimited by an underscore character ("_").

http://www.broadinstitute.org/oncotator/mutation/7_55259515_55259515_T_G/
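
As a rough illustration, a mutation query like the one above could be assembled and fetched from Python using only the standard library. The function names are hypothetical, the service is assumed to be reachable at the base URL used in the examples, and the structure of the returned JSON is not described here, so the sketch only decodes it.

```python
import json
from urllib.request import urlopen

# Base URL taken from the example queries in this section.
BASE_URL = "http://www.broadinstitute.org/oncotator"


def mutation_url(chrom, start, end, reference_allele, observed_allele):
    """Join the five mutation parameters with underscores, as in the example."""
    fields = (chrom, start, end, reference_allele, observed_allele)
    return f"{BASE_URL}/mutation/" + "_".join(str(f) for f in fields) + "/"


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the response body as JSON (assumes a JSON reply)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For instance, mutation_url("7", 55259515, 55259515, "T", "G") gives the example URL above, which can then be passed to fetch_json.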