Oncotator is a tool to annotate point mutations and indels with functional data relevant to cancer researchers. Annotations include gene names, functional consequence (e.g. Missense), PolyPhen-2 predictions, and cancer-specific annotations from resources such as COSMIC, Tumorscape, and published MutSig results.
To determine mutation consequence, Oncotator uses a set of 73,671 reference transcripts drawn from the UCSC Genome Browser’s UCSC Genes track and microRNAs from miRBase release 153, as provided in the TCGA General Annotation Files (GAF) library. More information about the GAF can be found here.
- Oncotator v0.4.1.8
- GAF 2.1 Jun2011 reference transcripts
- dbSNP build 134
- UniProt Release 2011_9
- Cosmic v55
- Tumorscape Release 1.6
- TCGA Copy Number Portal Release 1.0
- MutSig published results 20110905
- CCLE Oncomap data
- Cancer Gene Census 20110322
- Familial Cancer Database 20110905
- Human DNA Repair Genes (Wood et al.)
- PolyPhen-2 WHPSS
Input is a tab-delimited file with five required fields. A header line is optional, but if supplied, it must contain the five headers exactly as listed below.
The required fields are:
- 'chr' - Chromosome; the ‘chr’ prefix is optional (e.g. ‘chr10’ and ‘10’ are both valid).
- 'start' - 1-based start coordinate of reference allele.
- 'end' - 1-based end coordinate of reference allele.
- 'reference_allele' - Positive strand reference allele at positions given above.
- 'observed_allele' - Observed allele.
chr4 150 150 A T
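As a rough illustration of the input format described above, the following Python sketch writes records as tab-delimited lines with the optional header. The function name `format_input` and the coordinate check are assumptions made for this example; they are not part of Oncotator.

```python
import csv
import io

# Header names in the order listed above.
HEADERS = ["chr", "start", "end", "reference_allele", "observed_allele"]


def format_input(records, include_header=True):
    """Return tab-delimited text for an iterable of 5-field records.

    Hypothetical helper for illustration only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    if include_header:
        writer.writerow(HEADERS)
    for chrom, start, end, ref, obs in records:
        start, end = int(start), int(end)
        # Coordinates are 1-based, so 0 or a reversed range is rejected.
        if start < 1 or end < start:
            raise ValueError(f"bad 1-based coordinates: {start}-{end}")
        writer.writerow([chrom, start, end, ref, obs])
    return buf.getvalue()


# The single-base substitution shown above.
print(format_input([("chr4", 150, 150, "A", "T")]), end="")
```

Passing `include_header=False` produces only the data lines, which the format also allows.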
To denote an insertion, use ‘-’ in the reference_allele field; the start and end coordinates must indicate the two adjacent bases between which the insertion occurs.
chr4 150 151 - T
Use ‘-’ in the observed_allele field to denote deletion of the given reference allele.
chr4 150 150 A -
Indels can also be represented with VCF-style formatting. For example, an insertion and a deletion can be represented as follows:
chr4 150 150 A AT
chr4 150 151 AG A
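The conventions above can be checked before submission with a small sketch like the one below. The function `classify_variant` and its return labels are hypothetical names chosen for this example, and the length check (end - start + 1 equals the reference allele length) is an assumption inferred from the field definitions and example lines rather than documented Oncotator behavior.

```python
def classify_variant(start, end, reference_allele, observed_allele):
    """Label an input record by the allele convention it uses.

    Hypothetical helper; the labels are illustrative only.
    """
    if reference_allele == "-" and observed_allele == "-":
        raise ValueError("reference and observed alleles cannot both be '-'")
    if reference_allele == "-":
        # Dash-style insertion: coordinates flank the insertion point.
        if end != start + 1:
            raise ValueError("insertion coordinates must be two adjacent bases")
        return "insertion"
    # Assumed check: the coordinate span matches the reference allele length.
    if end - start + 1 != len(reference_allele):
        raise ValueError("coordinate span does not match reference allele length")
    if observed_allele == "-":
        return "deletion"
    if len(observed_allele) > len(reference_allele):
        return "vcf_style_insertion"
    if len(observed_allele) < len(reference_allele):
        return "vcf_style_deletion"
    return "substitution"


# The example lines from this section (chromosome omitted).
for fields in [(150, 150, "A", "T"), (150, 151, "-", "T"), (150, 150, "A", "-"),
               (150, 150, "A", "AT"), (150, 151, "AG", "A")]:
    print(fields, classify_variant(*fields))
```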
Oncotator outputs annotated mutations in tab-delimited Mutation Annotation Format (MAF).
- First line begins with '##' and provides Oncotator and resource version information.
- If multiple values exist within a field, a pipe “|” character will be used as a delimiter.
- Column indices 1-32 are dictated by TCGA MAF specification 2.2. Details for column indices 33-80 are provided in the table below.
|Index||MAF Column Header||Description of Values||Example|
|33||Genome_Change||String describing '+' strand genomic coordinates and alleles.||g.chr7:55227009T>G|
|34||Annotation_Transcript||UCSC transcript ID of transcript used for annotation.||uc003tqk.1|
|35||Transcript_Strand||Strand orientation of the above transcript.||+|
|36||Transcript_Exon||Indicates the exon number of the reference transcript that the mutation affects.||9|
|37||Transcript_Position||Describes absolute start and end coordinates (separated by an underscore character) with respect to the reference transcript used in the “Annotation_Transcript” column. Note that these coordinates will differ from the coding region coordinates used in the “cDNA_Change” and “Codon_Change” columns. Only one number will be provided if the start and end coordinates are the same.||2099_2100|
|38||cDNA_Change||Coding position and alleles. Coordinates are coding sequence coordinates.||c.2573T>G|
|39||Codon_Change||String describing transcript coordinates and alleles in context of codon sequences involved.||c.(2572-2574)CTG>CGG|
|40||Protein_Change||Protein position and alleles involved.||p.L858R|
|41||Other_Transcripts||HUGO symbol, UCSC transcript ID, variant classification, and protein change of other transcripts overlapping the mutation. Use the "ALL" transcripts output option to see detailed annotations for each of the transcripts in this field.||EGFR_uc010kzg.1_Missense_Mutation_p.L813R|
|42||Refseq_mRNA_Id||RefSeq transcript ID.||NM_005228|
|43||Refseq_prot_Id||RefSeq protein ID.||NP_005219|
|44||SwissProt_acc_Id||UniProt accession ID.||P00533|
|45||SwissProt_entry_Id||UniProt entry name ID.||EGFR_HUMAN|
|46||Description||If available, description text for transcript.||epidermal growth factor receptor isoform a|
|47||UniProt_AApos||UniProt protein position used to derive position-specific annotations. This can differ from the protein position listed in the 'Protein_Change' field if the UCSC and UniProt protein sequences differ.||858|
|48||UniProt_Region||Overlapping UniProt regions of interest (e.g. functional domain or repeat region).||Cytoplasmic (Potential).|Protein kinase.|
|49||UniProt_Site||Overlapping UniProt single amino acid sites of interest (e.g. cleavage or inhibitory sites for proteases).||ATP (By similarity).|
|50||UniProt_Natural_Variants||Overlapping UniProt natural variants (e.g. disease-associated mutations or RNA editing events).||S -> C (in Beare-Stevenson cutis gyrata syndrome).|
|51||UniProt_Experimental_Info||Overlapping UniProt sites with experimental data (e.g. mutagenesis data leading to protein activity inhibition).||D->A: Loss of kinase activity.|
|52||GO_Biological_Process||Gene Ontology terms describing pathways and processes UniProt protein is involved in.||anoikis|cell cycle arrest|energy reserve metabolic process|
|53||GO_Cellular_Component||Gene Ontology terms describing localization of given UniProt protein.||cytosol|nucleus|
|54||GO_Molecular_Function||Gene Ontology terms describing molecular activity of given UniProt protein.||ATP binding|magnesium ion binding|protein serine/threonine kinase activity|
|55||COSMIC_overlapping_mutations||Protein changes of overlapping alterations. Number of samples in COSMIC with said mutation is in parentheses.||p.V617F(27905)|p.V617_C618>FR(2)|p.V617I(1)|
|56||COSMIC_fusion_genes||Gene symbols of fusion events involving gene in COSMIC. Number of samples in COSMIC with said mutation is in parentheses.||PCM1/JAK2(30)|PAX5/JAK2(18)|ETV6/JAK2(11)|
|57||COSMIC_tissue_types_affected||Tissue type summary of tumor samples involving gene in COSMIC. Number of samples in COSMIC is in parentheses.||haematopoietic_and_lymphoid_tissue(28274)|lung(5)|breast(4)|
|58||COSMIC_total_alterations_in_gene||Total numbers of records for gene in COSMIC||28285|
|59||Tumorscape_Amplification_Peaks||Overlapping significant GISTIC amplification focal peaks from Tumorscape. (Number of genes in peak and q-value of peaks is in parentheses). Only peak regions with a q-value <= 0.20 are reported.||all_cancers(1;1.57e-46)|all_epithelial(1;5.62e-37)|Lung NSC(1;9.29e-25)|
|60||Tumorscape_Deletion_Peaks||Overlapping significant GISTIC deletion focal peaks from Tumorscape. (Number of genes in peak and q-value of peaks is in parentheses). Only peak regions with a q-value <= 0.20 are reported.||Lung NSC(174;0.0841)|all_lung(145;0.106)|all_neural(114;0.107)|
|61||TCGAscape_Amplification_Peaks||Overlapping significant GISTIC amplification focal peaks from TCGAscape. (Number of genes in peak and q-value of peaks is in parentheses). Only peak regions with a q-value <= 0.20 are reported.||GBM - Glioblastoma multiforme(1;0)|all cancers(1;2.19e-314)|LUSC - Lung squamous cell carcinoma(13;0.000168)|
|62||TCGAscape_Deletion_Peaks||Overlapping significant GISTIC deletion focal peaks from TCGAscape. (Number of genes in peak and q-value of peaks is in parentheses). Only peak regions with a q-value <= 0.20 are reported.||all cancers(201;9.73e-05)|GBM - Glioblastoma multiforme(135;0.0845)|
|63||DrugBank||Listing of compounds from DrugBank known to interact with genes (DrugBank compound ID in parentheses).||Sunitinib(DB01268)|
|64||PPH2_Class||PolyPhen-2 probabilistic binary classifier outcome ('deleterious' or 'neutral').||deleterious|
|65||PPH2_Prob||PolyPhen-2 classifier probability of the variation being damaging.||0.926|
|66||PPH2_FDR||PolyPhen-2 classifier model False Discovery Rate at the above probability.||0.171|
|67||PPH2_MSA_dScore||PolyPhen-2 difference of multiple sequence alignment PSIC scores for two amino acid residue variants (Score1-Score2).||1.875|
|68||PPH2_MSA_Score1||PolyPhen-2 multiple sequence alignment PSIC score for wild type amino acid residue (aa1).||1.523|
|69||PPH2_MSA_Score2||PolyPhen-2 multiple sequence alignment PSIC score for mutant amino acid residue (aa2).||-0.352|
|70||PPH2_MSA_Nobs||PolyPhen-2 number of residues observed at the substitution position in multiple sequence alignment (without gaps).||39|
|71||CCLE_ONCOMAP_overlapping_mutations||Protein change of overlapping mutations in CCLE Oncomap dataset. Cell line name and lineage are provided in parentheses.||R130G(OV56_OVARY)|R130G(KMBC2_URINARY_TRACT)|
|72||CCLE_ONCOMAP_total_mutations_in_gene||Total number of mutations in CCLE Oncomap data for this gene.||31|
|73||CGC_Mutation_Type||Type of mutations reported for this gene in Cancer Gene Census. See abbreviations here.||D, Mis, N, F, S|
|74||CGC_Translocation_Partner||Known translocation partner gene as reported in Cancer Gene Census||ALK|
|75||CGC_Tumor_Types_Somatic||Tumor types with somatic alterations in this gene as reported in Cancer Gene Census. See abbreviations here.||MDS, CML|
|76||CGC_Tumor_Types_Germline||Tumor types with germline alterations in this gene as reported in Cancer Gene Census. See abbreviations here.||T-PLL|
|77||CGC_Other_Diseases||Other diseases/syndromes with alterations in this gene as reported in Cancer Gene Census.||type=REGULATORY REGION|TFbs=CTCF|Dataset=CTCF ChIP-chip sites (Ren lab)|
|78||DNARepairGenes_Role||Known DNA repair roles for this gene as reported in Wood et al.||NER|Involved_in_tolerance_or_repair_of_DNA_crosslinks|
|79||FamilialCancerDatabase_Syndromes||Familial cancer syndromes with alteration in this gene as reported in the Familial Cancer Database.||Wiskott-Aldrich_syndrome|
|80||MUTSIG_Published_Results||Published MutSig analyses with gene in significant results. Gene rank and q-value are provided in parentheses.||TCGA GBM(2;<1E-8)|TSP Lung(26;0.18)|
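The output conventions described above (a leading '##' version line, tab-delimited columns, and pipe-delimited multi-value fields with details in parentheses) can be handled with a short parser. This is a minimal sketch, not an official reader: the helper names, the wording of the sample '##' line, and the three-column toy file are assumptions made for illustration, whereas real output has 80 columns.

```python
import csv
import io
import re


def read_maf(handle):
    """Read MAF rows as dicts, skipping lines that begin with '##'."""
    lines = (line for line in handle if not line.startswith("##"))
    return list(csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE))


def split_values(field):
    """Split a pipe-delimited field, dropping empty items from trailing pipes."""
    return [value for value in field.split("|") if value]


# Matches entries of the form label(detail), e.g. p.V617F(27905).
LABELLED = re.compile(r"^(?P<label>.+)\((?P<detail>[^()]*)\)$")


def parse_labelled(field):
    """Return (label, detail) pairs for entries shaped like label(detail)."""
    pairs = []
    for item in split_values(field):
        match = LABELLED.match(item)
        if match:
            pairs.append((match.group("label"), match.group("detail")))
    return pairs


# Toy file: the '##' line content is hypothetical; values come from the table.
sample = (
    "## hypothetical version line\n"
    "Hugo_Symbol\tProtein_Change\tCOSMIC_overlapping_mutations\n"
    "JAK2\tp.V617F\tp.V617F(27905)|p.V617_C618>FR(2)|p.V617I(1)|\n"
)
rows = read_maf(io.StringIO(sample))
for label, count in parse_labelled(rows[0]["COSMIC_overlapping_mutations"]):
    print(label, int(count))
```

The detail is kept as a string because some columns hold more than a count, for example the "genes;q-value" pairs in the Tumorscape peak columns; callers can split on ";" where that applies.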
A REST-like interface is available for obtaining detailed annotations in JSON format for genes, transcripts, and mutations.
Provide "chr", "start", and "end" parameters delimited by an underscore character ("_").
Provide "chr", "start", and "end" parameters delimited by an underscore character ("_").
Provide "chr", "start", "end", "reference_allele", and "observed_allele" parameters delimited by an underscore character ("_").
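The underscore-delimited parameter strings described above can be assembled as in the sketch below. The helper names `region_key` and `mutation_key` are hypothetical, and because the endpoint URLs are not given in this section, the example stops at building the parameter string and does not issue a request.

```python
def region_key(chrom, start, end):
    """Join chr, start, and end with underscores."""
    return "_".join(str(part) for part in (chrom, start, end))


def mutation_key(chrom, start, end, reference_allele, observed_allele):
    """Join the five mutation fields with underscores."""
    parts = (chrom, start, end, reference_allele, observed_allele)
    return "_".join(str(part) for part in parts)


print(region_key("chr4", 150, 150))
print(mutation_key("chr4", 150, 150, "A", "T"))
```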