Tagged with #variantannotator
2 documentation articles | 1 announcement | 44 forum discussions



Created 2015-08-18 19:35:06 | Updated 2016-04-13 12:02:58 | Tags: haplotypecaller variantannotator annotation mutect2

Comments (0)

The problem

You specified -A <some annotation> in a command line invoking one of the annotation-capable tools (HaplotypeCaller, MuTect2, UnifiedGenotyper and VariantAnnotator), but that annotation did not show up in your output VCF.

Keep in mind that all annotations that are necessary to run our Best Practices are annotated by default, so you should generally not need to request annotations unless you're doing something a bit special.

Why this happens & solutions

There can be several reasons why this happens, depending on the tool, the annotation, and you data. These are the four we see most often; if you encounter another that is not listed here, let us know in the comments.

  1. You requested an annotation that cannot be calculated by the tool

    For example, you're running MuTect2 but requested an annotation that is specific to HaplotypeCaller. There should be an error message to that effect in the output log. It's not possible to override this; but if you believe the annotation should be available to the tool, let us know in the forum and we'll consider putting in a feature request.

  2. You requested an annotation that can only be calculated if an optional input is provided

    For example, you're running HaplotypeCaller and you want InbreedingCoefficient, but you didn't specify a pedigree file. There should be an error message to that effect in the output log. The solution is simply to provide the missing input file. Another example: you're running VariantAnnotator and you want to annotate Coverage, but you didn't specify a BAM file. The tool needs to see the read data in order to calculate the annotation, so again, you simply need to provide the BAM file.

  3. You requested an annotation that has requirements which are not met by some or all sites

    For example, you're looking at RankSumTest annotations, which require heterozygous sites in order to perform the necessary calculations, but you're running on haploid data so you don't have any het sites. There is no workaround; the annotation is not applicable to your data. Another example: you requested InbreedingCoefficient, but your population includes fewer than 10 founder samples, which are required for the annotation calculation. There is no workaround; the annotation is not applicable to your data.

  4. You requested an annotation that is already applied by default by the tool you are running

    For example, you requested Coverage from HaplotypeCaller, which already annotates this by default. There is currently a bug that causes some default annotations to be dropped from the list if specified on the command line. This will be addressed in an upcoming version. For now the workaround is to check what annotations are applied by default and NOT request them with -A.


Created 2012-09-19 18:45:35 | Updated 2013-08-23 22:00:11 | Tags: unifiedgenotyper variantannotator intermediate

Comments (0)

As featured in this forum question.

Two main things account for these kinds of differences, both linked to default behaviors of the tools:

  • The tools downsample to different depths of coverage

  • The tools apply different read filters

In both cases, you can end up looking at different sets or numbers of reads, which causes some of the annotation values to be different. It's usually not a cause for alarm. Remember that many of these annotations should be interpreted relatively, not absolutely.


Created 2014-07-15 03:54:06 | Updated 2014-10-23 17:58:36 | Tags: variantrecalibrator haplotypecaller selectvariants variantannotator release-notes catvariants genotypegvcfs

Comments (13)

GATK 3.2 was released on July 14, 2014. Itemized changes are listed below. For more details, see the user-friendly version highlights.


We also want to take this opportunity to thank super-user Phillip Dexheimer for all of his excellent contributions to the codebase, especially for this release.


Haplotype Caller

  • Various improvements were made to the assembly engine and likelihood calculation, which leads to more accurate genotype likelihoods (and hence better genotypes).
  • Reads are now realigned to the most likely haplotype before being used by the annotations, so AD and DP will now correspond directly to the reads that were used to generate the likelihoods.
  • The caller is now more conservative in low complexity regions, which significantly reduces false positive indels at the expense of a little sensitivity; mostly relevant for whole genome calling.
  • Small performance optimizations to the function to calculate the log of exponentials and to the Smith-Waterman code (thanks to Nigel Delaney).
  • Fixed small bug where indel discovery was inconsistent based on the active-region size.
  • Removed scary warning messages for "VectorPairHMM".
  • Made VECTOR_LOGLESS_CACHING the default implementation for PairHMM.
  • When we subset PLs because alleles are removed during genotyping we now also subset the AD.
  • Fixed bug where reference sample depth was dropped in the DP annotation.

Variant Recalibrator

  • The -mode argument is now required.
  • The plotting script now uses the theme instead of opt functions to work with recent versions of the ggplot2 R library.

AnalyzeCovariates

  • The plotting script now uses the theme instead of opt functions to work with recent versions of the ggplot2 R library.

Variant Annotator

  • SB tables are created even if the ref or alt columns have no counts (used in the FS and SOR annotations).

Genotype GVCFs

  • Added missing arguments so that now it models more closely what's available in the Haplotype Caller.
  • Fixed recurring error about missing PLs.
  • No longer pulls the headers from all input rods including dbSNP, rather just from the input variants.
  • --includeNonVariantSites should now be working.

Select Variants

  • The dreaded "Invalid JEXL expression detected" error is now a kinder user error.

Indel Realigner

  • Now throws a user error when it encounters reads with I operators greater than the number of read bases.
  • Fixed bug where reads that are all insertions (e.g. 50I) were causing it to fail.

CalculateGenotypePosteriors

  • Now computes posterior probabilities only for SNP sites with SNP priors (other sites have flat priors applied).
  • Now computes genotype posteriors using likelihoods from all members of the trio.
  • Added annotations for calling potential de novo mutations.
  • Now uses PP tag instead of GP tag because posteriors are Phred-scaled.

Cat Variants

  • Can now process .list files with -V.
  • Can now handle BCF and Block-Compressed VCF files.

Validate Variants

  • Now works with gVCF files.
  • By default, all strict validations are performed; use --validationTypeToExclude to exclude specific tests.

FastaAlternateReferenceMaker

  • Now use '--use_IUPAC_sample sample_name' to specify which sample's genotypes should be used for the IUPAC encoding with multi-sample VCF files.

Miscellaneous

  • Refactored maven directories and java packages replacing "sting" with "gatk".
  • Extended on-the-fly sample renaming feature to VCFs with the --sample_rename_mapping_file argument.
  • Added a new read transformer that refactors NDN cigar elements to one N element.
  • Now a Tabix index is created for block-compressed output formats.
  • Switched outputRoot in SplitSamFile to an empty string instead of null (thanks to Carlos Barroto).
  • Enabled the AB annotation in the reference model pipeline (thanks to John Wallace).
  • We now check that output files are specified in a writeable location.
  • We now allow blank lines in a (non-BAM) list file.
  • Added legibility improvements to the Progress Meter.
  • Allow for non-tab whitespace in sample names when performing on-the-fly sample-renaming (thanks to Mike McCowan).
  • Made IntervalSharder respect the IntervalMergingRule specified on the command line.
  • Sam, tribble, and variant jars updated to version 1.109.1722; htsjdk updated to version 1.112.1452.

Created 2016-03-14 16:28:57 | Updated | Tags: variantannotator non-model-organisms-snpeff

Comments (1)

Hi All,

I am working on a non-model organism and have finished the variant calling process using GATK. I have a VCF file containing variants and my next task is to see the effect of those mutations on the genome. I have been looking at several threads for the same and could not quite grasp what was being written. How should I proceed? @Geraldine_VdAuwera @Sheila


Created 2016-03-07 22:53:40 | Updated 2016-03-07 23:01:14 | Tags: homopolymerrun variantannotator

Comments (7)

Hi,

I have one deletion that causes VariantAnnotator to throw an exception when trying to add the HomopolymerRun annotation. Here is the VCF from HaplotypeCaller that I am using:

##fileformat=VCFv4.1
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample1 sample2 sample3
chr18   44383010    .   AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAGCC A   609.61  .   AC=2;AF=0.333;AN=6;BaseQRankSum=-1.949;DP=55;FS=3.699;MLEAC=2;MLEAF=0.333;MQ=58.29;MQ0=0;MQRankSum=-2.245;QD=14.87;ReadPosRankSum=-0.492;SOR=0.180  GT:AD:GQ:PL 0/1:3,12:89:493,0,89    0/0:4,0:11:0,11,175 0/1:4,4:99:157,0,155

And here is the Error:

/usr/local/biotools/java/jdk1.7.0_67/bin/java -Xmx4g -Xms2g -Djava.io.tmpdir=./ -jar /data5/bsi/bictools/alignment/gatk/3.5/GenomeAnalysisTK.jar -T VariantAnnotator -R /data2/bsi/reference/sequence/human/ncbi/hg19/allchr.fa -V problem.vcf -o variantshrun.vcf -A HomopolymerRun -L region.bed
INFO  16:41:57,116 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  16:41:57,118 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56 
INFO  16:41:57,118 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  16:41:57,118 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  16:41:57,122 HelpFormatter - Program Args: -T VariantAnnotator -R /data2/bsi/reference/sequence/human/ncbi/hg19/allchr.fa -V problem.vcf -o variantshrun.vcf -A HomopolymerRun -L region.bed 
INFO  16:41:57,128 HelpFormatter - Executing as m088341@franklin09.mayo.edu on Linux 2.6.32-573.18.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_67-b01. 
INFO  16:41:57,128 HelpFormatter - Date/Time: 2016/03/07 16:41:57 
INFO  16:41:57,129 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  16:41:57,129 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  16:41:57,590 GenomeAnalysisEngine - Strictness is SILENT 
INFO  16:41:57,691 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
WARN  16:41:57,735 RMDTrackBuilder - Index file problem.vcf.idx is out of date (index older than input file), deleting and updating the index file 
INFO  16:41:57,903 RMDTrackBuilder - Writing Tribble index to disk for file problem.vcf.idx 
INFO  16:41:57,922 IntervalUtils - Processing 5000 bp from intervals 
INFO  16:41:57,982 GenomeAnalysisEngine - Preparing for traversal 
INFO  16:41:57,982 GenomeAnalysisEngine - Done preparing for traversal 
INFO  16:41:57,983 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  16:41:57,983 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  16:41:57,983 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
INFO  16:41:59,062 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.ArrayIndexOutOfBoundsException: 101
    at org.broadinstitute.gatk.tools.walkers.annotator.HomopolymerRun.computeIndelHomopolymerRun(HomopolymerRun.java:155)
    at org.broadinstitute.gatk.tools.walkers.annotator.HomopolymerRun.annotate(HomopolymerRun.java:102)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:221)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:203)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:357)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:114)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: 101
##### ERROR ------------------------------------------------------------------------------------------

Has anyone encountered this error before? Or maybe this is a bug?

Thanks!


Created 2016-03-03 18:33:18 | Updated 2016-03-04 00:32:49 | Tags: variantannotator

Comments (2)

Hi, I generated vcf through samtools and bcftools with parsing some attributes. Input vcf is below:

##fileformat=VCFv4.1
##samtoolsVersion=0.1.19-44428cd
##reference=file://./resources/allchr.hg19.fa
##contig=<ID=chr1,length=249250621,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=1b22b98cdeb4a9304cb5d48026a85128>
##contig=<ID=chr2,length=243199373,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=a0d9851da00400dec1098a9255ac712e>
##contig=<ID=chr3,length=198022430,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=641e4338fa8d52a5b781bd2a2c08d3c3>
##contig=<ID=chr4,length=191154276,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=23dccd106897542ad87d2765d28a19a1>
##contig=<ID=chr5,length=180915260,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=0740173db9ffd264d728f32784845cd7>
##contig=<ID=chr6,length=171115067,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=1d3a93a248d92a729ee764823acbbc6b>
##contig=<ID=chr7,length=159138663,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=618366e953d6aaad97dbe4777c29375e>
##contig=<ID=chr8,length=146364022,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=96f514a9929e410c6651697bded59aec>
##contig=<ID=chr9,length=141213431,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=3e273117f15e0a400f01055d9f393768>
##contig=<ID=chr10,length=135534747,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=988c28e000e84c26d552359af1ea2e1d>
##contig=<ID=chr11,length=135006516,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=98c59049a2df285c76ffb1c6db8f8b96>
##contig=<ID=chr12,length=133851895,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=51851ac0e1a115847ad36449b0015864>
##contig=<ID=chr13,length=115169878,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=283f8d7892baa81b510a015719ca7b0b>
##contig=<ID=chr14,length=107349540,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=98f3cae32b2a2e9524bc19813927542e>
##contig=<ID=chr15,length=102531392,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=e5645a794a8238215b2cd77acb95a078>
##contig=<ID=chr16,length=90354753,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=fc9b1a7b42b97a864f56b348b06095e6>
##contig=<ID=chr17,length=81195210,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=351f64d4f4f9ddd45b35336ad97aa6de>
##contig=<ID=chr18,length=78077248,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=b15d4b2d29dde9d3e4f93d1d0f2cbc9c>
##contig=<ID=chr19,length=59128983,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=1aacd71f30db8e561810913e0b72636d>
##contig=<ID=chr20,length=63025520,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=0dec9660ec1efaaf33281c0d5ea2560f>
##contig=<ID=chr21,length=48129895,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=2979a6085bfe28e3ad6f552f361ed74d>
##contig=<ID=chr22,length=51304566,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=a718acaa6135fdca8357d5bfe94211dd>
##contig=<ID=chrX,length=155270560,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=7e0e2e580297b7764e31dbc80c2540dd>
##contig=<ID=chrY,length=59373566,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=1e86411d73e6f00a10590f976be01623>
##contig=<ID=chrM,length=16571,URL=file:/home/chihhsu/bin/eSNV-Detect_1.0/resources/./allchr.hg19.fa,md5=d2ed829b8a1628d16cbeee88e88e39eb>
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=IS,Number=2,Type=Float,Description="Maximum number of reads supporting an indel and fraction of indel reads">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=QBD,Number=1,Type=Float,Description="Quality by Depth: QUAL/#reads">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Read Position Bias">
##INFO=<ID=MDV,Number=1,Type=Integer,Description="Maximum number of high-quality nonRef reads in samples">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias (v2) for filtering splice-site artefacts in RNA-seq data. Note: this version may be broken.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# high-quality non-reference bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Reference Allele Counts">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Alternative Allele Counts">
##FORMAT=<ID=RO,Number=1,Type=Float,Description="Ratio of RD and AD">
##FORMAT=<ID=MA,Number=1,Type=Integer,Description="Multiple Alternative Alleles">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NA07347
chr22   17565974    .   G   C   0   .   .   DP:RD:AD:RO:MA  DP=6;RD=1;AD=5;RO=0.833;MA=0
==

I ran VariantAnnotator of GATK 3.5 with the command and got errors below.
==
/usr/bin/java -Xmx10g -Xms5g -Djava.io.tmpdir=.//temp -jar /home/chihhsu/bin/eSNV-Detect_1.0/bin/gatk/GenomeAnalysisTK.jar -R /home/chihhsu/bin/eSNV-Detect_1.0/resources/allchr.hg19.fa -T VariantAnnotator -I output_test_GATK3/bwa/variant/NA07347.bam -V ./NA07347.vcf --dbsnp /home/chihhsu/bin/eSNV-Detect_1.0/resources/dbsnp.vcf.gz -L ./NA07347.vcf --out ./NA07347.vcf.temp -A QualByDepth -A ReadPosRankSumTest
INFO  11:56:24,489 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  11:56:24,491 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56 
INFO  11:56:24,491 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  11:56:24,492 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  11:56:24,496 HelpFormatter - Program Args: -R /home/chihhsu/bin/eSNV-Detect_1.0/resources/allchr.hg19.fa -T VariantAnnotator -I output_test_GATK3/bwa/variant/NA07347.bam -V ./NA07347.vcf --dbsnp /home/chihhsu/bin/eSNV-Detect_1.0/resources/dbsnp.vcf.gz -L ./NA07347.vcf --out ./NA07347.vcf.temp -A QualByDepth -A ReadPosRankSumTest 
INFO  11:56:24,505 HelpFormatter - Executing as chihhsu@torta.bcmt.bcm.edu on Linux 3.10.0-327.10.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_71-b15. 
INFO  11:56:24,506 HelpFormatter - Date/Time: 2016/03/03 11:56:24 
INFO  11:56:24,506 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  11:56:24,506 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  11:56:24,582 GenomeAnalysisEngine - Strictness is SILENT 
INFO  11:56:24,691 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250 
INFO  11:56:24,699 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  11:56:24,723 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02 
INFO  11:56:25,007 IntervalUtils - Processing 1497 bp from intervals 
WARN  11:56:25,010 IndexDictionaryUtils - Track dbsnp doesn't have a sequence dictionary built in, skipping dictionary validation 
INFO  11:56:25,079 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
INFO  11:56:25,113 GenomeAnalysisEngine - Done preparing for traversal 
INFO  11:56:25,113 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  11:56:25,114 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
INFO  11:56:25,114 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
INFO  11:56:26,072 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.NumberFormatException: For input string: "DP=6;RD=1;AD=5;RO=0.833;MA=0"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.valueOf(Integer.java:766)
    at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:717)
    at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:128)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:148)
    at htsjdk.variant.variantcontext.GenotypesContext.iterator(GenotypesContext.java:465)
    at htsjdk.variant.variantcontext.VariantContext.validateGenotypes(VariantContext.java:1357)
    at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1299)
    at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:410)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:496)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:490)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:213)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:203)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:357)
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:114)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: For input string: "DP=6;RD=1;AD=5;RO=0.833;MA=0"
##### ERROR ------------------------------------------------------------------------------------------

I also ran ValidateVariants and there is no error found.

 java -jar ./bin/gatk/GenomeAnalysisTK.jar \
   -T ValidateVariants \
   -R ./resources/allchr.hg19.fa \
   -V ./NA07347.vcf \
   --validationTypeToExclude ALL

Successfully validated the input file.  Checked 1497 records with no failures.

Any suggestions? Thank you.


Created 2016-01-14 15:00:51 | Updated | Tags: allelebalance variantannotator coverage alleledepth

Comments (4)

Some calls I don't see allele depth (AD), for some I don't see depth (DP), and for some I see neither. Are there scenarios where GATK cannot annotate these?


Created 2015-11-25 09:07:33 | Updated | Tags: combinevariants fisherstrand qualbydepth variantannotator

Comments (1)

I am doing variant calling of multiple RNAseq datasets using GATK/3.4.46. For limitation of computational resources, I ran HaplotypeCaller on each dataset separately. Then I ran CombineVaraints to merge all output VCF files using this command

java -Xmx10g -jar GenomeAnalysisTK.jar \ -T CombineVariants \ -R $gatk_ref \ --variant set1.vcf \ --variant set2.vcf \ --variant set3.vcf \ -o combine_output.vcf \ -genotypeMergeOptions UNIQUIFY

Then I tried to run VariantFiltration using thic command

java -Xmx2g -jar $GATK/GenomeAnalysisTK.jar \ -T VariantFiltration \ -R $gatk_ref \ -V combine_output.vcf \ -window 35 -cluster 3 \ -filterName FS -filter "FS > 30.0" \ -filterName QD -filter "QD < 2.0" \ -o $output

Several thousands variants though warning for absence of FS and QD. According to @Sheila advise in http://gatkforums.broadinstitute.org/discussion/2334/undefined-variable-variantfiltration, I ran VariantAnnotator to add these annotations using this command

java -Xmx45g -jar GenomeAnalysisTK.jar \ -R $gatk_ref \ -T VariantAnnotator \ -I input1.bam \ -I input2.bam \ . . -I input57.bam \ -V combine_output.vcf \ -A Coverage \ -A FisherStrand \ -A QualByDepth \ -nt 7 \ -o combine_output_ann.vcf

Then I repeated the VariantFiltration but I have 2 problems: 1) about 2000 variants are still not annotated for FS. All of them are indels and many of them are not homozygous for the ALT allele). Also ~ 40 variants are still not annotated for QD. All of them have multiple ALT alleles 2) The combined VCF record has the QUAL of the first VCF record with a non-MISSING QUAL value. According to my manual calculations, I think VariantAnnotator calculates the QD value by dividing this QUAL value by the AD of samples with a non hom-ref genotype call. This cause many variants to fail the QD filter.

Thank you


Created 2015-11-06 11:07:22 | Updated | Tags: selectvariants variantannotator variantfiltration dbsnp

Comments (5)

Hi team,

Prior to submission to NCBI dbSNP a vcf generated by e.g HaplotypeCaller requires several modifications:

  1. Addition of in-house identifiers. .................................................... done
  2. Exclude if alternate allele is "*" i.e. they are in a deletion. .................................................... I'm assuming this can be done with SelectVariants or FilterVariants.
  3. Exclude if ref or alt allele is greater than 50bp .................................................... Perhaps with SelectVariants or FilterVariants --maxIndelSize 50
  4. Exclude if ref and alt alleles do not have a common leading base. .................................................... Not sure ... removing larger indels won't exclude all of these.
  5. Add VRT (variant type) to Info field .....................................................e.g VRT=1 (for an SNV), VRT=2 for an indel etc. SNPeff doesn't seem to work for this but I could be wrong.

Knowing how to effectively format vcfs between GATK output and NCBI input might be quite useful for many people, and save rather a lot of time.

It would be really useful if the three exclusion criteria could be done using GATK. Is this possible and using what commands ?

I feel as though need to use the GATK variantAnnotator command as well. I'm looking into all of this today, and will post if I get any solutions.

Sincerely,

William Gilks


Created 2015-10-30 16:22:16 | Updated | Tags: variantannotator bug

Comments (8)

Hi, I am having a problem with VariantAnnotator.

I am running the command: java1.7 -jar /usr/local/packages/GATK3/GenomeAnalysisTK.jar \ -R /projects4/ruth/Burkholderia/cutadapt/cenocepacia/B_cenocepacia_J2315.fasta \ -T VariantAnnotator \ -I N501.C8967_R1.fastq_to_B_cenocepacia_J2315.sorted.RG.bam \ -o output2.vcf \ -V N501.C8967_R1.fastq_to_B_cenocepacia_J2315.sorted_GATK.vcf \ -A AlleleBalance \ -A BaseCounts \ -A Coverage \ -A FisherStrand \ -A GenotypeSummaries \ -A LowMQ \ -A RMSMappingQuality \ -A AlleleBalanceBySample

Where the .vcf file was made using GATK HaplotypeCaller.

The error I get is:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 0 at org.broadinstitute.gatk.tools.walkers.annotator.AlleleBalanceBySample.annotateWithPileup(AlleleBalanceBySample.java:127) at org.broadinstitute.gatk.tools.walkers.annotator.AlleleBalanceBySample.annotate(AlleleBalanceBySample.java:113) at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateGenotypes(VariantAnnotatorEngine.java:420) at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:216) at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:192) at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:312) at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.map(VariantAnnotator.java:85) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255) at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274) at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92) at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48) at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99) at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315) at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121) at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248) at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155) at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: 0
ERROR ------------------------------------------------------------------------------------------

Interestingly the error does not occur when I run exactly the same command with a vcf file created using samtools (and the sam bam file).

I have installed the most up-to-date version of GATK (3.4-46) in case that was causing the error, but that does not seem to be the answer.

Any suggestions about what is causing this error would be greatly appreciated.

Thanks,

Ruth


Created 2015-10-04 03:36:09 | Updated | Tags: variantannotator

Comments (7)

Hi I am going through the GATK Best Practices pipeline for the first time. I have non-human data and am at the variant discovery step in the pipeline. I don't have well-vetted 'known' variant sites but have pieced together a vcf with high-likelihood snps that are in common among several data sets in our lab over the past year. The vcf has only four basic annotations and it looks like the software that produced it doesn't provide for generating others so I am trying to add a few with VariantAnnotator. My command is java -jar GenomeAnalysisTK.jar \ -R Oncorhynchus_mykiss_chr.fa \ -T VariantAnnotator \ -I Om2013all-rg.bam \ -A StrandOddsRatio \ -A MappingQualityRankSumTest \ -A ReadPosRankSumTest \ -o Om2013bElp.vcf \ -V Om2013bEl.vcf When I run it I get an error:

ERROR stack trace

java.lang.NumberFormatException: For input string: "." at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250) at java.lang.Double.parseDouble(Double.java:540) at htsjdk.variant.variantcontext.GenotypeLikelihoods.parseDeprecatedGLString(GenotypeLikelihoods.java:251) at htsjdk.variant.variantcontext.GenotypeLikelihoods.fromGLField(GenotypeLikelihoods.java:81) at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:715) at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:128) at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158) at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:347) at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:279) at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:257) at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:60) at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:79) at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:41) at htsjdk.tribble.AbstractFeatureCodec.decodeLoc(AbstractFeatureCodec.java:40) at htsjdk.tribble.index.IndexFactory$FeatureIterator.readNextFeature(IndexFactory.java:502) at htsjdk.tribble.index.IndexFactory$FeatureIterator.(IndexFactory.java:403) at htsjdk.tribble.index.IndexFactory.createDynamicIndex(IndexFactory.java:312) at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:401) at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:287) at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:224) at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:147) at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.(ReferenceOrderedDataSource.java:208) at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.(ReferenceOrderedDataSource.java:88) at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1047) at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:828) at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:286) at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121) at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248) at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155) at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: For input string: "."
ERROR ------------------------------------------------------------------------------------------

I've read everything I can find on VariantAnnotator on the GATK website but haven't found anything helpful yet. I am running The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12.

I suspect that I am doing something wrong but can't track it down. Any ideas about what is or isn't going on?

Thanks, Sewall


Created 2015-09-09 21:56:12 | Updated 2015-09-09 21:56:49 | Tags: inbreedingcoeff variantannotator mqranksum baseqranksum

Comments (7)

Hi @Team,

I found that VariantAnnotator sometimes does not annotate some annotations that are requested.

A ) The Rank Sum Test annotations MQRankSum & BaseQRankSum I was not able to identify the requirements that have to be met, so they are being calculated for a variant.

B ) InbreedingCoeff This one seems to be connected to the number of total called alleles (AN). For me there needed to be at least 10% alleles be called (19/186). The doc for that says [1] "at least 10 founder samples". Maybe this has to be updated to 10%?

These are the ones I observed. Can someone tell me more about that?

Thanks, Alexander

[1] https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_InbreedingCoeff.php


Created 2015-08-19 20:17:52 | Updated | Tags: allelebalancebysample variantannotator

Comments (2)

Hi GATK Team,

I am trying to have AlleleBalanceBySample for each sample for a pre-generate vcf file. But I am not getting it. Bellow is my command. Am I doing something wrong??

java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R ../Genome/hg19.fa --variant all.vcf -o annotate.vcf -A AlleleBalanceBySample -L exome.interval.list

I am having it in the heading lines

FORMAT=

But not in the formate field. GT:AD:DP:GQ:PL


Created 2015-07-20 21:37:21 | Updated | Tags: variantannotator

Comments (4)

Hello, I'm trying to annotate using VariantAnnotator, when I run:

java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R ucsc.hg19.fasta --variant LK8851-Clean.vcf --resource LK8851-Clean-snpEff.vcf -L LK8851-Clean.vcf -o LK8851-Clean-VA.vcf

I get the following error: Your input file has a malformed header: The FORMAT field was provided but there is no genotype/sample data

One of the variant lines:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1 14907 rs79585140 A G 661.53 MQFilter40 . .

I downloaded the vcf from https://my.pgp-hms.org/ and had to add the "." in the INFO and FORMAT columns because I was getting the error:

Line 128: there aren't enough columns for line 1 14930 rs75454623 AG 1103.37 MQFilter40 . (we expected 9 tokens, and saw 8 )

I'm new to this, I really searched a lot, but couldn't find anything useful, probably a very stupid error, but I would really appreciate your help.

Thank you for your time, Maxi


Created 2015-06-24 14:28:42 | Updated | Tags: variantannotator vcf

Comments (5)

Hi! So we use GATK a lot in our research, it works amazingly well most of the time, so first of all, thanks for creating it!

We have this one problem that we were unable to solve on our own. Say we have a VCF file that contains called variants, and we want to annotate it using an external database, clinvar as one example. We used to use VariantAnnotator for this purpose until we found out (both by reading documentation and doing a quick experimental check) that it annotates variants based solely on position, ignoring the actual mutation that happened. Imagine for example that variant A → C was called at a specific position, but data for A → G is recorded at clinvar. In this case VariantAnnotator will still carry over INFO fields from clinvar into our VCF. We ideally do not want this to happen, because strictly speaking clinvar data was recorded for a completely different mutation and might not be relevant at all in our case.

My question: is there an option for VariantAnnotator to make it check REF and ALT fields in the process of annotation? (Although I fear it wouldn't be possible because it uses RodWalker class to traverse the variants.) Or, alternatively, can this be achieved using combination of other GATK commands? Or will we have to write a custom walker to accomplish what we want? (The latter is obviously the worst case, but hopefully we can manage that.)

All the best, Kirill


Created 2015-04-09 18:55:24 | Updated | Tags: allelebalancebysample variantannotator genotypegvcfs strandbiasbysample

Comments (5)

I am struggling with adding AlleleBalanceBySample and StrandBiasBySample to my joint VCF file. The data through joint VCF were generated with GATK 3.2-2, and I have tried using the VariantAnnotator as well as repeating the GenotypeGVCF step specifying the annotations. I have also tried using the GenotypeGVCFs tool from 3.3-0, but am trying to avoid running the HC again in the interest of time. For the strand bias by sample I can see why the VariantAnnotator approach would fail, as the information is not contained in the joint VCF file, but it is there in the GVCF. For the allele balance by sample it seems the info is already in the joint VCF, so either VariantAnnotation or rerunning GenotypeGVCFs seem like they should work.

I should add that when I do try to add these annotations using GenotypeGVCFs, they are noted in the header ##FORMAT section, but not in the actual F:O:R:M:A:T.

Any clues or hints (or admonitions!) much appreciated.

-erikt


Created 2015-03-31 14:53:19 | Updated | Tags: variantannotator

Comments (20)

Hello GATK Team,

I am following a workflow that assumes the user is working with GATK 2.2-16. I am, however, using the current GATK version. Curious, is -V:variant,VCF vcfFile still a valid way to call a command when using VariantAnnotator on the latest GATK release?

Thanks!


Created 2015-02-11 22:56:25 | Updated | Tags: vqsr variantannotator

Comments (3)

Hi,

In the best practices for vqsr in indel mode it is recommended to use the annotation SOR. However, when I try to add this annotation using VariantAnnotator it only adds it to the SNP calls not the indel calls. Does this mean SOR should not be used for vqsr in indel mode?

Thanks,

Kath


Created 2015-02-05 20:47:57 | Updated | Tags: variantannotator annotation clinvar

Comments (2)

Hi,

I'm using VariantAnnotator to add annotations to variants from a bunch of sources. One issue that I have is that for some variants, there are multiple annotations in a supplied resource. In the docs, I read

"Note that if there are multiple records in the resource file that overlap the given position, one is chosen randomly."

Can this behaviour be altered? I need to output all annotations for a record, either on a single line, or on multiple.

In the case i'm working on, one line has the annotation "CLNSIG=5" (i.e. a known pathogenic variant) and the other (likely older record) is "CLNSIG=1" i.e. a variant of unknown significance. I need to output both so I can filter downstream (using SelectVariants) to select those where "CLNSIG=5".

cheers


Created 2015-01-09 17:51:26 | Updated | Tags: variantannotator multi-sample

Comments (4)

Hi,

I used the variant annotator on a multi sample vcf and I am happy to report that it worked great (especially being tree reducible). I had a question that I could not find in the documentation. For each variant call it the annotator would either declare a sample to be a "hiConfDeNovo" or a "loConfDeNovo" or there would be no assessment whatsover (indicating I suspect no De Novos for that variant call).

My question: What defines a "hiConfDeNovo" and what defines a "loConfDeNovo" as it is used by the annotator. If there was some documentation I had overlooked please let me know, otherwise any insight is much appreciated.

Thank you as always


Created 2015-01-08 20:01:14 | Updated | Tags: variantannotator java-lang-reflect-invocationtargetexception

Comments (3)

Hi, when I'm trying to use the VariantAnnotator on larger SNPEFF vcf files (about 40-50MB) I get the same error: "stack trace" followed by "java.lang.reflect.InvocationTargetException" using GATK 3.2.

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:189) at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.loadFromDisk(RMDTrackBuilder.java:336) at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.attemptToLockAndLoadIndexFromDisk(RMDTrackBuilder.java:320) at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:279) at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:225) at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:148) at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.(ReferenceOrderedDataSource.java:208) at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.(ReferenceOrderedDataSource.java:88) at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:990) at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:772) at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:285) at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121) at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248) at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155) at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at htsjdk.tribble.index.IndexFactory.loadIndex(IndexFactory.java:185) ... 14 more Caused by: java.io.EOFException at htsjdk.tribble.util.LittleEndianInputStream.readFully(LittleEndianInputStream.java:138) at htsjdk.tribble.util.LittleEndianInputStream.readLong(LittleEndianInputStream.java:80) at htsjdk.tribble.index.linear.LinearIndex$ChrIndex.read(LinearIndex.java:271) at htsjdk.tribble.index.AbstractIndex.read(AbstractIndex.java:363) at htsjdk.tribble.index.linear.LinearIndex.(LinearIndex.java:101) ... 19 more

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: java.lang.reflect.InvocationTargetException
ERROR ------------------------------------------------------------------------------------------

The original command was:

$java -Xmx$jmem -jar $gatk_dir/GenomeAnalysisTK.jar -T VariantAnnotator -l DEBUG \
    -R $genome \
    -A SnpEff \
    --variant $resultsDir/$hq_vcf \
    --snpEffFile $resultsDir/${hq_vcf/.vcf/.snpEff.vcf} \
    -L $resultsDir/${hq_vcf/.vcf/.snpEff.vcf} \
    -o $resultsDir/${hq_vcf/.vcf/.variantCalls.snpEff.va.vcf} \
    -rf BadCigar

Where I added already the -rf BadCigar argument with no effect. I rewrote the indexes for the snpEff vcf files used as input, with igvtools - no effect. $jmem was set to '11G'. I can open all vcf files with an text editor - so it seems that there is no file corruption. Is there a different way to combine reads other than with bedtools?

Your help is very appreciated,

Philipp


Created 2014-12-11 15:57:03 | Updated | Tags: haplotypecaller variantannotator

Comments (4)

Hello

I am using GATK 3.3-0 HaplotypeCaller for variant calling. When I run HaplotypeCaller with the command

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R all_chr_reordered.fa -I 30_S30.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 50 -o 30_S30_control1.vcf -L brca.intervals

I get all the variants I want, however, I also want to get the number of forward and reverse reads that support REF and ALT alleles. Therefore I use StrandBiasBySample annotation when running HaplotypeCaller with the command:

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R all_chr_reordered.fa -I 30_S30.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 50 -o 30_S30_control2.vcf -L brca.intervals -A StrandBiasBySample

The SB field is added, but a variant that was in 30_S30_control1.vcf is absent in 30_S30_control2.vcf. All the remaining variants are there. The only difference between two variant calls was adding -A StrandBiasBySample. What I'm wondering about is that why this one variant is absent.

the missing variant: 17 41276152 . CAT C 615.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.147;ClippingRankSum=0.564;DP=639;FS=15.426;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-1.054;QD=0.96;ReadPosRankSum=0.698;SOR=2.181 GT:AD:DP:GQ:PL 0/1:565,70:635:99:653,0,18310

So I decided to run HaplotypeCaller without -A StrandBiasBySample and later add the annotations with VariantAnnotator. Here is the command:

java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R all_chr_reordered.fa -I 30_S30.bam --variant 30_S30_control1.vcf -L 30_S30_control1.vcf -o 30_S30_control1_SBBS.vcf -A StrandBiasBySample

However, the output vcf file 30_S30_control1_SBBS.vcf was not different from the input variant file 30_S30_control1.vcf except for the header, SB field wasn't added. Why was the SB field not added? Is there any other way to get the number of forward and reverse reads?

Please find 30_S30_control1.vcf, 30_S30_control2.vcf and 30_S30_control1_SBBS.vcf in attachment

Thanks


Created 2014-11-14 03:30:08 | Updated | Tags: variantannotator genotypegvcfs

Comments (4)

I have constructed a VCF using GenotypeGVCFs. Due to downsampling and filtering imposed by HaplotypeCaller/GenotypeGVCFs, four fields of interest within the VCF do not reflect the data that went into building the file. I would like to update a these fields to reflect all read data used in the pipeline, but I have been able to do so for just 3 of the 4. I have been able to update INFO MQ0, INFO DP, and sample-level (FORMAT) AD, but I have not been able to update sample-level (FORMAT) DP.

I have run VariantAnnotator as follows.

inVCFfn=../vcf/raw.vcf
outVCFfn=../vcf/raw.annotated.vcf

gatk=/srv/gs1/software/gatk/gatk-3.3.0/GenomeAnalysisTK.jar
ref=/srv/gs1/projects/scg/Resources/GATK/b37/human_g1k_v37_decoy.fasta
dbsnp=/srv/gs1/projects/scg/Resources/GATK/b37/dbsnp_137.b37.vcf

java -Xmx10g \
  -jar $gatk \
  -T VariantAnnotator \
  -R $ref \
  -L $inVCFfn \
  --variant $inVCFfn \
  --dbsnp $dbsnp \
  --out $outVCFfn \
  --annotation Coverage \
  --annotation DepthPerAlleleBySample \
  --annotation MappingQualityZero \
  [-I fileName.1.bam … -I fileName.810.bam]

The documentation for --annotation Coverage indicates that it should update both the INFO DP and the sample-level (FORMAT) DP, but the sample-level update seems not to be occurring. For example:

$ awk '($2==7157834) { print $9, "|", $53 }' raw.vcf
GT:AD:DP:GQ:PL | 0:4,0:4:99:0,135

$ awk '($2==7157834) { print $9, "|", $53 }' raw.annotated.vcf
GT:AD:DP:GQ:PL | 0:131,0:4:99:0,135

(Aside: I'm working with a haploid chromosome, hence the single-value GT and two-value AD and PL fields). Although AD was updated from "4,0" to "131,0", the sample-level "DP" remained at "4". Of course I could approximate DP as the sum of AD values, but since AD is pre-filter and sample-level DP is post-filter, this wouldn't quite be correct. I should note that when I run HaplotypeCaller/GenotypeGVCFs on a subset of the data, the sample-level DP values are very close to the sum of AD's, so it's not a matter of most reads being filtered out, but rather that they have been downsampled. Am I missing something?

Thanks! David

P.S. The INFO field DP and MQ0 are indeed updated as expected:

$ awk '($2==7157834) { print $8 }' raw.vcf
AC=1;AF=6.993e-03;AN=143;DP=609;FS=0.000;GQ_MEAN=124.92;GQ_STDDEV=113.68;MLEAC=1;MLEAF=6.993e-03;MQ=60.00;MQ0=0;NCC=0;QD=30.16;SOR=1.259

$ awk '($2==7157834) { print $8 }' raw.annotated.vcf
AC=1;AF=6.993e-03;AN=143;DP=10750;FS=0.000;GQ_MEAN=124.92;GQ_STDDEV=113.68;MLEAC=1;MLEAF=6.993e-03;MQ=60.00;MQ0=2;NCC=0;QD=30.16;SOR=1.259

Created 2014-10-01 19:58:24 | Updated | Tags: snpeff variantannotator

Comments (2)

Hello, does anyone know where to download snpEff version 2.0.5? I could only find recent version in my search, which are not compatible with gatk.

Thanks,

Ricky


Created 2014-09-02 02:32:47 | Updated | Tags: snpeff variantannotator

Comments (10)

Hi, I first ran snpEff using the command: java -jar snpEff.jar eff -v -i vcf -o gatk phased.vcf > snpEff_output.vcf

which gave me the snpEff_output.vcf file that looks something like this:

Chr1 7050 . T A 22.76 LowQual AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=11.38 G T:AD:DP:GQ:PL 1/1:0,2:2:6:50,6,0

Chr1 7328 . A G 21.77 LowQual AC=2;AF=1.00;AN=2;DP=3;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=21.77;EFF= synonymous_variant(LOW|SILENT|tcA/tcG|p.Ser188Ser/c.564A>G|702|LOC_Os01g01010|protein_coding|CODING|LOC_Os01g01010.1|5|1),synonymous_va riant(LOW|SILENT|tcA/tcG|p.Ser188Ser/c.564A>G|616|LOC_Os01g01010|protein_coding|CODING|LOC_Os01g01010.2|5|1) GT:AD:DP:GQ:PL 1/1:0,3 :3:6:49,6,0

Chr1 9055 . T A 23.76 LowQual AC=2;AF=1.00;AN=2;DP=2;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=23.76 G T:AD:DP:GQ:PL 1/1:0,2:2:6:51,6,0

Chr1 9233 . T C 45.28 LowQual AC=2;AF=1.00;AN=2;DP=3;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=22.64;EFF= synonymous_variant(LOW|SILENT|aaT/aaC|p.Asn539Asn/c.1617T>C|616|LOC_Os01g01010|protein_coding|CODING|LOC_Os01g01010.2|9|1),synonymous_v ariant(LOW|SILENT|aaT/aaC|p.Asn539Asn/c.1617T>C|702|LOC_Os01g01010|protein_coding|CODING|LOC_Os01g01010.1|9|1) GT:AD:DP:GQ:PL 1/1:0,3 :3:9:73,9,0

etc.

Then I ran VariantAnnotator using the command: java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R genome.fasta -A SnpEff --variant phased.vcf --snpEffFile snpEff_output.vcf -o annotated.vcf

but it gave me a warning for every effect added by snpEff that look like this (the annotated.vcf file was non-empty though):

WARN 20:00:30,645 SnpEff - Skipping malformed SnpEff effect field at ChrSy:525610. Error was: "synonymous_variant is not a recognized effect type". Field was: "synonymous_variant(LOW|SILENT|caG/caA|p.Gln171Gln/c.513G>A|ChrSy.fgenesh.gene.77|protein_coding|CODING|ChrSy.fgenesh.mRNA.77|2|WARNING_TRANSCRIPT_INCOMPLETE)" WARN 20:00:30,649 SnpEff - Skipping malformed SnpEff effect field at ChrSy:537868. Error was: "missense_variant is not a recognized effect type". Field was: "missense_variant(LOW|MISSENSE|Gat/Aat|p.Asp298Asn/c.892G>A|ChrSy.fgenesh.gene.79|protein_coding|CODING|ChrSy.fgenesh.mRNA.79|5|WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS)" WARN 20:00:30,650 SnpEff - Skipping malformed SnpEff effect field at ChrSy:538754. Error was: "stop_gained is not a recognized effect type". Field was: "stop_gained(LOW|NONSENSE|Cag/Tag|p.Gln137*/c.409C>T|ChrSy.fgenesh.gene.79|protein_coding|CODING|ChrSy.fgenesh.mRNA.79|3|WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS)" WARN 20:00:30,650 SnpEff - Skipping malformed SnpEff effect field at ChrSy:538769. Error was: "missense_variant is not a recognized effect type". Field was: "missense_variant(LOW|MISSENSE|Gac/Aac|p.Asp132Asn/c.394G>A|ChrSy.fgenesh.gene.79|protein_coding|CODING|ChrSy.fgenesh.mRNA.79|3|WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS)"

What am In doing wrong? Clearly, snpEff outputs the effects in terms/words that GATK does not accept. Please help. Best, nb


Created 2014-06-10 20:57:36 | Updated | Tags: variantannotator best-practices vcf developer performance walkers java gatk gatk-walkers

Comments (4)

Hi,

I was wondering if there is a nice way to apply multiple processing steps to each variant (or a group of variants) as they are read so that the variant file is not read again and again. My understanding is that even if I use Queue, each script would read the vcf again. Is that correct?


Created 2014-04-28 15:11:17 | Updated | Tags: allelebalance allelebalancebysample haplotypecaller variantannotator

Comments (15)

Version 3.1.1. Human normal samples.

I couldnt find AlleleBalance and AlleleBalanceBySample tags in my vcf outputs. Tags are not found even for single variant I tried HaplotypeCaller with -all or directly with -A AlleleBalance or -A AlleleBalanceBySample. Also I tried Variantannotator with -all or -A AlleleBalance or -A AlleleBalanceBySample.

Any help will be apreciated


Created 2014-03-21 20:00:23 | Updated 2014-03-21 20:22:23 | Tags: variantannotator combinegvcfs haplotypcaller

Comments (3)

GATK Team,

First of all, I'm very excited about v3.x GATK, you guys and gals on the GATK Team are doing awesome work!

I have a question: How do I get CombineGVCFs to forward the sample-level annotations calculated by HC with the --emitRefConfidence GVCF option enabled? I tried passing the same annotation flags to CombineGVCFs that I passed to HC, but none of the annotations beyond the standard set were incorporated. Annotations such as StrandBiasBySample require access to the underlying reads and forwarding these annotations to the CombineGVCFs output would be less prohibitive than running VariantAnnotator on my 20 WGS datasets...


Created 2014-02-10 19:57:41 | Updated 2014-02-10 20:24:46 | Tags: variantannotator variantstovcf variantfiltration vcf variant-calling minor-allele-frequency

Comments (5)

Hi,

I have annotated my vcf file of 20 samples from Unified genotyper using the following steps.

Unified genotyper->Variantrecalibration->Applyrecalibration->VariantAnnotator

My question is how should I proceed if I have to select rare variants (MAF<1%) for the candidate genes that I have,for each of these 20 samples?


Created 2013-09-13 04:59:08 | Updated | Tags: variantannotator

Comments (5)

Hi, I have a tiny, 5000-line VCF file that I want to add dbSNP annotations to. I'm surprised to see VariantAnnotator iterating along the millions of records in the dbSNP file, rather than the 5000 variants in the input VCF file. This will take 42mins on dbsnp 137...

Am I misunderstanding how this tool works, or just using it wrongly?

thanks, Mark

java -Xmx2g -jar /share/ClusterShare/software/contrib/gi/gatk/2.5/dist/GenomeAnalysisTK.jar -T VariantAnnotator -R /share/ClusterShare/biodata/contrib/gi/gatk-resource-bundle/2.5/hg19/ucsc.hg19.fasta --variant PG0000864-BLD_PGx_cleaned.vcf --dbsnp /share/ClusterShare/biodata/contrib/gi/gatk-resource-bundle/2.5/hg19/dbsnp_137.hg19.vcf --out PG0000864-BLD_PGx,GATK.vcf --validation_strictness SILENT


Created 2013-09-09 17:01:27 | Updated | Tags: variantannotator

Comments (5)

I am getting a strange error when running the VariantAnnonator - does anyone know what this is about?

I am running it as java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R ucsc.hg19.fasta --variant result.vcf -comp:COSMIC CosmicMutants.vcf -resource CosmicMutants.vcf -E resource.ID -alwaysAppendDbsnpId -o annotated.vcf

and the error is

WARN 17:22:19,963 AbstractVCFCodec - Allele detected with length 1133370 exceeding max size 1048576 at approximately line 78827, likely resulting in degraded VCF processing performance WARN 17:22:20,037 AbstractVCFCodec - Allele detected with length 1133370 exceeding max size 1048576 at approximately line 78828, likely resulting in degraded VCF processing performance WARN 17:22:20,111 AbstractVCFCodec - Allele detected with length 1133370 exceeding max size 1048576 at approximately line 78827, likely resulting in degraded VCF processing performance ....

My vcf file is not that long...but I have looked in CosmicMutants.vcf, and it seems normal to me.


Created 2013-08-26 08:07:15 | Updated | Tags: snpeff variantannotator

Comments (1)

Hi, We use GATK with snpEff to annotate variants of our WES experiments. We use VariantAnnotator to keep only the effect with highest impact. We've realized that for some experiments it could be useful for us to customize priorities assignment (i.e. if a variant is categorized both as DOWNSTREAM for a gene and INTRON for the adjacent gene, the effect picked by default is DOWNSTREAM while we'd like to keep INTRON, etc.) Is there a way to modify VariantAnnotator default behaviour?


Created 2013-07-23 19:44:06 | Updated | Tags: variantannotator vcf genotype annotation

Comments (9)

Hi,

Could you tell me how to encourage GATK to annotate my genotype columns (i.e. add annotations to the FORMAT and PANC_R columns in the following file):

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PANC_R 
chrX 259221 . GA G 136.74 . AC=2;AF=1.00;AN=2;DP=15;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=8.82;MQ0=1;QD=3.04 GT:AD:GQ:PL 1/1:0,2:6:164,6,0

The file was generated with HaplotypeCaller. I used a command line similar to this one to no effect:

java -jar $GATKROOT/GenomeAnalysisTK.jarT VariantAnnotator -R hg19_random.fa -I chr7_recalibrated.bam -V chr7.vcf --dbsnpdbSNP135_chr.vcf -A Coverage -A QualByDepth -A FisherStrand -A MappingQualityRankSumTest -A ReadPosRankSumTest -o chr7_annotated-again.vcf

Does anyone have any suggestions? Thanks in advance!


Created 2013-07-03 23:33:40 | Updated | Tags: variantannotator

Comments (5)

It used to be possible to annotate a VCF files using multiple VCF track files. Basically the --comp command could be used multiple times. Now only the last track gets annotated. That means, if I use "-T VariantAnnotator --comp:TR1 tr1.vcf --comp:TR2 tr2.vcf, only the TR2 flag is used for annotation, and the TR1 flag is not. This is inconsistent and must be a recent regression.

It works fine with GenomeAnalysisTK-2.5-2 but it doesn't work anymore with GenomeAnalysisTK-2.6-2.


Created 2013-05-29 08:38:09 | Updated | Tags: variantannotator

Comments (1)

Hi,

I am writing a pipeline to analyse exome sequencing data from families and would like some advice on using the VariantAnnotator walker. It seems I need to use VariantAnnotator prior to VariantRecalibrator as the latter relies on annotations generated by VariantAnnotator. Is this correct? Also, I would like to annotate my VCF using SNPeff and SNPsift. Can you suggest the best order in which to perform these steps. For example, my plan at the moment is to run steps in the following order: HaplotypeCaller, VariantAnnotator, VariantRecalibrator, ApplyRecalibration, PhaseByTransmission, ReadBackedPhasing, SNPeff, SNPsift.

Any help would be much appreciated,

Kath


Created 2013-05-28 09:31:57 | Updated | Tags: variantannotator vcf multi-sample

Comments (2)

Hi,

I have a vcf containing multiple samples. I would like to put the bam files also as input for the Variant Annotator but how does the variant annotator know which bam is for wich column in the vcf? Does the order of the args of the bam files need to correspond to the order of the samples columns in the vcf?

Thanks,

Robin


Created 2013-05-06 15:19:46 | Updated | Tags: vqsr variantannotator

Comments (4)

Hi,

I just run HyplotypeCaller on a dataset. For the same dataset, I have run through Unified genotyper and then directly subjected the raw vcf from UG to VQSR step without the help of VariantAnnotator before and get through VQSR without any problem. However, when I try to subject the raw callset derived from HyplotypeCaller directly to VQSR step, the VQSR module complained about it and error message is below:

...

ERROR MESSAGE: Bad input: Values for HaplotypeScore annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations. See

http://gatkforums.broadinstitute.org/discussion/49/using-variant-annotator

So after HyplotypeCaller, the derived vcf file needs to run though VariantAnnotator? Since Unified genotyper derived callset does not need the help of VariantAnnotator (all annotations needed for VQSR are included after UG), it seems not the case for HyplotypeCaller? I can run through VariantAnnotator for HyplotypeCaller derived vcf file, just want to make sure if my understanding is correct?

Thanks and best

Mike


Created 2013-04-30 12:45:05 | Updated 2013-04-30 12:45:25 | Tags: variantrecalibrator combinevariants variantannotator annotations

Comments (4)

I have a set of VCFs with identical positions in them:

VCF1: 1 10097 . T . 26 . AN=196;DP=1622;MQ=20.06;MQ0=456 GT:DP

VCF2: 1 10097 . T . 21.34 . AN=198;DP=2338;MQ=19.53;MQ0=633 GT:DP

VCF3: 1 10097 . T . 11.70 . AN=240;DP=3957;MQ=19.74;MQ0=1085 GT:DP

VCF4: 1 10097 . T . 15.56 . AN=134;DP=1348;MQ=18.22;MQ0=442 GT:DP

If I use all of them as input for VariantRecalibrator, which annotations will VariantRecalibrator use? Should I instead merge the VCFs with CombineVariants and run VariantAnnotator, before I run VariantRecalibrator?

I'm not sure if the forum is for asking technical questions only or you are allowed to ask for best practices as well. Feel free to delete my question, if it doesn't belong here. Thank you.


Created 2013-04-25 20:34:16 | Updated | Tags: variantannotator vcf alleledepth

Comments (3)

Hello,

I am trying to filter some of my high-coverage samples based on a minimum depth and have found that the value stored in the DP INFO field and the AD genotype tag changes depending on whether or not I have run VariantAnnotator. The call I have used for VariantAnnotator is:

java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R ucsc.hg19.fasta -I example.bam --variant example.raw.vcf --out example.annotated.vcf -G StandardAnnotation -L example.raw.vcf -rf BadCigar -dcov 15000

Here are the differences for some test cases with HaplotypeCaller:

No MarkDuplicates, did IndelRealigner & BQSR, nightly build 12/04/2013

Annotated: DP=2745, AD=4,2729 Raw: DP=957, AD=1,907

MarkDuplicates, IndelRealigner and BQSR, nightly build 12/04/2013

Annotated: DP=20, AD=0,20 Raw: DP=10, AD=0,8

Raw BAM, nightly build 12/04/2013

Annotated: DP=2745,AD=4,2729 Raw: DP=868, AD=1,864

Raw BAM, version 2.4-9

Annotated: DP=2745, AD=4,2729 Raw: DP=616, AD=1,611

I suspect what is happening here is that VariantAnnotator is taking the depth information from the provided BAM and replacing the depth information reported by the variant caller. Anyway, just wondering- which value is a better reflection of the depth used to make a given variant call? (i.e. which could I use in hard filtering?)

Thanks for your help!


Created 2013-01-25 00:41:19 | Updated | Tags: variantannotator

Comments (2)

I was trying to annotate a vcf file already run on snpEff on VariantAnnotator. Used snpEff v2.05 and GRCh37.64 to annotate the vcf as mentioned in the document. Was using this document for it http://gatkforums.broadinstitute.org/discussion/50/adding-genomic-annotations-using-snpeff-and-variantannotator

When trying to run it through VariantAnnotator, it flags the error "Invalid command line: Failed to load reference dictionary " along with few warnings Warnings : GenomeAnalysisTK-2.3-9-ge5ebf34/human_g1k_v37.dict: No locks available.

Can somebody help me on this

Thanks


Created 2012-12-16 22:27:37 | Updated 2012-12-16 22:28:40 | Tags: haplotypecaller validatevariants variantannotator fastareference vcf dbsnp hg19

Comments (7)

I ran the HaplotypeCaller, VariantAnnotator, and Variant Validatoor on chr3 locations from a human tumor sample.

The HaplotypeCaller command line is:

gatk="/usr/local/gatk/GenomeAnalysisTK-2.2-8-gec077cd/GenomeAnalysisTK.jar"
#Fasta from the gz in the resource bundle
indx="/home/ref/ucsc.hg19.fasta" 
dbsnp="/fdb/GATK_resource_bundle/hg19-1.5/dbsnp_135.hg19.vcf"

java -Xms1g -Xmx2g -jar $gatk -R ${indx} -T HaplotypeCaller \
 -I chrom_bams/286T.chr3.bam \
 -o hapc_vcfs/286T.chr3.raw.vcf 

The VariantAnnotator command line is:

java -Xms1g -Xmx2g -jar $gatk -R ${indx} -T VariantAnnotator \
     --dbsnp $dbsnp  --alwaysAppendDbsnpId \
    -A BaseQualityRankSumTest -A DepthOfCoverage \
    -A FisherStrand -A HaplotypeScore -A InbreedingCoeff \
    -A MappingQualityRankSumTest -A MappingQualityZero -A QualByDepth \
    -A RMSMappingQuality -A ReadPosRankSumTest -A SpanningDeletions \
    -A TandemRepeatAnnotator \
    --variant:vcf hapc_vcfs/286T.chr3.raw.vcf \
    --out varanno_vcfs/286T.chr3.va.vcf

This all works nicely, but I go back and use ValidateVariants just to be sure:

java -Xms1g -Xmx2g -jar $gatk -R ${indx} -T ValidateVariants \
   --dbsnp ${dbsnp} \
   --variant:vcf varanno_vcfs/286T.chr3.va.vcf \
    1> report/ValidateVariants/286T.chr3.va.valid.out \
    2> report/ValidateVariants/286T.chr3.va.valid.err &

An issue arises with a rsID that is flagged as not being present in dbSNP.

...fails strict validation: the rsID rs67850374 for the record at position chr3:123022685 is not in dbSNP

I realize this is an error message that generally would not generally qualify as an issue to post to these forums, however it is an error that seems to be generated by the Haplotype caller, illuminated by VariantAnnotator, and caught by the ValidateVariants.

The first 7 fields of the offending line in the 286T.chr3.va.vcf can be found using: cat 286T.chr3.va.vcf | grep rs67850374

chr3    123022685       rs67850374;rs72184829   AAAGAGAAGAGAAGAG        A       1865.98 .

There is a corresponding entry in the dbsnp_135.hg19.vcf file: cat $dbsnp | grep rs67850374

chr3    123022685       rs67850374;rs72184829   AA      A,AAAGAGAAGAG,AAAGAGAAGAGAAGAGAAGAG     .  PASS

My initial guess is that this is caused by a disagreement in the reference and variant fields between the two annotations. From what I can gather the call to the variantcontext function validateRSIDs() has a call to validateAlternateAlleles(). I assume this is what throws the error that is then caught and reported as "...fails strict validation..."

The UCSC genome browser for hg19 does show the specified position to be AA. It seems as thought the HaplotypeCaller simply used a different reference than dbsnp in this case.

The reference file supplied to HaplotypeCaller was the same as to VariantAnnotator and ValidateVariants. I did not supply the dbsnp argument to the HaplotypeCaller as I planned on doing all annotations after the initial variant calling, and the documentation states that the information is not utilized in the calculations. It seems as though this is a difference in between the reference assembly for dbSNP and the the reference supplied by the resource bundle.

My questions are:

  1. Is this really a problem that arises from slightly different reference assemblies?
  2. Is the hg19-1.5 reference fasta different from any other hg19 reference fasta?
  3. Is there at tool that I have missed that would have prevented this error and allowed the pipeline to continue without error?"
  4. Will this strict validation failure cause problems for the VariantRecalibrator?

As it stands, I am simply going to discard the offending lines manually. There are less than twenty in the entire exome sequencing of this particular tumor-normal sequencing. However, it seems like this issue will likely arise again. I will check the dbSNP VCF for places where the reference differs from the sequence in hg19. At least that should give me an estimate of the number of times this will arise and the locations to exclude from the variant calls.

-- Colin


Created 2012-12-10 23:46:19 | Updated 2012-12-11 02:31:16 | Tags: variantannotator resource

Comments (5)

I would like to have certain annotations applied only if the variants in the resource and the target have the same genomic change. Ex: In Cosmic the following variants are present.

12      25398284        .       C       G       .       PASS    Genename=KRAS;MutationAA=p.G12A;MutationCDS=c.35G>C
12      25398284        .       C       A       .       PASS    Genename=KRAS;MutationAA=p.G12V;MutationCDS=c.35G>T
12      25398284        .       C       T       .       PASS    Genename=KRAS;MutationAA=p.G12D;MutationCDS=c.35G>A

However, in my VCF file I get the G12A annotation for all variants at this site even if the actual change is G12V or G12D.

Is this possible?


Created 2012-11-19 02:07:53 | Updated | Tags: mappingqualityranksumtest readposranksumtest variantannotator

Comments (11)

Hi,

I'm attempting to use Variant Annotator to annotate some VCFs produced by samtools so I can run VQSR on them. Unfortunately I've gottent stuck and I'm trying to figure out why Variant Annotator wouldn't be annotating INDELs with MappingQualityRankSumTest and ReadPosRankSumTest, it seems to annotate SNPs fine. There are both Homs and het's called on the sample. Could it be I need to left align the indels to get enough coverage? What would you suggest is the best way to debug this? Is there a way to make GATK behave more verbosely about why it's refusing an annotation?

Thanks Martin


Created 2012-11-05 08:12:48 | Updated | Tags: unifiedgenotyper qualbydepth variantannotator error

Comments (14)

I had annotated raw indel file (given by UnifiedGenotyper), 1000G_omni2.5.b37.sites.vcf and hapmap_3.3.b37.sites.vcf with all possible annotations including QD (QualByDepth) using VariantAnnotator. However, i got an error when i tried to run VariantRecalibrator. It was complaing that QD has not been found in training variant. Is QD important annotation for indel filtering. Can it be ignored ?

P.S. - i did not use sample bam file while annotating training data set.

.
.
.
INFO  15:11:55,999 RMDTrackBuilder - Loading Tribble index from disk for file NCBI_dbsnp_for_GATK.vcf
INFO  15:12:21,650 TraversalEngine -  chr1:128346793        1.98e+07   30.0 s        1.5 s      4.1%        12.1 m    11.6 m
INFO  15:12:51,650 TraversalEngine -  chr9:130658800        5.26e+07   60.0 s        1.1 s     53.9%       111.2 s    51.2 s
INFO  15:13:13,618 VariantDataManager - QD:      mean = NaN      standard deviation = NaN
INFO  15:13:16,417 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.1-13-g1706365):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations. See http://www.broadinstitute.org/gsa/wiki/index.php/VariantAnnotator
##### ERROR ------------------------------------------------------------------------------------------

Created 2012-10-23 10:00:08 | Updated 2012-10-23 10:00:08 | Tags: haplotypecaller variantannotator

Comments (4)

Hi there, I'm running into an issue with my Queue pipeline, with variants called with HaplotypeCaller. Once I get the raw VCF file, I use VariantAnnotator before VQSR: however, no HaplotypeScore annotation is being produced, resulting in a failure of the VariantRecalibrator step where 'HaplotypeScore' was indicated as an annotation.

I tried to correct the issue by indicating to VariantAnnotator to use all annotations

  class AnnotationArguments (t: Target) extends VariantAnnotator with UNIVERSAL_GATK_ARGS {
this.reference_sequence = qscript.referenceFile
// Set the memory limit to 7 gigabytes on each command.
    this.memoryLimit = 7
    this.input_file :+= qscript.bamFile
    this.useAllAnnotations
    this.D = qscript.dbSNP_b37
  }

But I still can't get any output in the annotated VCF of that parameter. Here an example of a variant

 AC=5;AF=0.078;AN=64;ActiveRegionSize=179;ClippingRankSum=-0.568;DB;DP=2025;EVENTLENGTH=0;FS=4.139;InbreedingCoeff=-0.0847;MLEAC=5;MLEAF=0.078;MQ=69.98;MQRankSum=-1.428;NVH=1;NumHapAssembly=15;NumHapEval=13;QD=17.20;QDE=17.20;ReadPosRankSum=-1.762;TYPE=SNP;extType=SNP

Any suggestions on what I might be doing wrong?

thanks very much for your help, Francesco


Created 2012-10-12 10:48:39 | Updated 2012-10-18 00:57:09 | Tags: variantannotator variants annotation

Comments (2)

Hi,

Could you tell me when we can use new version of SnpEff with GATK?

I have some bugs :

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.0-35-g2d70733):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Could not create module ActiveRegionBasedAnnotation because BUG: Cannot find expected constructor for class caused by exception org.broadinstitute.sting.gatk.walkers

.annotator.interfaces.ActiveRegionBasedAnnotation.()

ERROR ------------------------------------------------------------------------------------------
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.0-35-g2d70733):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Could not create module ExperimentalAnnotation because BUG: Cannot find expected constructor for class

caused by exception org.broadinstitute.sting.gatk.walkers.annotator.interfaces.ExperimentalAnnotation.()

ERROR ------------------------------------------------------------------------------------------
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 2.0-35-g2d70733):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Could not create module RankSumTest because BUG: cannot instantiate class: must be concrete class caused by exception null
ERROR ------------------------------------------------------------------------------------------

I don't know if I forget some other options linked these annotations. These options are important for me. So I deleted them but if somebody want to use them ...

Regards,

Tiphaine


Created 2012-09-19 15:45:44 | Updated 2013-01-07 20:39:33 | Tags: unifiedgenotyper variantannotator downsampling

Comments (4)

I am trying to understand how Variant Annotator functions. I took the vcf file from the output of UnifiedGenotyper and ran Variant Annotator with the same .bam and .bed files I used for running UnifiedGenotyper. I expected that all the calculations in the INFO field will remain the same, since I am using the same input files. However, I find that some fields have different values. Here is an example: UnifiedGenotyper output:

chr22   30094366        .       G       A       172.01  .       AC=1;AF=0.500;AN=2;BaseQRankSum=3.182;DP=244;DS;Dels=0.00;FS=0.000;HaplotypeScore=118.5897;MLEAC=1;MLEAF=0.500;MQ=43.29;MQ0=0;MQRankSum=-0.049;QD=0.70;ReadPosRankSum=1.428;SB=-6.201e+01        GT:AD:DP:GQ:PL  0/1:220,19:244:99:202,0,2693 

VariantAnnotator output:

chr22   30094366        .       G       A       172.01  .       ABHet=0.923;AC=1;AF=0.500;AN=2;BaseQRankSum=3.182;DP=993;DS;Dels=0.00;FS=0.000;HaplotypeScore=454.8386;MLEAC=1;MLEAF=0.500;MQ=43.29;MQ0=0;MQ0Fraction=0.0000;MQRankSum=-0.378;OND=0.034;QD=0.17;ReadPosRankSum=-4.859;SB=-6.201e+01      GT:AD:DP:GQ:PL  0/1:220,19:244:99:202,0,2693

I am running GATKLite 2.1. Notice the DP in the info field has a different value. HaplotypeScore, QD, MQRankSum, etc have different values as well. Am I doing anything wrong? Shouldn't these values be the same when I recalculate these fields using VariantAnnotator?