Tagged with #vqsr indel
0 documentation articles | 0 announcements | 3 forum discussions


Comments (13)

Hi there,

I am running VQSR (GenomeAnalysisTK-2.8-1-g932cd3a) on the SNPs and indels of an exome dataset. The SNP case works fine, but the indel case gives the following error, which states that it might be due to a bug in the program. I'd appreciate any comments on how to resolve this issue. Thank you,

Amin

PS:

INFO 19:32:31,908 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:32:31,911 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.8-1-g932cd3a, Compiled 2013/12/06 16:47:15
INFO 19:32:31,911 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 19:32:31,911 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 19:32:31,916 HelpFormatter - Program Args: -T VariantRecalibrator -R ucsc.hg19.fasta -input SAMPLE.indel.vcf -resource:mills,known=true,training=true,truth=true,prior=12.0 Mills_and_1000G_gold_standard.indels.hg19.vcf -an DP -an FS -mode INDEL -an ReadPosRankSum -an MQRankSum --maxGaussians 4 -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile SAMPLE.tmp.indel.vcf -tranchesFile SAMPLE.tranches.gatk.indel.recal.csv -rscriptFile SAMPLE.gatk.recal.indel.R
INFO 19:32:31,916 HelpFormatter - Date/Time: 2014/01/13 19:32:31
INFO 19:32:31,916 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:32:31,916 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:32:31,936 ArgumentTypeDescriptor - Dynamically determined type of SAMPLE.indel.vcf to be VCF
INFO 19:32:31,967 ArgumentTypeDescriptor - Dynamically determined type of Mills_and_1000G_gold_standard.indels.hg19.vcf to be VCF
INFO 19:32:32,816 GenomeAnalysisEngine - Strictness is SILENT
INFO 19:32:32,963 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 19:32:33,000 RMDTrackBuilder - Loading Tribble index from disk for file SAMPLE.indel.vcf
INFO 19:32:33,059 RMDTrackBuilder - Loading Tribble index from disk for file Mills_and_1000G_gold_standard.indels.hg19.vcf
INFO 19:32:33,244 GenomeAnalysisEngine - Preparing for traversal
INFO 19:32:33,266 GenomeAnalysisEngine - Done preparing for traversal
INFO 19:32:33,268 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 19:32:33,268 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
WARN 19:32:33,275 Utils - ********************************************************************************
WARN 19:32:33,276 Utils - * WARNING:
WARN 19:32:33,276 Utils - *
WARN 19:32:33,276 Utils - * Rscript not found in environment path.
WARN 19:32:33,276 Utils - * SAMPLE.gatk.recal.indel.R will be generated but PDF plots
WARN 19:32:33,277 Utils - * will not.
WARN 19:32:33,277 Utils - ********************************************************************************
INFO 19:32:33,281 TrainingSet - Found mills track: Known = true Training = true Truth = true Prior = Q12.0
INFO 19:32:53,178 VariantDataManager - DP: mean = 15.61 standard deviation = 17.91
INFO 19:32:53,185 VariantDataManager - FS: mean = 0.50 standard deviation = 1.74
INFO 19:32:53,189 VariantDataManager - ReadPosRankSum: mean = 0.01 standard deviation = 0.97
INFO 19:32:53,200 VariantDataManager - MQRankSum: mean = 0.04 standard deviation = 0.98
INFO 19:32:53,277 VariantDataManager - Annotations are now ordered by their information content: [DP, FS, MQRankSum, ReadPosRankSum]
INFO 19:32:53,279 VariantDataManager - Training with 10207 variants after standard deviation thresholding.
INFO 19:32:53,283 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 19:32:53,578 VariantRecalibratorEngine - Finished iteration 0.
INFO 19:32:53,739 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.20601
INFO 19:32:53,821 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.07243
INFO 19:32:53,903 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.11180
INFO 19:32:53,986 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.05371
INFO 19:32:54,068 VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.00977
INFO 19:32:54,133 VariantRecalibratorEngine - Convergence after 29 iterations!
INFO 19:32:54,169 VariantRecalibratorEngine - Evaluating full set of 15801 variants...
INFO 19:32:54,726 VariantDataManager - Training with worst 184 scoring variants --> variants with LOD <= -5.0000.
INFO 19:32:54,727 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 19:32:54,728 VariantRecalibratorEngine - Finished iteration 0.
INFO 19:32:54,729 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.02527
INFO 19:32:54,730 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.02685
INFO 19:32:54,731 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.04499
INFO 19:32:54,732 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.10418
INFO 19:32:54,733 VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.31466
INFO 19:32:54,734 VariantRecalibratorEngine - Convergence after 29 iterations!
INFO 19:32:54,734 VariantRecalibratorEngine - Evaluating full set of 15801 variants...
INFO 19:32:55,845 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.IllegalArgumentException: log10p: Values must be non-infinite and non-NAN
at org.broadinstitute.sting.utils.MathUtils.log10sumLog10(MathUtils.java:237)
at org.broadinstitute.sting.utils.MathUtils.log10sumLog10(MathUtils.java:225)
at org.broadinstitute.sting.utils.MathUtils.log10sumLog10(MathUtils.java:250)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.GaussianMixtureModel.nanTolerantLog10SumLog10(GaussianMixtureModel.java:239)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.GaussianMixtureModel.evaluateDatumMarginalized(GaussianMixtureModel.java:286)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.GaussianMixtureModel.evaluateDatum(GaussianMixtureModel.java:244)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibratorEngine.evaluateDatum(VariantRecalibratorEngine.java:167)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibratorEngine.evaluateData(VariantRecalibratorEngine.java:100)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:360)
at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:139)
at org.broadinstitute.sting.gatk.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.8-1-g932cd3a):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: log10p: Values must be non-infinite and non-NAN
ERROR ------------------------------------------------------------------------------------------
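
For readers unfamiliar with the operation named in the stack trace, here is a minimal Python sketch of a "log10 of a sum of log10 values" routine with an input check. It is not GATK's code: the function name `log10_sum_log10` is hypothetical, and rejecting every NaN or infinite input is an assumption made only to illustrate the kind of condition the error message describes.

```python
import math


def log10_sum_log10(log10_values):
    """Return log10(sum(10**v for v in log10_values)), computed stably.

    Hypothetical illustration only; the exact checks GATK performs are
    not shown in the post above.
    """
    for v in log10_values:
        # Assumption: treat any NaN or infinite input as an error.
        if math.isnan(v) or math.isinf(v):
            raise ValueError("log10p: Values must be non-infinite and non-NAN")
    max_v = max(log10_values)
    # Factor out the largest term so the powers of ten cannot overflow.
    total = sum(10.0 ** (v - max_v) for v in log10_values)
    return max_v + math.log10(total)
```

Under this sketch, a single NaN likelihood reaching the summation is enough to abort the whole evaluation, which matches the shape of the failure reported above without explaining its cause.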
Comments (6)

Hello again,

please, we need help understanding a deletion detected by HC.

Here is the VCF line and the activeRegions output (attached):

MT 9190 . CTGCACGACAACACAT C 1057.73 VQSRTrancheBOTH99.00to99.90 AC=1;AF=0.500;AN=2;BaseQRankSum=1.935;ClippingRankSum=-0.516;DP=301;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=45.84;MQ0=0;MQRankSum=1.456;NEGATIVE_TRAIN_SITE;QD=0.23;ReadPosRankSum=0.625;VQSLOD=-1.682e+00;culprit=MQ;set=variant23 GT:AD:GQ:PL 0/1:51,11:99:1095,0,1411
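
The record above can be unpacked with a few lines of Python. This is only a sketch: the helper name `summarize_deletion` is hypothetical, the fields are assumed to be whitespace-separated as they appear in this post (a real VCF uses tabs), and the position arithmetic assumes the ALT allele is the single anchoring base at the start of REF.

```python
# The record from the post, with whitespace standing in for the original tabs.
RECORD = (
    "MT 9190 . CTGCACGACAACACAT C 1057.73 VQSRTrancheBOTH99.00to99.90 "
    "AC=1;AF=0.500;AN=2;BaseQRankSum=1.935;ClippingRankSum=-0.516;DP=301;FS=0.000;"
    "MLEAC=1;MLEAF=0.500;MQ=45.84;MQ0=0;MQRankSum=1.456;NEGATIVE_TRAIN_SITE;QD=0.23;"
    "ReadPosRankSum=0.625;VQSLOD=-1.682e+00;culprit=MQ;set=variant23 "
    "GT:AD:GQ:PL 0/1:51,11:99:1095,0,1411"
)


def summarize_deletion(line):
    """Summarize a single-sample deletion record (hypothetical helper)."""
    chrom, pos, _id, ref, alt, _qual, filt, _info, fmt, sample = line.split()
    pos = int(pos)
    # Pair each FORMAT key with the matching value in the sample column.
    call = dict(zip(fmt.split(":"), sample.split(":")))
    ref_depth, alt_depth = (int(x) for x in call["AD"].split(","))
    return {
        "filter": filt,
        "deleted_bases": len(ref) - len(alt),
        # Assumes ALT is the anchoring base, so deleted bases start right after it.
        "first_deleted_pos": pos + len(alt),
        "last_deleted_pos": pos + len(ref) - 1,
        "ref_reads": ref_depth,
        "alt_reads": alt_depth,
        "alt_fraction": alt_depth / (ref_depth + alt_depth),
    }


summary = summarize_deletion(RECORD)
```

For this record, REF is 16 bases long and ALT is 1 base, so the call removes 15 bases spanning positions 9191 to 9205, and the AD field reads 51 reference and 11 alternate.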

As we can see in the image, several reads confirm a 2bp deletion at 9204. But HC called a 16bp deletion, supported by only one read, which our pipeline later filtered out because it did not pass our VQSR filter. The "Insertions" marked at the ends of the reads are "N"s from masked adaptors.

I'm glad that HC tried to identify a bigger deletion, but shouldn't it call the 2bp deletion as well?

Thanks in advance, Rodrigo.

Comments (3)

I have used VQSR in combination with SNP array data. Based on the point at which the Ti/Tv ratio dropped below the expected (stable) Ti/Tv ratio, I selected a truth sensitivity level and applied the filter model for that level. This all worked really well for SNPs, and I now want to do the same for indels.
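
As a reminder of what the Ti/Tv ratio measures, here is a minimal Python sketch that counts transitions (A<->G, C<->T) and transversions (all other substitutions) in a list of SNPs. The function names and the toy input are hypothetical and are not taken from any call set.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True when both bases are purines or both are pyrimidines."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snps):
    """Return transitions / transversions for (ref, alt) single-base pairs."""
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    if tv == 0:
        raise ValueError("no transversions; Ti/Tv is undefined")
    return ti / tv


# Hypothetical toy input: three transitions and one transversion.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
ratio = titv_ratio(toy_snps)
```

Because the definition depends on single-base substitution classes, it has no direct counterpart for insertions and deletions.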

We don't have any external truth set for indels, so I was planning to use the highest-quality indels in my call set as the truth set.

Then I can run VQSR to generate different filtering models for different truth sensitivity levels.

But how do I know at which truth sensitivity level my gain in true positives is offset by a (too large) increase in false positives?

The Ti/Tv ratio is used for this with SNPs, but as far as I know it doesn't apply to indels.
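
To make the notion of a truth sensitivity level concrete, here is a small Python sketch that maps each target level to a score cutoff (the lowest score that still retains that fraction of truth sites) and counts how many calls pass it. It does not answer the false-positive question. The function name `tranche_summary` and all scores are hypothetical toy values, not output from VQSR.

```python
import math


def tranche_summary(truth_scores, all_scores, sensitivities):
    """For each target truth sensitivity, return (target, cutoff, calls_kept)."""
    ranked = sorted(truth_scores, reverse=True)
    rows = []
    for target in sensitivities:
        # Smallest number of top-ranked truth sites that reaches the target
        # fraction; the small epsilon guards against floating-point round-up.
        needed = max(1, math.ceil(target * len(ranked) - 1e-9))
        cutoff = ranked[needed - 1]
        kept = sum(1 for s in all_scores if s >= cutoff)
        rows.append((target, cutoff, kept))
    return rows


# Hypothetical toy scores: ten truth sites plus six calls outside the truth set.
truth = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.0, -1.0, -2.0, -6.0]
others = [4.5, 2.5, -0.5, -3.0, -4.0, -7.0]
rows = tranche_summary(truth, truth + others, [0.5, 0.9, 1.0])
```

Raising the target level lowers the cutoff, so more truth sites are retained and more calls outside the truth set are admitted as well; how many of those extra calls are false positives cannot be read from the scores alone.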