Tagged with #tranches
0 documentation articles | 0 announcements | 8 forum discussions



Created 2016-03-28 18:24:17 | Updated | Tags: vqsr tranches

Comments (6)

I'm doing a large variant calling project on a cohort of ~10,000 exomes. I've run into an issue with VQSR. Everything appears to be working normally except for my output tranche plot (attached), where I'm seeing no false positives. I know this is too good to be true.

From reading other posts on here, I used dbsnp_138.b37.excluding_sites_after_129.vcf, but this didn't change my plots. Some other details: my exomes were generated with different kits. While running VQSR I have tried using -L with the superset of all capture regions (probably not the best idea) and with the intersection of all capture regions (what I intend to use), but the tranche plots look the same regardless.

Using GATK v3.5. Any suggestions would be greatly appreciated.

Alex


Created 2016-02-23 19:52:09 | Updated 2016-02-23 19:57:53 | Tags: tranches r

Comments (3)

I've run through the VQSR pipeline and written out the [R] code with "-rscriptFile output.plots.snp.R". When I run the [R] code I get the Gaussian mixture model plots, but NO tranche plot.

The data to make the tranches plot is in the same directory and looks OK.

Searching the [R] script, I find no text related to "TITV".

Version information

INFO 08:21:30,408 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56


Created 2015-12-02 21:32:30 | Updated | Tags: variantrecalibrator vqsr tranches

Comments (8)

I ran GATK's cohort genotyping pipeline on 5000 human samples with Illumina WGS ~1.3x data, up through GenotypeGVCFs (and CatVariants to combine chunks) using v3.4-46. Next I ran VariantRecalibrator (initially just chr1) using recommended settings with both v3.4-46 and v3.5. Here is my command for both versions:

java -Xmx40g \
    -jar GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R hs37m.fa \
    -input gatk.hc.combined.genotyped.chr1-22.vcf.gz \
    -recalFile snps.recal \
    -tranchesFile snps.tranches \
    -rscriptFile recalibrate_SNP_plots.R \
    --target_titv 2.15 \
    -nt 24 \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf.gz \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.b37.vcf.gz \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.b37.vcf.gz \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.b37.vcf.gz \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -an InbreedingCoeff \
    -mode SNP \
    -L 1 \
    -tranche 100.0 -tranche 99.9 -tranche 99.5 -tranche 99.0 -tranche 98.5 -tranche 90.0 \
    --maxGaussians 6 \
    -log VariantRecalibrator.snps.log

The attached tranches plots (snps.tranches.v3.5.pdf) generated w/ v3.5 look strange because:

1) The tranches are out of order on the bar plot (e.g., 99.5 is before 99)
2) The fill coloring doesn't make sense for tranches 99 and 98.5 - there are orange stripes over the blue bar
3) The scatter plot's connecting lines go in both directions

The plots for v3.4-46 look more normal (snps.tranches.v3.4-46.pdf), though I'm still trying to figure out how to get closer to the expected 2.15 Ti/Tv ratio. Oddly, the Ti/Tv ratios differ slightly between v3.4-46 and v3.5 even though the same data and settings were used.

I suspect the behavior w/ v3.5 may be a bug in VariantRecalibrator, which is why I'm posting here. Please let me know if you need any more information.

My best,

Chris


Created 2015-02-23 13:55:20 | Updated | Tags: variantrecalibrator tranches

Comments (4)

Hi Geraldine, I think this question has been asked before, but I cannot find a way to fix the problem. I have just run the VariantRecalibrator tool, and I'm getting this tranches plot for SNPs (see attached file). I believe I understood the tranches plot in the Best Practices manual correctly, but I don't know what is going on here. I know that the tranches are sorted by Ti/Tv, but to me this plot makes no sense.

This is my command:

java -Xmx4g -jar GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R human_g1k_v37.fasta \
    -input all.joinGeno.raw.vcf \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.b37.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.b37.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.b37.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS \
    -mode SNP \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    -recalFile recalibrate_SNP.recal \
    -tranchesFile recalibrate_SNP.tranches \
    -rscriptFile recalibrate_SNP_plots.R

Any help would be appreciated.


Created 2015-02-06 16:34:08 | Updated 2015-02-06 16:37:35 | Tags: tranches

Comments (2)

The truth labels on my VQSR tranches plot seem to be misplaced. See attached image. I can decipher it, but thought I would let you know.

And while I'm at it: the other PDF generated by VR, showing the 2D plots for all annotation combinations, is not generated if the VR output is specified to be in a subfolder. It can be generated in the subfolder if the R script is moved to and executed in the main folder. I use gnuplot and matplotlib for plotting, so I don't know why this is. I noticed someone else having a similar issue on the forum the other day.


Created 2014-03-24 23:49:59 | Updated 2014-03-24 23:52:11 | Tags: variantrecalibrator applyrecalibration tranches variant-recalibration

Comments (20)

Hi, I am planning to filter with ApplyRecal very stringently (I don't mind losing a lot of SNPs to reduce false positives), but I am having trouble viewing the tranches plot produced by VariantRecal to choose my cutoff level. When I run the following command I get the default visualisation of 90, 99, 99.9 & 100 with cumulative TPs & FPs, and everything looks fine:

java -Xmx8g -jar $gatk \
    -T VariantRecalibrator \
    -R $ref \
    -input $vcf \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 $hapmap \
    -resource:omni,known=false,training=true,truth=false,prior=12.0 $onek \
    -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 $dbsnp \
    --minNumBadVariants 1000 \
    -an QD \
    -an MQRankSum \
    -an ReadPosRankSum \
    -an MQ \
    -an FS \
    -an DP \
    -mode SNP \
    -recalFile ${haplocall_date}_all_sites.snp.recal \
    -tranchesFile ${haplocall_date}_all_sites.snp.tranches \
    --rscript_file ${haplocall_date}_all_sites.snp.rscript \
    1>snp_recal_all_sites.out 2>snp_recal_all_sites.err &

But when I add in the following flags after '-mode SNP' to change the tranches, the plot displays them out of order (and the Ti/Tv ratio doesn't seem to correlate as I expected):

-tranche 80.0 -tranche 85.0 -tranche 90.0 -tranche 99.0 \

This means the cumulative FP/TP display is no longer appropriate and is making it hard to tell what the actual proportions are at each level. When I specify fewer tranches, it displays them in the same descending order no matter how many and in what order I specify them:

The total number of variants that get through the filter also seems to change depending on which tranches are specified (e.g., the 85 cutoff has ~50K in the second graph and ~30 in the third graph); I would have thought it would be the same for the same dataset, but I might be misunderstanding something.

1- Why does the number of variants change when the same tranche is called in a different set of tranches? (And why does the Ti/Tv ratio seem to decrease regardless of whether the tranche is above or below 90?)

2- Why does the default tranche plot show the tranches in ascending order, but when I specify my own cutoffs it never does?

3- Alternatively, is there a way to remove the display of the cumulative TP/FPs to make the plot easier to read?

4- Or perhaps a simpler solution: can I bypass the plot altogether? Does one of the outputs from VariantRecal contain the data (or a way of calculating it) on predicted TP and FP variants that is used to produce these plots, so I can just compare the numbers?

Thanks very much, and let me know if I need to clarify anything
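
On question 4, the file written via -tranchesFile is plain text, so the per-tranche numbers can be compared without the plot. Below is a minimal sketch in Python of one way to do that. It rests on assumptions that are not stated in this thread: that the file is comma-separated with `#` comment lines, that it has columns named `targetTruthSensitivity`, `numNovel` and `novelTiTv` (check the header of your own file), and that false positives can be estimated by treating true novel SNPs as having the target Ti/Tv and errors as having Ti/Tv of about 0.5. The function names and the sample rows are made up for illustration; this is not the GATK plotting script.

```python
import csv

def estimate_fp_fraction(observed_titv, target_titv, random_titv=0.5):
    """Estimate the false positive fraction from a Ti/Tv ratio.

    Assumption: true variants sit at target_titv and errors at random_titv,
    so the observed ratio is interpolated linearly between the two.
    The result is clamped to the range [0, 1].
    """
    frac = 1.0 - (observed_titv - random_titv) / (target_titv - random_titv)
    return min(max(frac, 0.0), 1.0)

def summarize_tranches(text, target_titv):
    """Parse tranches-file text and return rows sorted by ascending tranche."""
    # Drop blank lines and '#' comment lines; the first remaining line is the header.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    summary = []
    for rec in csv.DictReader(lines):
        n_novel = int(rec["numNovel"])  # assumed column name
        fp_frac = estimate_fp_fraction(float(rec["novelTiTv"]), target_titv)
        est_fp = n_novel * fp_frac
        summary.append({
            "tranche": float(rec["targetTruthSensitivity"]),  # assumed column name
            "novel": n_novel,
            "est_tp": n_novel - est_fp,
            "est_fp": est_fp,
        })
    return sorted(summary, key=lambda row: row["tranche"])

# Made-up example rows, NOT real GATK output; only the columns used above are shown.
SAMPLE = """# hypothetical tranches file
targetTruthSensitivity,numNovel,novelTiTv
99.00,2000,1.325
90.00,1000,2.15
"""

for row in summarize_tranches(SAMPLE, target_titv=2.15):
    print(row["tranche"], row["novel"], round(row["est_tp"]), round(row["est_fp"]))
```

To use it on a real run, read the tranches file into a string, pass it in with the same target Ti/Tv that was given to VariantRecalibrator, and compare the printed rows across tranche settings.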


Created 2012-10-23 02:15:29 | Updated 2013-01-07 20:11:44 | Tags: unifiedgenotyper vqsr tranches multi-sample

Comments (3)

Hello,

I am trying to run GATK on a sample of 119 exomes. I followed the GATK guidelines to process the fastq files. I used the following parameters to call the UnifiedGenotyper and VQSR [for SNPs]:

UnifiedGenotyper

-T UnifiedGenotyper 
--output_mode EMIT_VARIANTS_ONLY 
--min_base_quality_score 30 
--max_alternate_alleles 5 
-glm SNP 

VQSR

-resource:hapmap,known=false,training=true,truth=true,prior=15.0 /media/transcription/cipn/5.pt/ref/hapmap_3.3.hg19.sites.vcf 
-resource:omni,known=false,training=true,truth=false,prior=12.0 /media/transcription/cipn/5.pt/ref/1000G_omni2.5.hg19.sites.vcf 
-resource:dbsnp,known=true,training=false,truth=false,prior=6.0 /media/transcription/cipn/5.pt/ref/dbsnp_135.hg19.vcf.gz 
-an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an InbreedingCoeff 
-mode SNP 

I get a tranche plot which does not look OK. The "Number of Novel Variants [1000s]" axis goes from -400 to 800 and the Ti/Tv ratio varies from 0.633 to 0.782 [the attach-file link is not working for me and I am unable to upload the plot]. Any suggestion to rectify this would be very helpful!

cheers, Rahul


Created 2012-10-11 18:12:51 | Updated 2013-01-07 19:13:46 | Tags: variantrecalibrator vqsr tranches

Comments (5)

Hello,

I am running Variant Quality Score Recalibration on indels with the following command.

java -Xmx8g -jar /raid/software/src/GenomeAnalysisTK-1.6-9-g47df7bb/GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R /raid/references-and-indexes/hg19/bwa/hg19_lite.fa \
    -input indel_output_all_chroms_combined.vcf \
    --maxGaussians 4 -std 10.0 -percentBad 0.12 \
    -resource:mills,known=true,training=true,truth=true,prior=12.0  /raid/Merlot/exome_pipeline_v1/ref/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
    -an QD -an FS -an HaplotypeScore -an ReadPosRankSum  \
    --ts_filter_level 95.0 \
    -mode INDEL \
    -recalFile /raid2/projects/STFD/indel_output_7.recal \
    -tranchesFile /raid2/projects/STFD/indel_output_7.tranches \
    -rscriptFile /raid2/projects/STFD/indel_output_7.plots.R

My tranches file reports only false positives for all tranches. When I run VQSR on SNPs, the tranches have many true positives and look similar to other tranche files reported on this site. I am wondering if anyone has had similar experiences or has suggestions?

Thanks