Hi Geraldine, I think this question has been asked before, but I cannot find a way to fix the problem. I have just run the VariantRecalibrator tool, and I'm getting this tranches plot for SNPs (see attached file). I thought I had understood the tranches plot in the Best Practices manual, but I don't know what is going on here. I know that the tranches are sorted by Ti/Tv, but to me this plot makes no sense.

This is my command:

java -Xmx4g -jar GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R human_g1k_v37.fasta \
    -input all.joinGeno.raw.vcf \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.b37.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.b37.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.b37.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS \
    -mode SNP \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    -recalFile recalibrate_SNP.recal \
    -tranchesFile recalibrate_SNP.tranches \
    -rscriptFile recalibrate_SNP_plots.R

Any help would be appreciated.

The truth labels on my VQSR tranches plot seem to be misplaced. See attached image. I can decipher it, but thought I would let you know.

And while I'm at it: the other PDF generated by VR, showing the 2D plots for all annotation combinations, is not generated if the VR output is specified to be in a subfolder. It can be generated in the subfolder if the R script is moved to and executed in the main folder. I use gnuplot and matplotlib for plotting, so I don't know why this is. I noticed someone else having a similar issue on the forum the other day.

Hi, I am planning to filter very stringently with ApplyRecalibration (I don't mind losing a lot of SNPs to reduce false positives), but I am having trouble reading the tranches plot produced by VariantRecalibrator to choose my cutoff level. When I run the following command I get the default visualisation of the 90, 99, 99.9 and 100 tranches with cumulative TPs and FPs, and everything looks fine:

java -Xmx8g -jar $gatk \
    -T VariantRecalibrator \
    -R $ref \
    -input $vcf \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 $hapmap \
    -resource:omni,known=false,training=true,truth=false,prior=12.0 $onek \
    -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 $dbsnp \
    --minNumBadVariants 1000 \
    -an QD \
    -an MQRankSum \
    -an ReadPosRankSum \
    -an MQ \
    -an FS \
    -an DP \
    -mode SNP \
    -recalFile ${haplocall_date}_all_sites.snp.recal \
    -tranchesFile ${haplocall_date}_all_sites.snp.tranches \
    --rscript_file ${haplocall_date}_all_sites.snp.rscript \
    1>snp_recal_all_sites.out 2>snp_recal_all_sites.err &

But when I add in the following flags after '-mode SNP' to change the tranches, the plot displays them out of order (and the Ti/Tv ratio doesn't seem to correlate as I expected):

-tranche 80.0 -tranche 85.0 -tranche 90.0 -tranche 99.0 \

This means the cumulative FP/TP display is no longer appropriate and is making it hard to tell what the actual proportions are at each level. When I specify fewer tranches, it displays them in the same descending order no matter how many and in what order I specify them:

The total number of variants that get through the filter also seems to change depending on which tranches are specified (e.g. the 85 cutoff has ~50K in the second graph and ~30 in the third graph). I would have expected it to be the same for the same dataset, but I might be misunderstanding something.

1- Why does the number of variants change when the same tranche is requested as part of a different set of tranches? (And why does the Ti/Tv ratio seem to decrease regardless of whether the tranche is above or below 90?)

2- Why does the default tranches plot show the tranches in ascending order, but when I specify my own cutoffs it never does?

3- Alternatively, is there a way to remove the display of the cumulative TP/FPs to make the plot easier to read?

4- Or perhaps a simpler solution: can I bypass the plot altogether? Does one of the outputs from VariantRecalibrator contain the data on predicted TP and FP variants (or a way of calculating it) that is used to produce these plots, so that I can just compare the numbers?
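To make question 4 concrete, here is a minimal sketch of what I have in mind, assuming the tranches file is plain text with '#' comment lines followed by a CSV table. The column names (targetTruthSensitivity, numKnown, numNovel, novelTiTv), the sample numbers, the expected Ti/Tv of 2.15, and the Ti/Tv-based FP heuristic are all assumptions for illustration; I have not confirmed that this is how the plot itself is computed. It also assumes the counts in each row are cumulative.

```python
import csv

# Hypothetical tranches-file contents: column names and numbers are
# assumptions for illustration, not output from a real run.
SAMPLE = """# Variant quality score tranches file
targetTruthSensitivity,numKnown,numNovel,knownTiTv,novelTiTv,minVQSLod,filterName
99.00,60000,4000,2.10,1.90,1.50,VQSRTrancheSNP90.00to99.00
80.00,40000,1500,2.15,2.10,6.00,VQSRTrancheSNP0.00to80.00
90.00,50000,2500,2.12,2.00,4.00,VQSRTrancheSNP80.00to90.00
"""


def read_tranches(text):
    """Parse the CSV part of the file and sort rows by target sensitivity."""
    rows = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    tranches = []
    for r in csv.DictReader(rows):
        tranches.append({
            "target": float(r["targetTruthSensitivity"]),
            "known": int(r["numKnown"]),
            "novel": int(r["numNovel"]),
            "novel_titv": float(r["novelTiTv"]),
        })
    return sorted(tranches, key=lambda t: t["target"])


def estimated_fp_fraction(expected_titv, observed_titv, random_titv=0.5):
    """Assumed heuristic: how far the observed Ti/Tv has fallen from the
    expected value towards the value expected for random errors."""
    frac = 1.0 - (observed_titv - random_titv) / (expected_titv - random_titv)
    return min(max(frac, 0.0), 1.0)


def summarize(text, expected_titv=2.15):
    """Per tranche: cumulative novel count, novel variants added by this
    tranche, and a rough estimate of false positives among the novel calls."""
    out = []
    prev_novel = 0
    for t in read_tranches(text):
        fp = estimated_fp_fraction(expected_titv, t["novel_titv"])
        out.append({
            "target": t["target"],
            "cumulative_novel": t["novel"],
            "added_novel": t["novel"] - prev_novel,
            "est_fp_novel": round(t["novel"] * fp),
        })
        prev_novel = t["novel"]
    return out


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

Sorting by target sensitivity before differencing means the order in which the -tranche flags were given would not matter, which is the part I cannot get from the plot.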

Thanks very much, and let me know if I need to clarify anything.