Tranches plot with tranches less than 90
Posted in Ask the GATK team | Last updated on 2014-03-24 23:52:11


Comments (4)

Hi, I am planning to filter with ApplyRecal very stringently (I don't mind losing a lot of SNPs to reduce false positives), but I am having trouble viewing the tranches plot produced by VariantRecal to choose my cutoff level. When I run the following command I get the default visualisation of 90, 99, 99.9 & 100 with cumulative TPs & FPs, and everything looks fine:

java -Xmx8g -jar $gatk \ -T VariantRecalibrator \ -R $ref \ -input $vcf \ -resource:hapmap,known=false,training=true,truth=true,prior=15.0 $hapmap \ -resource:omni,known=false,training=true,truth=false,prior=12.0 $onek \ -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 $dbsnp \ --minNumBadVariants 1000 \ -an QD \ -an MQRankSum \ -an ReadPosRankSum \ -an MQ \ -an FS \ -an DP \ -mode SNP \ -recalFile ${haplocall_date}_all_sites.snp.recal \ -tranchesFile ${haplocall_date}_all_sites.snp.tranches \ --rscript_file ${haplocall_date}_all_sites.snp.rscript \ 1>snp_recal_all_sites.out 2>snp_recal_all_sites.err &

But when I add in the following flags after '-mode SNP' to change the tranches, the plot displays them out of order (and the Ti/Tv ratio doesn't seem to correlate as I expected):

-tranche 80.0 -tranche 85.0 -tranche 90.0 -tranche 99.0 \

This means the cumulative FP/TP display is no longer appropriate and is making it hard to tell what the actual proportions are at each level. When I specify fewer tranches, it displays them in the same descending order no matter how many and in what order I specify them:

The total number of variants that get through the filter also seems to change depending on what tranches are specified (eg the 85 cutoff has ~50K in the second graph and ~30 in the third graph); I would have thought it would be the same for the same dataset, but I might be misunderstanding something.

1-Why does the number of variants change when the same tranch is called in a different set of tranches? (And the Ti/Tv ratio seems to decrease regardless of whether the tranch is above or below 90)?

2- Why does the default tranch plot show ascending order of the tranches, but when I specify my own cutoffs it never does?

3- Alternatively, is there a way to remove the display of the cumulative TP/FPs to make the plot easier to read?

4- Or perhaps a simpler solution, can I bypass the plot altogether? Does one of the outputs from VariantRecal have the data (or a way of calculating it) about TP and FP predicted variants that is used to produce these plots so I can just compare the numbers?

Thanks very much, and let me know if I need to clarify anything


Return to top Comment on this article in the forum