Tagged with #tranches
0 documentation articles | 0 announcements | 6 forum discussions



Created 2015-12-02 21:32:30 | Updated | Tags: variantrecalibrator vqsr tranches
Comments (8)

I ran GATK's cohort genotyping pipeline on 5000 human samples with Illumina WGS ~1.3x data, up through GenotypeGVCFs (and CatVariants to combine chunks) using v3.4-46. Next I ran VariantRecalibrator (initially just chr1) using recommended settings with both v3.4-46 and v3.5. Here is my command for both versions:

java -Xmx40g \
    -jar GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R hs37m.fa \
    -input gatk.hc.combined.genotyped.chr1-22.vcf.gz \
    -recalFile snps.recal \
    -tranchesFile snps.tranches \
    -rscriptFile recalibrate_SNP_plots.R \
    --target_titv 2.15 \
    -nt 24 \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf.gz \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.b37.vcf.gz \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.b37.vcf.gz \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.b37.vcf.gz \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -an InbreedingCoeff \
    -mode SNP \
    -L 1 \
    -tranche 100.0 -tranche 99.9 -tranche 99.5 -tranche 99.0 -tranche 98.5 -tranche 90.0 \
    --maxGaussians 6 \
    -log VariantRecalibrator.snps.log

The attached tranches plots (snps.tranches.v3.5.pdf) generated w/ v3.5 look strange because:

1) The tranches are out of order on the bar plot (e.g., 99.5 is before 99)
2) The fill coloring doesn't make sense for tranches 99 and 98.5 - there are orange stripes over the blue bar
3) The scatter plot's connecting lines go in both directions

The plots for v3.4-46 look more normal (snps.tranches.v3.4-46.pdf), though I'm still trying to figure out how to get closer to the expected 2.15 Ti/Tv ratio. Oddly, the Ti/Tv ratios differ slightly between v3.4-46 and v3.5 even though the same data and settings were used.
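To put a number on how far an observed Ti/Tv ratio is from the 2.15 target, one rough approach is a linear interpolation between the target ratio and the ratio expected from random errors. The sketch below assumes random false positives have Ti/Tv of about 0.5 (4 of the 12 possible substitutions are transitions) and that true novel variants sit at the target ratio. The function name `titv_fp_estimate` and the observed value 1.8 are illustrative only; they are not taken from the GATK code or from this dataset.

```python
def titv_fp_estimate(titv_expected, titv_observed, titv_random=0.5):
    """Rough fraction of false positives implied by an observed Ti/Tv ratio.

    Assumes a linear mix of true variants (at titv_expected) and random
    errors (at titv_random). This is an approximation, not an exact model.
    """
    fraction = 1.0 - (titv_observed - titv_random) / (titv_expected - titv_random)
    # Clamp to [0, 1] so ratios above the target or below random stay valid.
    return min(max(fraction, 0.0), 1.0)


# Hypothetical observed ratio of 1.8 against the 2.15 target used above.
estimate = titv_fp_estimate(2.15, 1.8)
print(f"estimated false-positive fraction: {estimate:.3f}")
```

With these assumptions, an observed ratio at the target gives 0, and a ratio at 0.5 gives 1, so the estimate is only as good as the chosen target Ti/Tv.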

I suspected the behavior w/ v3.5 may be a possible bug in VariantRecalibrator, which is why I'm posting here. Please let me know if you need any more information.

My best,

Chris


Created 2015-02-23 13:55:20 | Updated | Tags: variantrecalibrator tranches
Comments (4)

Hi Geraldine, I think this question has been asked before, but I cannot find a way to fix the problem. I have just run the VariantRecalibrator tool, and I'm getting this tranches plot for SNPs (see attached file). I understood the tranches plot in the Best Practices manual, but I don't know what is going on here. I know that the tranches are sorted by Ti/Tv, but to me this plot makes no sense.

This is my command:

java -Xmx4g -jar GenomeAnalysisTK.jar \
    -T VariantRecalibrator -R human_g1k_v37.fasta \
    -input all.joinGeno.raw.vcf \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.vcf \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.b37.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.b37.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.b37.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -mode SNP \
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
    -recalFile recalibrate_SNP.recal \
    -tranchesFile recalibrate_SNP.tranches \
    -rscriptFile recalibrate_SNP_plots.R

Any help would be appreciated.


Created 2015-02-06 16:34:08 | Updated 2015-02-06 16:37:35 | Tags: tranches
Comments (2)

The truth labels on my VQSR tranches plot seem to be misplaced. See attached image. I can decipher it, but thought I would let you know.

And while I'm at it: the other PDF generated by VR, showing the 2D plots for all annotation combinations, is not generated if the VR output is specified to be in a subfolder. It can be generated in the subfolder if the R script is moved to and executed in the main folder. I use gnuplot and matplotlib for plotting, so I don't know why this is. I noticed someone else having a similar issue on the forum the other day.


Created 2014-03-24 23:49:59 | Updated 2014-03-24 23:52:11 | Tags: variantrecalibrator applyrecalibration tranches variant-recalibration
Comments (20)

Hi, I am planning to filter with ApplyRecal very stringently (I don't mind losing a lot of SNPs to reduce false positives), but I am having trouble viewing the tranches plot produced by VariantRecal to choose my cutoff level. When I run the following command I get the default visualisation of 90, 99, 99.9 & 100 with cumulative TPs & FPs, and everything looks fine:

java -Xmx8g -jar $gatk \
    -T VariantRecalibrator \
    -R $ref \
    -input $vcf \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 $hapmap \
    -resource:omni,known=false,training=true,truth=false,prior=12.0 $onek \
    -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 $dbsnp \
    --minNumBadVariants 1000 \
    -an QD \
    -an MQRankSum \
    -an ReadPosRankSum \
    -an MQ \
    -an FS \
    -an DP \
    -mode SNP \
    -recalFile ${haplocall_date}_all_sites.snp.recal \
    -tranchesFile ${haplocall_date}_all_sites.snp.tranches \
    --rscript_file ${haplocall_date}_all_sites.snp.rscript \
    1>snp_recal_all_sites.out 2>snp_recal_all_sites.err &

But when I add in the following flags after '-mode SNP' to change the tranches, the plot displays them out of order (and the Ti/Tv ratio doesn't seem to correlate as I expected):

-tranche 80.0 -tranche 85.0 -tranche 90.0 -tranche 99.0 \

This means the cumulative FP/TP display is no longer appropriate and is making it hard to tell what the actual proportions are at each level. When I specify fewer tranches, it displays them in the same descending order no matter how many and in what order I specify them:

The total number of variants that get through the filter also seems to change depending on which tranches are specified (e.g., the 85 cutoff has ~50K in the second graph and ~30 in the third graph). I would have thought it would be the same for the same dataset, but I might be misunderstanding something.

1- Why does the number of variants change when the same tranche is called in a different set of tranches? (And why does the Ti/Tv ratio seem to decrease regardless of whether the tranche is above or below 90?)

2- Why does the default tranche plot show the tranches in ascending order, but when I specify my own cutoffs it never does?

3- Alternatively, is there a way to remove the display of the cumulative TP/FPs to make the plot easier to read?

4- Or perhaps a simpler solution, can I bypass the plot altogether? Does one of the outputs from VariantRecal have the data (or a way of calculating it) about TP and FP predicted variants that is used to produce these plots so I can just compare the numbers?
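For comparing the numbers without the plot, the tranches file itself can be read as a small CSV table. The sketch below is a minimal example under stated assumptions: that comment lines start with `#`, and that the header uses column names such as `targetTruthSensitivity`, `numKnown`, `numNovel`, and `novelTiTv`, as in GATK 3.x tranches files I have seen. Check these against your own file. The rows in `EXAMPLE` are made-up values, not real output.

```python
import csv
import io


def read_tranches(text):
    """Parse tranches-file text into dicts sorted by target truth sensitivity."""
    # Skip blank lines and '#' comment lines; the first remaining line is the header.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    rows = list(csv.DictReader(io.StringIO("\n".join(lines))))
    for row in rows:
        # Column names are assumptions; adjust them if your header differs.
        row["targetTruthSensitivity"] = float(row["targetTruthSensitivity"])
        row["numKnown"] = int(row["numKnown"])
        row["numNovel"] = int(row["numNovel"])
        row["novelTiTv"] = float(row["novelTiTv"])
    # Sort ascending so the order does not depend on how -tranche was given.
    return sorted(rows, key=lambda r: r["targetTruthSensitivity"])


# Made-up example content in the assumed layout (not real results).
EXAMPLE = """\
# Variant quality score tranches file
# Version number 5
targetTruthSensitivity,numKnown,numNovel,knownTiTv,novelTiTv,minVQSLod,filterName,model,accessibleTruthSites,callsAtTruthSites,truthSensitivity
99.90,1200,300,2.05,1.60,-5.0,VQSRTrancheSNP99.00to99.90,SNP,1000,999,0.9990
90.00,900,100,2.12,2.00,3.0,VQSRTrancheSNP0.00to90.00,SNP,1000,900,0.9000
99.00,1100,200,2.08,1.80,-1.0,VQSRTrancheSNP90.00to99.00,SNP,1000,990,0.9900
"""

for row in read_tranches(EXAMPLE):
    print(row["targetTruthSensitivity"], row["numKnown"], row["numNovel"], row["novelTiTv"])
```

To use it on a real file, pass the file contents, for example `read_tranches(open(path).read())`, where `path` points at the file given to `-tranchesFile`.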

Thanks very much, and let me know if I need to clarify anything


Created 2012-10-23 02:15:29 | Updated 2013-01-07 20:11:44 | Tags: unifiedgenotyper vqsr tranches multi-sample
Comments (3)

Hello,

I am trying to run GATK on a sample of 119 exomes. I followed the GATK guidelines to process the fastq files. I used the following parameters to call the UnifiedGenotyper and VQSR [for SNPs]:

UnifiedGenotyper

-T UnifiedGenotyper 
--output_mode EMIT_VARIANTS_ONLY 
--min_base_quality_score 30 
--max_alternate_alleles 5 
-glm SNP 

VQSR

-resource:hapmap,known=false,training=true,truth=true,prior=15.0 /media/transcription/cipn/5.pt/ref/hapmap_3.3.hg19.sites.vcf 
-resource:omni,known=false,training=true,truth=false,prior=12.0 /media/transcription/cipn/5.pt/ref/1000G_omni2.5.hg19.sites.vcf 
-resource:dbsnp,known=true,training=false,truth=false,prior=6.0 /media/transcription/cipn/5.pt/ref/dbsnp_135.hg19.vcf.gz 
-an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an InbreedingCoeff 
-mode SNP 

I get a tranche plot that does not look OK. The "Number of Novel Variants [1000s]" axis goes from -400 to 800 and the Ti/Tv ratio varies from 0.633 to 0.782 [the attach file link is not working for me and I am unable to upload the plot]. Any suggestion to rectify this would be very helpful!

cheers, Rahul


Created 2012-10-11 18:12:51 | Updated 2013-01-07 19:13:46 | Tags: variantrecalibrator vqsr tranches
Comments (5)

Hello,

I am running Variant Quality Score Recalibration on indels with the following command.

java -Xmx8g -jar /raid/software/src/GenomeAnalysisTK-1.6-9-g47df7bb/GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R /raid/references-and-indexes/hg19/bwa/hg19_lite.fa \
    -input indel_output_all_chroms_combined.vcf \
    --maxGaussians 4 -std 10.0 -percentBad 0.12 \
    -resource:mills,known=true,training=true,truth=true,prior=12.0  /raid/Merlot/exome_pipeline_v1/ref/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
    -an QD -an FS -an HaplotypeScore -an ReadPosRankSum  \
    --ts_filter_level 95.0 \
     -mode INDEL \
    -recalFile /raid2/projects/STFD/indel_output_7.recal \
    -tranchesFile /raid2/projects/STFD/indel_output_7.tranches \
    -rscriptFile /raid2/projects/STFD/indel_output_7.plots.R

My tranches file reports only false positives for all tranches. When I run VQSR on SNPs, the tranches have many true positives and look similar to other tranche files reported on this site. I am wondering if anyone has had similar experiences or has suggestions?

Thanks