VariantRecalibrator, creating a truth data set
Posted in Ask the GATK team | Last updated on 2013-01-07 19:49:58



Hello community, I am working with yeast and am at the VariantRecalibrator step. As I don't have a truth data set, I want to filter my initial round of raw SNPs and keep only the highest-quality ones as the truth set, as the GATK team suggests.

1) Do you have any suggestions for the filtering parameters?

I am treating each strain as a different species (WGS), so I have good coverage (80X) but only one "lane". I tried:

java -Xmx4g -jar GenomeAnalysisTK.jar -R S288c.fasta -T VariantFiltration --variant $1.raw.vcf --filterExpression "QD < 2.0 || MQ < 45.0 || FS > 60.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "hardtovalidate" -o $1.filt.vcf

After this I would remove the LowQual and hardtovalidate SNPs. Does that make sense? Thanks for your help!
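
To make the removal step concrete, here is a minimal sketch of one way the LowQual and hardtovalidate records could be dropped from the filtered VCF. It is plain awk rather than a GATK tool, and it assumes a tab-delimited VCF whose FILTER column (column 7) reads exactly PASS for retained sites. The function name keep_pass_records is hypothetical, and the file names in the usage comment simply follow the commands in this post.

```shell
# Hypothetical helper (not part of GATK): print the VCF header plus every
# record whose FILTER column is exactly "PASS".
# $1 = path to a filtered VCF, e.g. the output of VariantFiltration.
keep_pass_records() {
    awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

# Example usage (assumed file names):
#   keep_pass_records sample.filt.vcf > sample.truth.vcf
```

Records that VariantFiltration leaves with "." in the FILTER column would be dropped by this sketch too, so it is worth checking how the filtered file marks passing sites before relying on it.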

2) After that I would run VariantRecalibrator, but I will have only one truth set. Can I use -mode both, or should I try to obtain a truth data set of indels and run VQSR for SNPs and indels separately? What do you think?

java -Xmx4g -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R ncbi_S288c.fasta -input $1.raw.vcf -recalFile $1.raw.recal -tranchesFile $1.raw.tranches -resource:filtered,known=false,training=true,truth=true,prior=15.0 $1.truth.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an DP -mode both
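
If the two variant types end up being recalibrated separately, the single truth file would first have to be split. The following is only a rough sketch of such a split in awk, under the assumption that a record counts as a SNP when REF and every ALT allele are exactly one base long; everything else (indels, but also MNPs and symbolic alleles) lands in the "non-snp" class. The function name select_variant_class and the file names are hypothetical.

```shell
# Hypothetical helper (not part of GATK): print the VCF header plus the
# records of one variant class.
# $1 = "snp" or "non-snp", $2 = path to a VCF file.
select_variant_class() {
    awk -F'\t' -v want="$1" '
        /^#/ { print; next }              # always keep header lines
        {
            n = split($5, alt, ",")       # ALT may list several alleles
            is_snp = (length($4) == 1)    # REF must be a single base
            for (i = 1; i <= n; i++)
                if (length(alt[i]) != 1) is_snp = 0
            if ((want == "snp" && is_snp) || (want == "non-snp" && !is_snp))
                print
        }
    ' "$2"
}

# Example usage (assumed file names):
#   select_variant_class snp     sample.truth.vcf > sample.truth.snps.vcf
#   select_variant_class non-snp sample.truth.vcf > sample.truth.other.vcf
```

The length-based rule is deliberately crude; a site with one SNP allele and one indel allele is treated as non-SNP here.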

Thanks! Luisa

