question about odd result for VQSR
Posted in Ask the GATK team | Last updated on


Comments (20)

Hi,

Recently I run into some odd observation in VQSR. I have 17 samples from a same family and I used all of 17 samples to call SNPs and after VQSR, I got the trench file like this:

Variant quality score tranches file

Version number 5

targetTruthSensitivity,numKnown,numNovel,knownTiTv,novelTiTv,minVQSLod,filterName,model,accessibleTruthSites,callsAtTruthSites,truthSensitivity 90.00,48637,716,2.9527,2.3302,4.8390,VQSRTrancheSNP0.00to90.00,SNP,26182,23563,0.9000 99.00,60114,1531,2.8057,2.3333,1.7766,VQSRTrancheSNP90.00to99.00,SNP,26182,25920,0.9900 99.90,67220,2884,2.7190,1.8222,-10.0009,VQSRTrancheSNP99.00to99.90,SNP,26182,26155,0.9990 100.00,69714,4998,2.6822,1.8300,-1122.0698,VQSRTrancheSNP99.90to100.00,SNP,26182,26182,1.0000

which seems fine. then for research purpose, I only used 5 samples of more tight relation such as two parents and their 3 immediate children and after VQSR, the trench file looks like below:

Variant quality score tranches file

Version number 5

targetTruthSensitivity,numKnown,numNovel,knownTiTv,novelTiTv,minVQSLod,filterName,model,accessibleTruthSites,callsAtTruthSites,truthSensitivity 90.00,50598,2279,2.6625,1.7993,-Infinity,VQSRTrancheSNP0.00to90.00,SNP,20850,20850,1.0000 99.00,50598,2279,2.6625,1.7993,-Infinity,VQSRTrancheSNP90.00to99.00,SNP,20850,20850,1.0000 99.90,50598,2279,2.6625,1.7993,-Infinity,VQSRTrancheSNP99.00to99.90,SNP,20850,20850,1.0000 100.00,50598,2279,2.6625,1.7993,-Infinity,VQSRTrancheSNP99.90to100.00,SNP,20850,20850,1.0000

Notice that the 5-sample VQSR tranch file has exactly the same thing throughout all thresholds: 90, 99, 99.90 and 100. and the VQSR modeling plot is also very odd, no plotting at all being seen (the pdf ifle was created but was almost blank in contrast to the normal projection plots I saw in other cases)

However, we did use the old version to call the same 5 samples before, and the trench file looks like below:

Variant quality score tranches file

Version number 4

targetTruthSensitivity,numKnown,numNovel,knownTiTv,novelTiTv,minVQSLod,filterName,accessibleTruthSites,callsAtTruthSites,truthSensitivity 90.00,36407,361,2.8657,2.3119,5.0854,TruthSensitivityTranche0.00to90.00,20814,18732,0.9000 99.00,44097,638,2.7655,2.2222,2.2592,TruthSensitivityTranche90.00to99.00,20814,20605,0.9900 99.90,47947,1061,2.7078,1.8750,-7.4143,TruthSensitivityTranche99.00to99.90,20814,20793,0.9990 100.00,50426,2318,2.6645,1.7677,-647.3944,TruthSensitivityTranche99.90to100.00,20814,20814,1.0000

this time, it looks reasonable to me. This is troubling us since for 5 samples, the old version (V1.6-7) seems working fine, whereas the new version (V2.1-13) seems having issue or can not get further filtering by VQSR (90, 99 and 100 got the same result, I did repeat multiple times and got the same results), although for all of the 17 samples, the new version seems fine on VQSR.

So my questions are: 1. is it possible that in some occasion, VQSR can simply not work? 2. Why the old version seems working but not the new version for exactly the same set of 5-sample data?

Thanks a lot for your help!

Mike


Return to top Comment on this article in the forum