version 2.7 - CombineVariants seems to have a bug in reporting AD and PL fields?
Posted in Ask the GATK team | Last updated on


Hello,

I am using CombineVariants to combine two multisample vcfs made with HaplotypeCaller. The vcfs cover the same genomic region and do not overlap in sample names. I have noticed some potential issues in how the AD and PL fields are reported after running CombineVariants.

If the site was multiallelic in one vcf but biallelic in the other, the AD and PL fields are omitted altogether, so the combined file reports only GT:DP:GQ. I am guessing this is expected behavior?

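However, if the site was biallelic in both vcfs, or monomorphic in one vcf but polymorphic in the other, the AD and PL fields are reported, but the numbers do not make sense. In the records below, for example, the first sample's AD changes from 59,0 to 0,0 while its DP stays at 59.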

My command line:

GATK -T CombineVariants -R ~/Capsella_rubella_v1.0_combined.fasta -V sc1_HC_94samp.vcf -V sc1_HC_set2.vcf -nt 20 -o sc1_HC.vcf &

Original vcf record:

scaffold_1 3275 . G T 1170.18 . AC=2;AF=0.011;AN=188;BaseQRankSum=0.295;ClippingRankSum=-1.233;DP=5815;FS=1.186;InbreedingCoeff=-0.0106;MLEAC=1;MLEAF=5.319e-03;MQ=98.00;MQ0=0;MQRankSum=3.414;QD=9.36;ReadPosRankSum=1.761
GT:AD:DP:GQ:PL 0/0:59,0:59:99:0,293,4371 0/0:56,0:56:99:0,218,2973 0/0:40,0:40:99:0,161,2615 0/0:65,0:65:99:0,239,3898 0/0:62,0:62:99:0,227,3612

Combined vcf record:

scaffold_1 3275 . G T 1170.18 . AC=3;AF=7.979e-03;AN=376;DP=11787;MLEAC=1;MLEAF=5.319e-03;MQ0=0;set=Intersection
GT:AD:DP:GQ:PL 0/0:0,0:59:99:0,92,0 0/0:0,42:56:99:3,5,2805 0/0:675,0:40:99:29,26,8482 0/0:45,0:65:99:43,6,11447 0/0:0,187:62:99:4079,0,0 0/0:55,0:27:99:0,182,941 0/0:0,251:69:99:0,162,2300
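To make the mismatch between the two records concrete, here is a minimal Python sketch of two sanity checks on a sample column. It is not part of GATK; the function names (parse_sample, called_pl_index, check_sample) are made up for illustration. Both checks are heuristics and rest on assumptions: that sum(AD) equals DP (true for the original record above, but not guaranteed in general, since AD and DP can be computed from different sets of reads), and that the PL of the called genotype is 0 (PLs are normalized so the most likely genotype gets 0). The sketch also assumes diploid calls with no missing (".") values.

```python
def parse_sample(fmt, sample):
    """Map FORMAT keys (e.g. GT, AD, DP, GQ, PL) to one sample's values."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def called_pl_index(gt):
    """Index into PL for a diploid genotype j/k, using the VCF ordering
    k*(k+1)/2 + j with j <= k."""
    j, k = sorted(int(a) for a in gt.replace("|", "/").split("/"))
    return k * (k + 1) // 2 + j


def check_sample(fmt, sample):
    """Return a list of heuristic inconsistencies found in one sample column.

    Assumes diploid genotypes and no missing values; an empty list means
    neither heuristic fired, not that the record is guaranteed correct.
    """
    fields = parse_sample(fmt, sample)
    ad = [int(x) for x in fields["AD"].split(",")]
    pl = [int(x) for x in fields["PL"].split(",")]
    dp = int(fields["DP"])

    problems = []
    # Heuristic 1 (assumption): allele depths should add up to the depth.
    if sum(ad) != dp:
        problems.append("sum(AD)=%d but DP=%d" % (sum(ad), dp))
    # Heuristic 2 (assumption): the called genotype should have PL 0.
    idx = called_pl_index(fields["GT"])
    if pl[idx] != 0:
        problems.append("PL of called genotype %s is %d, not 0"
                        % (fields["GT"], pl[idx]))
    return problems


fmt = "GT:AD:DP:GQ:PL"
# Sample columns copied from the two records in this post.
original = ["0/0:59,0:59:99:0,293,4371", "0/0:56,0:56:99:0,218,2973"]
combined = ["0/0:0,0:59:99:0,92,0", "0/0:0,42:56:99:3,5,2805"]

for label, samples in (("original", original), ("combined", combined)):
    for s in samples:
        print(label, s, check_sample(fmt, s) or "ok")
```

On the values shown in this post, the five original sample columns pass both checks, while each of the seven combined columns fails at least one of them.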

Is this a bug, or am I missing something about how to use CombineVariants? Thanks very much for your help!

YW

