I am using CombineVariants to combine two multisample vcfs produced by HaplotypeCaller. The vcfs cover the same genomic region and share no sample names. I have noticed some potential issues with how the AD and PL fields are reported after running CombineVariants.
If the site was multiallelic in one vcf but biallelic in the other, the AD and PL fields are omitted altogether, so that only GT:DP:GQ is reported in the combined file. I am guessing this is expected behavior?
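However, if the site was biallelic in both vcfs, or monomorphic in one vcf but polymorphic in the other, the AD and PL fields are reported, but the numbers do not make sense: in the example below, DP and GQ are unchanged for each sample, while AD and PL differ from the original values and no longer agree with the 0/0 genotype calls.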
My command line:
GATK -T CombineVariants -R ~/Capsella_rubella_v1.0_combined.fasta -V sc1_HC_94samp.vcf -V sc1_HC_set2.vcf -nt 20 -o sc1_HC.vcf &
Original vcf record (first five samples shown):
scaffold_1 3275 . G T 1170.18 . AC=2;AF=0.011;AN=188;BaseQRankSum=0.295;ClippingRankSum=-1.233;DP=5815;FS=1.186;InbreedingCoeff=-0.0106;MLEAC=1;MLEAF=5.319e-03;MQ=98.00;MQ0=0;MQRankSum=3.414;QD=9.36;ReadPosRankSum=1.761
GT:AD:DP:GQ:PL 0/0:59,0:59:99:0,293,4371 0/0:56,0:56:99:0,218,2973 0/0:40,0:40:99:0,161,2615 0/0:65,0:65:99:0,239,3898 0/0:62,0:62:99:0,227,3612
Combined vcf record (first seven samples shown):
scaffold_1 3275 . G T 1170.18 . AC=3;AF=7.979e-03;AN=376;DP=11787;MLEAC=1;MLEAF=5.319e-03;MQ0=0;set=Intersection
GT:AD:DP:GQ:PL 0/0:0,0:59:99:0,92,0 0/0:0,42:56:99:3,5,2805 0/0:675,0:40:99:29,26,8482 0/0:45,0:65:99:43,6,11447 0/0:0,187:62:99:4079,0,0 0/0:55,0:27:99:0,182,941 0/0:0,251:69:99:0,162,2300
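To make "do not make sense" concrete, here is a minimal Python sketch of the checks I ran by eye. It rests on some assumptions: the first sample column is the same sample in both files (the DP and GQ values match), genotypes are diploid and fully called, and the PL of the called genotype should be 0, which is my understanding of how the PLs are normalized. The function names are just illustrative helpers, not part of GATK.

```python
def parse_sample(format_keys, sample):
    """Split one per-sample column into a dict keyed by the FORMAT string."""
    fields = dict(zip(format_keys.split(":"), sample.split(":")))
    for key in ("AD", "PL"):
        if key in fields:
            fields[key] = [int(x) for x in fields[key].split(",")]
    return fields


def called_pl(fields):
    """PL of the called diploid genotype.

    Uses the VCF genotype ordering: index(a/b) = b*(b+1)/2 + a for a <= b.
    Assumes a fully called genotype (no './.').
    """
    a, b = sorted(int(x) for x in fields["GT"].replace("|", "/").split("/"))
    return fields["PL"][b * (b + 1) // 2 + a]


def changed_keys(before, after):
    """FORMAT keys whose values differ between the two records."""
    return sorted(k for k in before if before[k] != after.get(k))


FORMAT = "GT:AD:DP:GQ:PL"

# First sample, before and after CombineVariants (values from the records above).
original = parse_sample(FORMAT, "0/0:59,0:59:99:0,293,4371")
combined = parse_sample(FORMAT, "0/0:0,0:59:99:0,92,0")
print(changed_keys(original, combined))

# Fifth sample in the combined record: called 0/0, yet PL[0] is far from 0.
suspect = parse_sample(FORMAT, "0/0:0,187:62:99:4079,0,0")
print(called_pl(original), called_pl(suspect))
```

For the first sample only AD and PL change, and for the fifth combined sample the PL of the called 0/0 genotype is 4079 rather than 0, which is what makes me think the values are being scrambled rather than recomputed.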
Is this a bug, or am I missing something about how to use CombineVariants? Thanks very much for your help!