The title kind of explains the situation, but basically I've got a SNP that looks homozygous to me in IGV but that the Unified Genotyper has labeled as heterozygous. The total read depth is 35: 32 reads were called as the SNP (A-->T), 2 were called as the reference base (A), and one read contained a G. I went through your article describing why a SNP visible in IGV might not get called, and none of those five questions explained this situation. I didn't alter the --hets option at all either. Any help you might be able to offer would be greatly appreciated.
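To put numbers on why I expected a homozygous call, here is a rough sketch of a naive diploid genotype likelihood for this pileup. It is not the Unified Genotyper's actual model: the flat per-base error rate of 0.01 and the function name `naive_genotype_log10_likelihoods` are my own assumptions, and the sketch ignores base and mapping qualities entirely.

```python
import math


def naive_genotype_log10_likelihoods(n_ref, n_alt, n_other, error_rate=0.01):
    """Log10 likelihoods of RR, RA and AA under a flat-error model.

    Assumptions (mine, not GATK's): every base has the same error rate,
    and an error is equally likely to yield any of the three other bases.
    """
    def p_obs(observed, allele):
        # Probability of reading `observed` from a chromosome carrying `allele`.
        return 1 - error_rate if observed == allele else error_rate / 3

    likelihoods = {}
    for genotype in ("RR", "RA", "AA"):
        total = 0.0
        for observed, count in (("R", n_ref), ("A", n_alt), ("O", n_other)):
            # Each read is drawn from either chromosome with probability 1/2.
            p = 0.5 * p_obs(observed, genotype[0]) + 0.5 * p_obs(observed, genotype[1])
            total += count * math.log10(p)
        likelihoods[genotype] = total
    return likelihoods


# Counts from the pileup described above: 2 ref (A), 32 alt (T), 1 other (G).
likelihoods = naive_genotype_log10_likelihoods(n_ref=2, n_alt=32, n_other=1)
best = max(likelihoods, key=likelihoods.get)
print(likelihoods, best)
```

With these counts and the assumed 1% error rate, the hom-alt genotype (AA) comes out most likely under this naive model, which is why the heterozygous call surprised me.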
Hi GATK team,
When I checked closely in IGV, I found that many loci where all reads support the variant were reported as HET genotypes by HC.
After going back to the VCF, I still don't understand why HC called these as HET, since the AD for the reference allele is 0.
For example, the whole VCF line for one such locus is below:
SCF1 47255 . T A 21688.96 . AC=33;AF=0.500;AN=66;ActiveRegionSize=225;DP=264;EVENTLENGTH=0;FS=0.000;HaplotypeScore=73.8674;InbreedingCoeff=-0.8774;MLEAC=33;MLEAF=0.500;MQ=85.46;MQ0=0;NVH=5;NumHapAssembly=19;NumHapEval=12;QD=84.07;QDE=16.81;TYPE=SNP;extType=SNP GT:AD:GQ:PL 1/1:0,4:18:503,18,0 0/1:0,16:99:1732,0,2964 0/1:0,4:99:279,0,173 0/1:0,5:99:353,0,884 0/1:0,3:99:279,0,645 0/1:0,4:99:270,0,584 0/1:0,19:99:1428,0,4290 0/1:0,7:99:388,0,794 0/1:0,3:99:302,0,391 0/1:0,15:99:1087,0,1165 0/1:0,6:99:529,0,1437 0/1:0,4:99:482,0,493 0/1:0,12:99:1245,0,2556 0/1:0,5:99:433,0,605 0/1:0,4:99:624,0,1135 0/1:0,3:99:362,0,1219 0/1:0,4:99:454,0,866 0/1:0,5:99:200,0,298 0/1:0,14:99:1846,0,1785 0/1:0,3:99:130,0,205 0/1:0,16:99:1453,0,999 0/1:0,15:99:757,0,866 0/1:0,12:99:939,0,518 0/1:0,10:99:697,0,758 0/1:0,8:99:790,0,1319 0/1:0,9:99:748,0,1417 0/1:0,7:99:634,0,2198 0/0:0,6:63:0,63,1438 0/1:0,8:99:1029,0,1634 0/1:0,5:99:676,0,874 0/1:0,2:65:65,0,107 0/1:0,20:99:1069,0,1743 0/1:0,2:14:14,0,710
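In case it helps to see how widespread this is, here is a rough Python sketch for flagging samples that are called heterozygous even though the reference AD is 0. It only relies on the GT and AD keys in the FORMAT column; the function name `flag_het_without_ref_reads` is my own, and the example record is the line above trimmed to three of its 33 samples and a single INFO key.

```python
def flag_het_without_ref_reads(vcf_line):
    """Return (sample_index, sample_field) for het calls whose reference AD is 0."""
    fields = vcf_line.split()
    format_keys = fields[8].split(":")
    gt_idx = format_keys.index("GT")
    ad_idx = format_keys.index("AD")

    flagged = []
    for i, sample in enumerate(fields[9:]):
        values = sample.split(":")
        # Skip samples with dropped trailing fields or a missing AD.
        if len(values) <= ad_idx or values[ad_idx] == ".":
            continue
        alleles = values[gt_idx].replace("|", "/").split("/")
        if "." in alleles:
            continue
        ref_depth = int(values[ad_idx].split(",")[0])
        # Het call that includes the reference allele, but no ref-supporting reads.
        if len(set(alleles)) > 1 and "0" in alleles and ref_depth == 0:
            flagged.append((i, sample))
    return flagged


# The record above, trimmed to three samples and one INFO key for brevity.
record = (
    "SCF1 47255 . T A 21688.96 . DP=264 GT:AD:GQ:PL "
    "1/1:0,4:18:503,18,0 0/1:0,16:99:1732,0,2964 0/0:0,6:63:0,63,1438"
)
flagged = flag_het_without_ref_reads(record)
print(flagged)
```

The sketch splits on any whitespace, so it works on a tab-delimited VCF record as well as on the space-separated copy pasted here.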
I seem to have found a bit of an issue with the Haplotype Caller. Looking at variants called with it, I've come across a number of small blocks in the genome where the Haplotype Caller has called every individual (50 individuals) either RA or RR, which seemed a bit odd considering the population.
Looking at the BAMs and at the VCFs from SAMtools and the Unified Genotyper, these blocks of SNPs clearly contain all three states (RR/RA/AA), as I'd expect. In the BAMs, the reads are of decent quality and have no nearby insertions or deletions to complicate things, and the variants have been called correctly by SAMtools and UG.
Any idea what's causing this? Attached is an IGV image showing one of the regions in question. The top VCF is from the Haplotype Caller (showing all calls as RA or RR, which is incorrect); the second is from UG (showing a mix of RR/RA/AA, which is correct). The first BAM shows one of the animals that HC is incorrectly calling RA for the 5 SNPs shown, while the second is an animal that HC is correctly calling RA.
Note these incorrect calls from the HC also passed VQSR. I believe the version of GATK is one of the 2.1 releases.
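For anyone who wants to look for similar blocks in their own callset, here is a minimal sketch of how I'd tally genotype classes at a site and spot sites with no AA calls at all. The function name `genotype_class_counts` is my own, it assumes biallelic sites with GT as the first FORMAT key, and the genotypes in the example are made-up toy values, not data from my VCF.

```python
from collections import Counter


def genotype_class_counts(sample_fields, gt_index=0):
    """Count RR / RA / AA / missing calls across the sample columns of one site.

    Assumes a biallelic site: any genotype with no reference allele counts as AA.
    """
    counts = Counter()
    for sample in sample_fields:
        alleles = sample.split(":")[gt_index].replace("|", "/").split("/")
        if "." in alleles:
            counts["missing"] += 1
        elif all(a == "0" for a in alleles):
            counts["RR"] += 1
        elif all(a != "0" for a in alleles):
            counts["AA"] += 1
        else:
            counts["RA"] += 1
    return counts


# Toy genotype columns (hypothetical), one site per list.
mixed_site = ["0/0", "0/1", "0/1", "1/1", "./."]
suspicious_site = ["0/0", "0/1", "0/1", "0/1"]

mixed_counts = genotype_class_counts(mixed_site)
suspicious_counts = genotype_class_counts(suspicious_site)
print(mixed_counts, suspicious_counts)
```

A run of adjacent sites where the AA count is 0 in one caller's VCF but not in another's would be the kind of block I'm describing.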