Hi. I am getting VERY odd results with some Streptococcus equi sequence. The BAM files from BWA align well in IGV, but when I run them through your pipeline there are many local errors where it seems that a single indel has been incorrectly multiplied up - somehow. You need to see the IGV screenshot.!
The bottom is a BAM file from BWA and the top is the final one from the GATK pipeline.
First of all, thank you for a truly great toolkit! It is no doubt the best one out there.
Now, I have a question regarding visualization of a SNP that is not called by UG but looks convincing in IGV. Yes, I've looked at the FAQ page gatkforums.broadinstitute.org/discussion/1235/why-didnt-the-unified-genotyper-call-my-snp-i-can-see-it-right-there-in-igv but I'm still not completely convinced that this is a false positive.
The BAM files have gone through the Best Practices workflow prior to SNP calling. Calling was done using UG with subsequent recalibration steps, where I followed the guidelines under gatkforums.broadinstitute.org/discussion/1259/what-vqsr-training-sets-arguments-should-i-use-for-my-specific-project. SNP calling was done using GATK 2.4-9.
Below is a screenshot from IGV showing the SNP call:
Fullsize here: s24.postimg.org/sepow851v/igv_snp.png
The average mapping quality for the reads that include the SNP is 50 and the average base quality at the locus of the SNP is 28.7 (not including 4 positions where base quality is below 10). These values are calculated from the values shown by IGV
Are these values really too low to not confidently call this SNP? I mean a base quality of 28.7 means a probability of 99.87% that the base call is correct. Isn't that good enough?
Please help me understand this, and let me know if you need more information. :)