Wrong candidate haplotype chosen by HaplotypeCaller
Posted in Ask the team | Last updated on 2013-01-10 02:10:12



I've been seeing some apparent errors with HaplotypeCaller that I think could be related to how it chooses candidate haplotypes when performing multi-sample calling. Please see the example files I've uploaded to the server (cooketho_20130103.tar.gz). For instance, if you look at position 3511 in sample 2, there are 14 non-reference reads and 0 reference reads. When HaplotypeCaller is run with just this sample, it calls this locus homozygous non-reference, which seems to me to be the correct behavior. But when it is run with all 14 samples, it doesn't call a SNP at this locus. Repeating the run in debug mode shows that the immediate cause is that 11 candidate haplotypes were found, and not a single one of them had the non-reference allele at position 3511. Why would none of the candidates carry it?
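As a side check independent of HaplotypeCaller, the read tally described above can be reproduced with a short script. The following is a minimal Python sketch, not part of GATK: the function name `tally_alleles`, the reference base `C`, and the observed base `T` are hypothetical placeholders, and the list of base calls stands in for whatever pileup column you extract from the BAM at position 3511.

```python
from collections import Counter


def tally_alleles(bases, ref_base):
    """Count reference and non-reference base calls observed at one locus.

    bases    -- iterable of single-character base calls, one per read
    ref_base -- the reference base at that locus
    Returns a (ref_count, nonref_count) tuple; comparison is case-insensitive.
    """
    ref = ref_base.upper()
    counts = Counter(b.upper() for b in bases)
    ref_count = counts.get(ref, 0)
    nonref_count = sum(n for base, n in counts.items() if base != ref)
    return ref_count, nonref_count


# Hypothetical pileup column: 14 reads, all showing 'T' where the
# reference is assumed to be 'C' (placeholder values, not the real data).
pileup_bases = ["T"] * 14
ref_count, nonref_count = tally_alleles(pileup_bases, "C")
print(ref_count, nonref_count)  # prints "0 14"
```

A tally like this only confirms what the raw reads show at the site; it says nothing about why the assembled candidate haplotypes lack the allele.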

I came across an earlier post suggesting that increasing the --minPruning value can help in some cases, but I tried this to no avail.

http://gatkforums.broadinstitute.org/discussion/1764/haplotypecaller-in-cohorts

My organism is a plant, and it is considerably more heterozygous than human, but changing the --heterozygosity value did not appear to help either. Feel free to double-check me on this.

Can you please suggest a fix, or perhaps release some documentation on how HaplotypeCaller selects candidate haplotypes?

P.S. Any idea when the source will be released to the public, or when a more comprehensive manual will be available? Either would be very helpful for figuring out what is going on in cases like this.

Thanks! Tom

