Until now, HaplotypeCaller was only capable of calling variants in diploid organisms due to some assumptions made in the underlying algorithms. I'm happy to announce that we now have a generalized version that is capable of handling any ploidy you specify at the command line!
This new feature, which we're calling "omniploidy", is technically still under development, but we think it's mature enough for the more adventurous to try out as a beta test ahead of the next official release. We'd especially love to get some feedback from people who work with non-diploids on a regular basis, so we're hoping that some of you microbiologists and assorted plant scientists will take it out for a spin and let us know how it behaves in your hands.
It's available in the latest nightly builds; just use the -ploidy argument to give it a whirl. If you have any questions or feedback, please post a comment on this article in the forum.
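As a rough illustration of what such a run might look like, here is a minimal Python sketch that assembles a GATK 3-style HaplotypeCaller command line with -ploidy set. The helper name `haplotypecaller_command` and the file names (reference.fasta, tetraploid_sample.bam, calls.vcf) are hypothetical placeholders, and the jar name assumes a standard GenomeAnalysisTK.jar from a nightly build; adjust all of them to your own setup.

```python
def haplotypecaller_command(reference, bam, output, ploidy=2):
    """Build a GATK 3-style HaplotypeCaller argument list.

    Hypothetical helper for illustration only; it assembles the
    command and does not run it.
    """
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return [
        "java", "-jar", "GenomeAnalysisTK.jar",  # assumed jar name and location
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-ploidy", str(ploidy),  # 2 (diploid) is the default
        "-o", output,
    ]


# Placeholder file names; a tetraploid sample is assumed here.
cmd = haplotypecaller_command(
    "reference.fasta", "tetraploid_sample.bam", "calls.vcf", ploidy=4
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.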
Caveat: the downstream tools involved in the new GVCF-based workflow (GenotypeGVCFs and CombineGVCFs) are not yet capable of handling non-diploid calls correctly -- but we're working on it.
We have added omniploidy support to the GVCF handling tools, with the following limitations:
When running, you need to indicate the sample ploidy that was used to generate the GVCFs with -ploidy. As usual, 2 is the default ploidy.
As of GATK version 3.3-0, the GVCF tools are capable of ad-hoc ploidy detection, and can handle mixed ploidies. See the release highlights for details.
I am using GATK UnifiedGenotyper 2.3-9 for a set of pooled samples. I am uncertain about the order of PL values in polyploid samples, as it isn't defined in the VCF v4.1 specification. The ordering formula described in VCF v4.1, F(j/k) = (k*(k+1)/2)+j, applies only to the diploid case. May I know how GATK extended the ordering formula to handle polyploid samples?
Example VCF line: GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 0/0/1/1/1/1/1/1:71,177:249:17:6:0.750:5685,995,510,254,101,17,0,92,32767
Thank you very much!
Best regards, Allen
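For reference, one way to generalize the ordering is the recursive scheme that later versions of the VCF specification (v4.3) describe for arbitrary ploidy. The Python sketch below enumerates genotypes in that order and applies it to the PL values from the example line. This is an assumption about the ordering, not a confirmation of what UnifiedGenotyper 2.3-9 does internally; the function names are made up for illustration. The example line is at least consistent with it: there are 9 PL values for 8 chromosomes and 2 alleles, and the PL of 0 falls on the genotype with six ALT copies, matching both the GT and MLPSAC=6.

```python
def genotype_ordering(ploidy, n_alleles):
    """Yield genotypes (tuples of allele indices) in the order that
    VCF v4.3 describes for likelihood fields such as GL/PL.

    n_alleles counts REF plus all ALT alleles.
    """
    def extend(remaining, max_allele, suffix):
        # Alleles within a genotype are non-decreasing left to right;
        # the rightmost (largest) allele varies slowest.
        for allele in range(max_allele + 1):
            if remaining == 1:
                yield (allele,) + suffix
            else:
                yield from extend(remaining - 1, allele, (allele,) + suffix)

    yield from extend(ploidy, n_alleles - 1, ())


def format_gt(genotype):
    """Render a genotype tuple as an unphased GT string."""
    return "/".join(str(allele) for allele in genotype)


# Diploid sanity check against the v4.1 formula F(j/k) = (k*(k+1)/2)+j.
diploid = list(genotype_ordering(ploidy=2, n_alleles=3))
assert all(diploid[k * (k + 1) // 2 + j] == (j, k) for j, k in diploid)

# PL values from the example line: 8 chromosomes in the pool, REF + 1 ALT.
pls = [5685, 995, 510, 254, 101, 17, 0, 92, 32767]
genotypes = list(genotype_ordering(ploidy=8, n_alleles=2))
best = genotypes[pls.index(min(pls))]
print(len(genotypes), format_gt(best))  # 9 0/0/1/1/1/1/1/1
```

For a biallelic site this ordering reduces to sorting genotypes by the number of ALT copies, so the i-th PL (counting from 0) corresponds to i ALT alleles in the pool.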
Hello, I have 454 reads of loci from polyploid individuals. I am able to produce assemblies containing the different copies of one locus per individual, but I want to extract the reads corresponding to each copy so that I can then produce phylogenies. Can the GATK do something like this?