The answer depends on what tool we're talking about, and whether we're considering variant discovery or variant manipulation.
GATK variant manipulation tools are able to recognize the following types of alleles:
HaplotypeCaller is a sophisticated variant caller that can call different types of variants at the same time. In addition to SNPs and indels, it emits mixed records by default, as well as symbolic representations for events such as spanning deletions. It also emits physical phasing information, but in its current version HC is not able to emit MNPs. If you would like to combine contiguous SNPs into MNPs, you will need to use the ReadBackedPhasing tool with the MNP merging function activated; see the tool documentation for details.
Our older (and now deprecated) variant caller, UnifiedGenotyper, is more limited. It only calls SNPs and indels, and it does so separately: even if you run it in calling mode BOTH, the program performs separate calling operations internally. As a result, it cannot recognize that SNPs and indels occurring at the same site should be emitted together as a joint record.
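To make the allele categories discussed above (SNP, indel, MNP, mixed, symbolic) more concrete, here is a minimal Python sketch that classifies a record from its REF and ALT strings. It is an illustration only, not GATK's actual implementation: the names `VariantType`, `is_symbolic`, `classify_allele` and `classify_record` are hypothetical, and the rules assume alleles written as in a VCF file, with `*` and angle-bracketed alleles such as `<NON_REF>` treated as symbolic.

```python
from enum import Enum


class VariantType(Enum):
    NO_VARIATION = "NO_VARIATION"
    SNP = "SNP"
    MNP = "MNP"
    INDEL = "INDEL"
    SYMBOLIC = "SYMBOLIC"
    MIXED = "MIXED"


def is_symbolic(alt: str) -> bool:
    # Assumption: '*' (spanning deletion), <ID>-style alleles and breakend
    # notation (containing '[' or ']') are all treated as symbolic.
    return (
        alt == "*"
        or (alt.startswith("<") and alt.endswith(">"))
        or "[" in alt
        or "]" in alt
    )


def classify_allele(ref: str, alt: str) -> VariantType:
    """Classify one ALT allele relative to REF (simplified rules)."""
    if is_symbolic(alt):
        return VariantType.SYMBOLIC
    if len(ref) == len(alt):
        # Same length: single-base substitution or multi-base substitution.
        return VariantType.SNP if len(ref) == 1 else VariantType.MNP
    # Different lengths: insertion or deletion.
    return VariantType.INDEL


def classify_record(ref: str, alts: list[str]) -> VariantType:
    """Classify a whole record; different ALT types at one site give MIXED."""
    # '.' is the VCF placeholder for "no alternate allele".
    types = {classify_allele(ref, alt) for alt in alts if alt != "."}
    if not types:
        return VariantType.NO_VARIATION
    if len(types) == 1:
        return next(iter(types))
    return VariantType.MIXED
```

For example, `classify_record("A", ["G", "AT"])` returns `VariantType.MIXED`, because the site carries both a SNP and an insertion. Note that under these simplified rules a SNP plus a `*` allele also counts as mixed, which may differ from how a given tool handles spanning deletions.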
The GATK is currently not able to detect SVs (structural variations) or CNVs (copy number variations), but there are some third-party software packages built on top of GATK that provide this functionality. See GenomeSTRiP for SVs and XHMM for CNVs.
I notice that although UnifiedGenotyper and HaplotypeCaller identify indels, none are reported with a length greater than around 50 bp. Obviously, integrating this ability into the already complicated algorithms is not easy, nor are such variants especially common, but they can easily be biologically important, and they can interfere with SNP calls, e.g. if a SNP lies within a heterozygous deletion.
Do you know of any callers for larger indels/CNVs, specifically for multi-sample NGS projects? I've looked at Pindel and BreakDancer (and probably a few others), but they seem to be mostly for single samples.
I am interested in finding copy number variation in my samples. I have looked for SNPs and indels with GATK UnifiedGenotyper (I still have to try HaplotypeCaller). Is there a walker to find CNVs (duplications or deletions) in GATK?
Hope to hear from you soon.
Hello Geraldine et al,
I've a question about CNV calling, which you might or might not have an answer for. We're doing a case/control analysis on two cohorts, and one of the analyses we'd like to carry out is an examination of CNV length. One thing we want to do is analyse by genome (average site above/below average, say), and by region (same general idea).
While calling SNPs and indels seems straightforward enough with UG, I wonder if you have a best practice for calling CNVs, or rather, a candidate for what might become an integrated best practice? Maybe I can even help with maturation/integration.
Thanks for your relentless work on the site, tools and community, this is the place to be.