(BP2.1) Calling Variants with HaplotypeCaller in gVCF mode
Posted in Best Practices for Variant Discovery in DNAseq (overview) | Last updated on 2014-04-24 22:03:07

This article is part of the workflow documentation describing the Best Practices for Variant Discovery in DNAseq data. See http://www.broadinstitute.org/gatk/guide/best-practices for the full workflow.

Many variant callers specialize in either SNPs or indels, or (like the GATK's own UnifiedGenotyper) have to call them using separate models of variation. The HaplotypeCaller can call SNPs and indels simultaneously via local de novo assembly of haplotypes in an active region. In other words, whenever the program encounters a region showing signs of variation, it discards the existing mapping information and completely reassembles the reads in that region. This makes the HaplotypeCaller more accurate in regions that are traditionally difficult to call, for example those containing different types of variants close to each other. It also makes the HaplotypeCaller much better at calling indels.
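To make the active-region and reassembly idea more concrete, here is a toy Python sketch built on loudly simplified assumptions. It is not the HaplotypeCaller's actual algorithm. The pileup tuples, the mismatch-fraction threshold, the padding, the k-mer size, and the helper names find_active_regions and assemble_haplotypes are all hypothetical. The first function flags positions where an unusually high fraction of bases disagree with the reference, then merges them into padded regions. The second reassembles a small window into candidate haplotypes by walking a k-mer (de Bruijn) graph built from the reference and the read sequences, ignoring how the reads were originally aligned.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical per-position pileup summary: (position, read depth, mismatching bases).
Pileup = Tuple[int, int, int]


def find_active_regions(pileups: List[Pileup], min_fraction: float = 0.2,
                        padding: int = 2) -> List[Tuple[int, int]]:
    """Flag positions whose mismatch fraction suggests variation, pad them,
    and merge overlapping or adjacent windows into active regions."""
    windows = sorted(
        (pos - padding, pos + padding)
        for pos, depth, mismatches in pileups
        if depth > 0 and mismatches / depth >= min_fraction
    )
    regions: List[Tuple[int, int]] = []
    for start, end in windows:
        if regions and start <= regions[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def assemble_haplotypes(ref: str, reads: List[str], k: int = 4) -> Set[str]:
    """Toy local reassembly: build a de Bruijn graph of k-mers from the reference
    window and the reads, then return every cycle-free path from the first to the
    last reference k-mer as a candidate haplotype."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for seq in [ref] + reads:
        for i in range(len(seq) - k):
            graph[seq[i:i + k]].add(seq[i + 1:i + k + 1])

    source, sink = ref[:k], ref[-k:]
    haplotypes: Set[str] = set()

    def walk(node: str, path: str, seen: Set[str]) -> None:
        if node == sink:
            haplotypes.add(path)
            return
        for nxt in sorted(graph[node]):
            if nxt not in seen:  # never revisit a k-mer, so cycles cannot loop forever
                walk(nxt, path + nxt[-1], seen | {nxt})

    walk(source, source, {source})
    return haplotypes


if __name__ == "__main__":
    pileups = [(100, 30, 0), (101, 30, 14), (102, 30, 1),
               (103, 30, 12), (110, 25, 0), (120, 20, 9)]
    print(find_active_regions(pileups))  # [(99, 105), (118, 122)]

    ref = "ACGTTGCAAT"
    reads = ["ACGTAGCAAT",  # carries a T>A SNP
             "ACGTGCAAT"]   # carries a 1-bp deletion of a T
    print(sorted(assemble_haplotypes(ref, reads)))
    # ['ACGTAGCAAT', 'ACGTGCAAT', 'ACGTTGCAAT']
```

Because the SNP haplotype and the deletion haplotype come out of the same graph, both kinds of variation are recovered in a single pass rather than by separate SNP and indel models. The sketch omits the later steps that score each read against each candidate haplotype and assign genotypes.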

In addition, the HaplotypeCaller can estimate the probability that a given site is non-variant. This is useful for distinguishing sites where no variant was called because the evidence indicates that the site is non-variant from sites where no call could be made either way because no data were available. This capability, conferred by the reference confidence model, is used in the Best Practices workflow to produce a gVCF (short for genomic VCF) for each sample in a cohort.
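The difference between "confidently non-variant" and "no data" can be illustrated with a simplified diploid genotype-likelihood calculation. This Python sketch is an assumption-laden stand-in for the reference confidence model, not GATK's implementation. It uses a single fixed base error rate, treats allele 1 as "any non-reference base" (loosely analogous to the <NON_REF> symbolic allele in a gVCF), and reports a GQ-style score as the gap between the two lowest Phred-scaled likelihoods, capped at 99. The helper names genotype_pls and call_site are hypothetical.

```python
import math
from typing import List, Optional, Tuple

GENOTYPES = ("0/0", "0/1", "1/1")  # 0 = reference allele, 1 = any non-reference base


def genotype_pls(bases: List[str], ref_base: str,
                 error_rate: float = 0.01) -> Tuple[int, ...]:
    """Phred-scaled likelihoods (PLs) for 0/0, 0/1 and 1/1, normalized so the best is 0."""
    log10_liks = []
    for ref_copies in (2, 1, 0):
        total = 0.0
        for base in bases:
            # Probability of observing this base if it came from a ref or a non-ref chromosome.
            p_ref = 1 - error_rate if base == ref_base else error_rate
            p_alt = error_rate if base == ref_base else 1 - error_rate
            total += math.log10((ref_copies * p_ref + (2 - ref_copies) * p_alt) / 2)
        log10_liks.append(total)
    best = max(log10_liks)
    return tuple(round(-10 * (lik - best)) for lik in log10_liks)


def call_site(bases: List[str], ref_base: str,
              error_rate: float = 0.01) -> Tuple[str, Optional[int]]:
    """Return (genotype, GQ); './.' with no GQ when no reads cover the site."""
    if not bases:
        return ("./.", None)  # no data: the site cannot be called either way
    pls = genotype_pls(bases, ref_base, error_rate)
    ranked = sorted(range(len(GENOTYPES)), key=lambda i: pls[i])
    gq = min(pls[ranked[1]] - pls[ranked[0]], 99)  # gap to the runner-up genotype
    return (GENOTYPES[ranked[0]], gq)


if __name__ == "__main__":
    print(call_site(["A"] * 10, "A"))  # ('0/0', 30)
    print(call_site(["A"] * 2, "A"))   # ('0/0', 6)
    print(call_site([], "A"))          # ('./.', None)
```

Ten reference-matching reads yield a homozygous-reference call with GQ 30, two such reads yield only GQ 6, and an uncovered site yields ./., which is the distinction a variants-only VCF cannot express, since all three sites would simply be absent from it.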
