Although we generally recommend the HaplotypeCaller for calling variants, in some cases it cannot be used, as explained in the FAQs. In those cases (i.e. when you are processing a large number of samples together, working with non-diploid organisms, or working with pooled samples) you should use the UnifiedGenotyper instead.
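The exceptions listed above can be written down as a small decision helper. This is only a sketch of the rule of thumb, not part of the GATK: the `Project` class, the `choose_caller` function and, in particular, the `MAX_SAMPLES_FOR_HAPLOTYPE_CALLER` cutoff are hypothetical names and values introduced for illustration, since the text does not say how many samples count as "a large number".

```python
from dataclasses import dataclass

# Hypothetical cutoff: the documentation only says "a large number of samples".
MAX_SAMPLES_FOR_HAPLOTYPE_CALLER = 100


@dataclass
class Project:
    """Minimal description of a dataset (illustrative, not a GATK type)."""
    n_samples: int
    ploidy: int = 2
    pooled: bool = False


def choose_caller(project, max_samples=MAX_SAMPLES_FOR_HAPLOTYPE_CALLER):
    """Return the caller suggested by the exceptions described in the text."""
    too_many_samples = project.n_samples > max_samples
    non_diploid = project.ploidy != 2
    if too_many_samples or non_diploid or project.pooled:
        return "UnifiedGenotyper"
    return "HaplotypeCaller"
```

For example, `choose_caller(Project(n_samples=10))` falls through to the HaplotypeCaller, while a tetraploid or pooled dataset is routed to the UnifiedGenotyper. Adjust the cutoff to whatever the FAQs recommend for your version of the tools.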
The UnifiedGenotyper calls SNPs and indels separately by considering each variant locus independently. The model it uses to do so has been generalized to work with data from organisms of any ploidy.
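To give a feel for what evaluating a single locus under arbitrary ploidy can look like, here is a deliberately simplified sketch. It is not the UnifiedGenotyper's actual model: it assumes one biallelic site, one sample, a single fixed `error_rate` for every read, and no priors or base qualities. The function names and the `"ref"`/`"alt"` read encoding are assumptions made for this example.

```python
import math


def genotype_log_likelihoods(read_alleles, ploidy, error_rate=0.01):
    """Log10 likelihood of each genotype (0..ploidy alt copies) at one locus.

    read_alleles: iterable of "ref" / "alt" strings, one per read at the site.
    error_rate must be strictly between 0 and 1 so every log is defined.
    """
    reads = list(read_alleles)
    log_likelihoods = []
    for alt_copies in range(ploidy + 1):
        alt_fraction = alt_copies / ploidy
        # Probability that a read shows the alt allele under this genotype:
        # either it came from an alt chromosome and was read correctly,
        # or it came from a ref chromosome and was misread.
        p_alt = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
        total = 0.0
        for allele in reads:
            total += math.log10(p_alt if allele == "alt" else 1 - p_alt)
        log_likelihoods.append(total)
    return log_likelihoods


def most_likely_alt_copies(read_alleles, ploidy, error_rate=0.01):
    """Number of alt copies with the highest likelihood (no prior applied)."""
    lls = genotype_log_likelihoods(read_alleles, ploidy, error_rate)
    return max(range(len(lls)), key=lls.__getitem__)
```

The only place ploidy enters is the set of possible alt-allele fractions (0/P, 1/P, ..., P/P), which is why a per-locus model of this kind extends naturally from diploid to any ploidy.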
Many variant callers specialize in either SNPs or indels, or (like the GATK's own UnifiedGenotyper) have to call them using separate models of variation. The HaplotypeCaller is capable of calling SNPs and indels simultaneously via local de novo assembly of haplotypes in an active region. In other words, whenever the program encounters a region showing signs of variation, it discards the existing mapping information and completely reassembles the reads in that region. This allows the HaplotypeCaller to be more accurate in regions that are traditionally difficult to call, for example when they contain different types of variants close to each other. It also makes the HaplotypeCaller much better at calling indels.
Once you've pre-processed your data according to our recommendations, you are ready to undertake the variant discovery process, i.e. identify the sites where your data displays variation relative to the reference genome, and calculate genotypes for each sample at those sites. Unfortunately some of the variation you observe is caused by mapping and sequencing artifacts, so the greatest challenge here is to balance the need for sensitivity (to minimize false negatives, i.e. failing to identify real variants) against the need for specificity (to minimize false positives, i.e. failing to reject artifacts). We have found that it is very difficult to reconcile these objectives in a single step, so instead we decompose the variant discovery process into two steps: variant calling, followed by variant quality score recalibration (VQSR), or hard-filtering in cases where recalibration is not possible. The variant calling step is designed to maximize sensitivity, while the recalibration step applies smart filters that improve specificity.
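The two quantities being balanced have standard definitions, which the sketch below computes from sets of site identifiers. The helper names and the toy numbers in the usage note are made up for illustration; they are not GATK output and not results from any real dataset.

```python
def confusion_counts(called, truth, all_sites):
    """Count true/false positives and negatives for a set of called sites."""
    called, truth, all_sites = set(called), set(truth), set(all_sites)
    true_pos = len(called & truth)              # real variants that were called
    false_pos = len(called - truth)             # artifacts that were called
    false_neg = len(truth - called)             # real variants that were missed
    true_neg = len(all_sites - called - truth)  # non-variant sites left alone
    return true_pos, false_pos, false_neg, true_neg


def sensitivity(true_pos, false_neg):
    """Fraction of real variants that were called (needs at least one real variant)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-variant sites correctly left uncalled (needs at least one)."""
    return true_neg / (true_neg + false_pos)
```

As a toy illustration, take ten candidate sites of which four are real variants. A permissive calling step that emits all four real variants plus three artifacts is fully sensitive but not very specific. A subsequent filtering step that removes two of the artifacts at the cost of one real variant trades a little sensitivity for a clear gain in specificity, which is the trade-off the two-step design is meant to manage.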
The GATK includes two variant calling tools, HaplotypeCaller and UnifiedGenotyper, so your first decision here is to choose one of them for calling variants on your data. The HaplotypeCaller is a more recent and sophisticated tool than the UnifiedGenotyper, and we recommend using HaplotypeCaller in all cases, with only a few exceptions (see FAQs below).
Regardless of which caller you use, the variant recalibration process will be the same. If you need to do hard-filtering instead of variant recalibration, see the FAQs in the VQSR tab.