Once you've pre-processed your data according to our recommendations, you are ready to begin variant discovery, i.e. to identify the sites where your data displays variation relative to the reference genome and to calculate genotypes for each sample at those sites. Unfortunately, some of the variation you observe is caused by mapping and sequencing artifacts, so the greatest challenge here is to balance sensitivity (minimizing false negatives, i.e. failing to identify real variants) against specificity (minimizing false positives, i.e. failing to reject artifacts). We have found that it is very difficult to reconcile these objectives in a single step, so instead we decompose the variant discovery process into two steps: variant calling, followed by variant quality score recalibration (VQSR), or hard-filtering in cases where recalibration is not possible. The variant calling step is designed to maximize sensitivity, while the recalibration step applies smart filters that improve specificity.
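To make the sensitivity/specificity trade-off concrete, here is a minimal toy sketch in Python. It is not how the GATK tools work internally: the candidate sites, the `call_score` and `artifact_score` values, and the thresholds are all invented for illustration, and the `is_real` ground truth is something you would not know for real data. It only shows why a single threshold forces a compromise, while a permissive calling step followed by a separate filtering step can relax it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    site: str
    call_score: float      # hypothetical caller confidence (invented values)
    artifact_score: float  # hypothetical evidence that the site is an artifact
    is_real: bool          # ground truth, known only in this toy example


# Toy data, invented for illustration only.
CANDIDATES = [
    Candidate("chr1:100", 0.95, 0.05, True),
    Candidate("chr1:250", 0.40, 0.10, True),
    Candidate("chr1:300", 0.35, 0.80, False),
    Candidate("chr2:50", 0.90, 0.20, True),
    Candidate("chr2:75", 0.10, 0.90, False),
    Candidate("chr2:120", 0.55, 0.70, False),
]


def call_variants(candidates, min_call_score):
    """Step 1: keep every candidate at or above the calling threshold."""
    return [c for c in candidates if c.call_score >= min_call_score]


def filter_artifacts(calls, max_artifact_score):
    """Step 2: drop calls whose artifact evidence exceeds the cutoff."""
    return [c for c in calls if c.artifact_score <= max_artifact_score]


def count_errors(kept, candidates):
    """Return (false_negatives, false_positives) for a kept call set."""
    kept_sites = {c.site for c in kept}
    false_negatives = sum(
        1 for c in candidates if c.is_real and c.site not in kept_sites
    )
    false_positives = sum(1 for c in kept if not c.is_real)
    return false_negatives, false_positives


strict = call_variants(CANDIDATES, 0.6)    # single step, tuned for specificity
loose = call_variants(CANDIDATES, 0.05)    # single step, tuned for sensitivity
two_step = filter_artifacts(loose, 0.5)    # permissive calling, then filtering

for name, kept in [("strict", strict), ("loose", loose), ("two-step", two_step)]:
    fn, fp = count_errors(kept, CANDIDATES)
    print(f"{name}: {fn} false negatives, {fp} false positives")
```

In this toy data the strict threshold misses one real variant, the permissive threshold keeps three artifacts, and the two-step approach keeps all three real variants with no artifacts. Real data will not separate this cleanly, which is why the filtering step needs to be calibrated rather than set by hand.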
The GATK includes two variant calling tools, HaplotypeCaller and UnifiedGenotyper, so your first decision is which one to use for calling variants on your data. HaplotypeCaller is the more recent and more sophisticated of the two, and we recommend using it in all cases, with only a few exceptions (see FAQs below).
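As a rough illustration of how the choice of caller shows up on the command line, the sketch below builds (but does not run) a basic calling command. It assumes the GATK 3-style invocation, where the tool is selected with `-T` and the reference, input BAM, and output VCF are passed with `-R`, `-I`, and `-o`; other GATK versions may use a different syntax. The helper name `gatk_call_command`, the jar path, and all file names are placeholders, and a real run will usually need additional arguments.

```python
import shlex

SUPPORTED_CALLERS = ("HaplotypeCaller", "UnifiedGenotyper")


def gatk_call_command(reference, bam, out_vcf,
                      caller="HaplotypeCaller",
                      jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3-style calling command as an argument list.

    Hypothetical helper for illustration; it only assembles the
    arguments and never executes anything.
    """
    if caller not in SUPPORTED_CALLERS:
        raise ValueError(f"unknown caller: {caller!r}")
    return [
        "java", "-jar", jar,
        "-T", caller,      # which calling tool to run
        "-R", reference,   # reference genome (FASTA)
        "-I", bam,         # pre-processed reads (BAM)
        "-o", out_vcf,     # raw, unfiltered calls (VCF)
    ]


# Placeholder file names; substitute your own paths.
cmd = gatk_call_command("reference.fasta", "sample.bam", "raw_variants.vcf")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Switching callers is then a matter of passing `caller="UnifiedGenotyper"`; the output in either case is a raw call set that still needs recalibration or hard-filtering.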
Regardless of which caller you use, the variant recalibration process will be the same. If you need to do hard-filtering instead of variant recalibration, see the FAQs in the VQSR tab.