This article is part of the Best Practices documentation. See http://www.broadinstitute.org/gatk/guide/best-practices for the full documentation set.
This is our recommended workflow for calling variants in DNAseq data from cohorts of samples, in which steps from data processing up to variant calling are performed per-sample, and subsequent steps are performed jointly on all the individuals in the cohort.
The workflow is divided in three main sections that are meant to be performed sequentially:
- Data pre-processing: from raw DNAseq sequence reads (FASTQ files) to analysis-ready reads (BAM files)
- Variant discovery: from reads (BAM files) to variants (VCF files)
- Suggested preliminary analyses
Important note on GATK versions
The Best Practices have been updated for GATK version 3. If you are running an older version, you should seriously consider upgrading. For more details about what has changed in each version, please see the Version History section. If you cannot upgrade your version of GATK for any reason, please look up the corresponding version of the GuideBook PDF (also in the Version History section) to ensure that you are using the appropriate recommendations for your version.
Return to top Comment on this article in the forum