GATK Best Practices
Recommended workflows for variant discovery analysis with GATK


What can you do with this?

The GATK Best Practices workflows provide step-by-step recommendations for performing variant discovery analysis in high-throughput sequencing (HTS) data. They enable discovery of SNPs and small indels in DNA and RNAseq data. In theory there is no size limit on indels, but adjustments may be required to call indels larger than 50 bp. The workflows do not yet enable discovery of structural variants (SVs) or copy number variants (CNVs). Although they were originally designed for human genome research, the GATK Best Practices are widely used (with adaptations as described in the documentation) for analysis of non-human organisms of all kinds, including non-diploids.

What's in the box?

Our recommendations describe in detail the core processing steps required to go from raw reads coming off the sequencing machine all the way to a variant callset that can be used in downstream analyses. Wherever we can, we try to provide guidance on experimental design, quality control (QC) and pipeline implementation options (how to set it up in practice). Please understand that these depend on many factors, including the sequencing technology and the hardware infrastructure at your disposal, so you may need to adapt our recommendations to your specific situation.


Break it down

We currently have two separate workflows for Germline DNA and for RNAseq, and we are developing a workflow for Somatic DNA. All of them are divided into three sequential phases:

[Figure: Context overview]
Pre-processing starts from raw sequence data, in either FASTQ or uBAM format, and produces analysis-ready BAM files. Processing steps include alignment to a reference genome as well as some data cleanup operations to correct for technical biases and make the data suitable for analysis.
Variant Discovery starts from analysis-ready BAM files and produces a callset in VCF format. Processing involves identifying sites where one or more individuals display possible genomic variation, and applying filtering methods appropriate to the experimental design.
Callset Refinement starts and ends with a VCF callset. Processing involves using metadata to assess and improve genotyping accuracy, attach additional information, and evaluate the overall quality of the callset.

Use the buttons in the figure above to start exploring the Best Practices workflows in detail.
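
As a rough illustration of how the three phases chain together, the sketch below models each phase by the file formats it accepts and produces, then checks that a given starting format can flow through the whole sequence. It is not part of GATK and does not run any GATK tool; the names Phase, PHASES and trace_formats are hypothetical and exist only for this example. Only the phase names and file formats are taken from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One workflow phase, described only by its file formats (illustrative)."""
    name: str
    inputs: frozenset  # formats the phase accepts
    output: str        # format the phase produces


# Phase names and formats as described in the text; the structure is an assumption.
PHASES = (
    Phase("Pre-processing", frozenset({"FASTQ", "uBAM"}), "BAM"),
    Phase("Variant Discovery", frozenset({"BAM"}), "VCF"),
    Phase("Callset Refinement", frozenset({"VCF"}), "VCF"),
)


def trace_formats(start_format, phases=PHASES):
    """Walk the phases in order and return (phase, input, output) tuples.

    Raises ValueError if a phase cannot accept the format handed to it.
    """
    steps = []
    current = start_format
    for phase in phases:
        if current not in phase.inputs:
            raise ValueError(
                f"{phase.name} cannot start from {current}; "
                f"expected one of {sorted(phase.inputs)}"
            )
        steps.append((phase.name, current, phase.output))
        current = phase.output
    return steps


if __name__ == "__main__":
    for name, src, dst in trace_formats("FASTQ"):
        print(f"{name}: {src} -> {dst}")
```

Calling trace_formats("uBAM") follows the same path, while starting from "VCF" raises an error at the first phase. This reflects that the phases are sequential and that Pre-processing expects raw sequence data.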