Citing the GATK

The recommended way to cite the GATK is currently a double citation, as in this example:

We sequenced 10 samples on 10 lanes of an Illumina HiSeq 2000, aligned the resulting reads to the hg19 reference genome with BWA [Li and Durbin], applied GATK [McKenna et al.] base quality score recalibration, indel realignment, and duplicate removal, and performed SNP and INDEL discovery and genotyping across all 10 samples simultaneously using standard hard filtering parameters or variant quality score recalibration [DePristo et al.].


The papers to cite are:

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010 Sep; 20(9):1297-303. Epub 2010 Jul 19. [1] 
Covers the computational philosophy underlying the GATK. This is a good citation for the GATK in general.
DePristo, M., Banks, E., Poplin, R., Garimella, K., Maguire, J., Hartl, C., Philippakis, A., del Angel, G., Rivas, M.A., Hanna, M., McKenna, A., Fennell, T., Kernytsky, A., Sivachenko, A., Cibulskis, K., Gabriel, S., Altshuler, D. and Daly, M. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics. 2011 Apr; 43(5):491-498. [2]
Describes some of the key data processing tools commonly used in the GATK for NGS data processing and variant calling. Describes in detail the base quality score recalibrator, indel realigner, SNP calling, variant quality score recalibrator and their application to deep whole genome, whole exome, and low-pass multi-sample calling. If you use the GATK for variant calling, this is a good citation.