Genome Sequencing and Analysis Group
The Genome Sequencing and Analysis Group (GSA) in Medical and Population Genetics at the Broad Institute is a team of computational biologists, software engineers, and hosted students and researchers. The group develops algorithms for next-generation DNA sequencers for medical and population genetics and cancer applications, and applies these algorithms to answer fundamental scientific questions.
What does GSA do?
GSA has extensive experience processing next-generation DNA sequencer data, as well as genotyping and validation data, and performing downstream analysis of these data for medical and population genetics studies. The group was one of the most active participants in the 1000 Genomes Project pilot, generating SNP and indel calls for deeply sequenced whole genomes of individual samples (Pilot 2), low-pass whole genomes (Pilot 1), and deep targeted capture sequencing (Pilot 3), and contributing to the official call sets for all three wings of the project.
The method development arm of GSA, led by Eric Banks, has created a powerful framework, the Genome Analysis Toolkit (GATK), for analyzing next-generation sequencer data and the variation discovered by NGS. The GATK was designed to simplify the process of developing efficient, robust tools for working with NGS data, and currently supports Solexa, SOLiD, 454, Complete Genomics, and Sanger sequencer data in a single integrated framework. Using this framework we have developed and released now widely used tools such as base quality score recalibration, local realignment around indels, and multi-sample SNP and indel callers, as well as read and variation call QC tools. These tools are now integrated into the 1000 Genomes Project, The Cancer Genome Atlas, and the Broad's production sequencing pipeline, and are in use at many other sequencing centers and individual labs with sequencing machines.
While continuing to develop novel methods for working with NGS data, we are increasingly applying these tools to technology development, production data analysis, and medical genetics projects as part of the Analysis Team led by Kiran Garimella, such as:
- Comparative analysis of new sequencing technologies such as Roche-454, ABI SOLiD, Illumina GAII, Illumina HiSeq, and Complete Genomics sequencers
- Design and analysis of whole exome hybrid capture technology
- ARRA-funded GO Exome Sequencing Project
- ARRA-funded GO Type-2 Diabetes Sequencing Project
- ARRA-funded GO Autism Sequencing Project
- Framingham Heart Study targeted sequencing project
- HLA typing and HIV elite controllers
- NHGRI "extremes" phenotype-driven whole-exome sequencing projects
- Ciliopathies
- Hutterites
- Familial Combined Hypolipidemia
- Many other projects arriving weekly
Group members
Current members
- Group Leader
  - Mark A. DePristo, Manager of Medical and Population Genetics Analysis
- Computational Biologists
  - Kiran Garimella, Team Lead, Analysis
  - Eric Banks, Team Lead, Development
  - Guillermo del Angel
  - Menachem Fromer
- Bioinformatics Analysts
- Software Engineers
- Students and other visiting members
  - Xiaoming (Sherman) Jia -- Medical student at the Harvard-MIT Division of HST. Project: HLA caller algorithm
  - Brett Thomas -- Harvard CS undergraduate
Former members
- Michael Melgar -- MIT Summer Research Associate, 2009-2010, now at the Francis Collins Lab, NIH
- Andrew Kernytsky
Recruitment
We currently have one open position:
Previously open and then filled positions are:
- Bioinformatic analyst, June 2009
- Senior Software Engineer, July 2009
- Bioinformatic analyst, Sept. 2009
- Development Team Computational biologist, April 2010
- Bioinformatic analyst and Data manager, May 2010
Please apply directly at Career Center. Email depristo@broadinstitute.org if you'd like more information.
The Genome Analysis Toolkit (GATK)
See the following page for detailed information about our programming framework and the tools built upon it: The Genome Analysis Toolkit
Queue
See the following page for detailed information about our process management framework: Queue
Tribble
See the following page for detailed information about the common reference meta-data framework Tribble: Tribble
GSA Firehose Pipeline Documentation
Documentation of GSA QC methodology is available here. For information on GATK analyses and parameters run in the standard Firehose pipeline, go here.
GSA next-generation sequencing workshop
On Feb. 4th, 2010, GSA organized a next-generation sequencing workshop at the Broad Institute. The topics and speakers were:
- NGS Project Management -- Carrie Sougnez
- MAQ/BWA -- Heng Li
- The Picard Pipeline -- Tim Fennell
- Base Quality Score Recalibration -- Ryan Poplin
- Quality Assessment of Hybrid Selection Experiments -- Kristian Cibulskis and Andrew Kernytsky
- Indel Cleaning and Calling -- Andrey Sivachenko and Eric Banks
- SNP Calling -- Mark DePristo
- Quality Assessment of SNP Calls -- Kiran Garimella
- Pooled SNP Calling -- Jason Flannick
- Visualizing NGS with IGV -- Jim Robinson
A complete video of the workshop will be available shortly.
GSA provided data
- Various data for use in assessing call sets
- HapMap genotypes
- GSA local pilot 1 and pilot 2 merged files
- GSA local indel-realigned pilot 2 files
- Preferred TechDev Samples
- 1000 Genomes SNP calls
- GSA FTP Server
Internal GSA website
See [1]
