High-throughput sequencing data, or “reads,” aligned to a reference genome reveal variation at the gene, exome, and genome levels, and that variation manifests as phenotypic traits and disease-related biology. DSP designs and develops software packages and best-practice pipelines for aligning sequencing reads and for detecting and characterizing variants. These tools produce large-scale, variant-level data for further analysis on platforms such as Hail.
DSP’s flagship Genome Analysis Toolkit (GATK) provides robust best-practice workflows for two main tasks. The first is discovering small variants (i.e., single nucleotide polymorphisms and insertion/deletion mutations) in germline panel, exome, and whole-genome data. The second is discovering copy number variants in somatic exome and panel data. The GATK team is extending these capabilities to other data types (e.g., RNA sequencing, single-cell sequencing) and other forms of variation (e.g., structural variation).
DSP’s offerings are rounded out by other Broad-developed packages and applications such as Picard, as well as tools developed and released by others in the genomics community. With the Workflow Description Language (WDL), developed at the Broad, researchers can write pipelines that chain these tools together and run comprehensive analyses reproducibly and at scale. WDL also gives researchers a standardized way to share detailed, step-by-step analysis methods with the broader community.