High-throughput sequencing data, or “reads,” aligned to a reference genome reveal variation at the gene, exome, and genome levels that manifests as phenotypic traits and disease-related biology. The DSP designs and develops software packages and best-practice pipelines for aligning sequencing reads and for detecting and characterizing these variants.
DSP’s flagship Genome Analysis Toolkit (GATK) provides robust best-practice workflows for variant discovery across all major genomic variant types, both germline and somatic, in panel, exome, and whole-genome data. The DSP methods development team is extending those capabilities to additional use cases, such as microbial genomics, single-cell sequencing, and machine learning approaches for integrating diverse omics and health data.
DSP’s offerings are rounded out by other packages and applications, such as Hail and Picard, that are developed and released in collaboration with others in the genomics community at the Broad and beyond. Using the open-source, community-driven Workflow Description Language (WDL), researchers can write pipelines that chain these tools together and run comprehensive analyses reproducibly and at scale. WDL also gives researchers a standardized, portable way to share detailed analysis steps with the broader community.