Cancer Genome Computational Analysis

The Cancer Genome Computational Analysis (CGCA) group, a central component of the Broad Institute’s Cancer Program, addresses unanswered questions in cancer biology and genomics by developing computational methods and tools, together with platforms, datasets, and resources. Specifically, the group works to understand cancer by characterizing and interpreting genomic data:

  • Characterization: fully describing the genomic events (including somatic and germline events, at the DNA, RNA, and protein levels) in tumor and normal samples from individual patients
  • Interpretation: analyzing characterization data across populations or cohorts to identify (a) genes, regions, and pathways that are altered more often than expected by chance, and (b) disease subtypes

CGCA works closely with many groups within the Broad, including the institute’s Genomics, Genetic Perturbation, and Data Sciences platforms. CGCA members also engage with collaborators from the Broad’s partner institutions and from outside organizations such as IBM. The team also participates in several national consortia funded by the National Institutes of Health, such as The Cancer Genome Atlas (TCGA), the Genomic Data Analysis Network, the Clinical Proteomic Tumor Analysis Consortium, and the Genotype-Tissue Expression (GTEx) Consortium.

The CGCA team has created a number of genomic analysis tools and platforms for the cancer research community, including:

  • FireCloud, a cloud-based cancer genomics analysis platform developed with the Broad’s Data Sciences Platform. FireCloud houses the full dataset generated by TCGA and a suite of robust cancer genomics workflows containing CGCA-developed tools, such as:
      • ABSOLUTE, which estimates tumor purity and ploidy and computes absolute copy numbers and mutation multiplicities.
      • dRanger, which identifies somatic rearrangements as clusters of aberrant paired-end sequencing reads in a tumor sample.
      • MutSig, which identifies genes in a dataset that are mutated more often than would be expected by chance.
      • POLYSOLVER, which infers HLA types from whole-exome sequencing data.

CGCA has also built and maintains several key genomic data resources, such as:

  • TumorPortal, a comprehensive mutational dataset comprising exome mutation data from 21 cancer types.
  • FireBrowse, a user-friendly, web-based entry point to downloadable TCGA datasets, summary reports, and graphical tools. FireBrowse sits atop the TCGA GDAC Firehose, an application that provides access to TCGA datasets, a robust selection of tools and pipelines for analyzing cancer genome data, and thousands of data analysis archives.
  • GTEx Portal, a comprehensive atlas and open database of gene expression and gene regulation across human tissues, which provides a “normal” dataset against which tumor expression and regulation data can be compared.