Studies expand and update an encyclopedia of cancer cell lines

Researchers take a broad, multi-faceted look at a large collection of cancer cell lines, creating a new trove of data to explore for drug development opportunities.

Susanna M. Hamilton, Broad Communications

Large libraries of cancer cell lines — collections of cells that represent tumor types seen in cancer patients — can yield profound insights into tumors' unique genetic features and their sensitivities to current and potential treatments. The data produced by these libraries is invaluable for developing new therapeutic options for patients.

Such is the case with the Cancer Cell Line Encyclopedia (CCLE), a collection of more than 900 cell lines assembled starting in 2008 by the Broad Cancer Program in collaboration with the Novartis Institutes for BioMedical Research.

In 2012, the CCLE collaborators took a deep dive into the genomic features and drug sensitivities of these cells. They cataloged gene expression, chromosomal copy number, and targeted gene sequencing data for all 947 lines, along with drug-response profiles for a subset of them. This information has transformed how cancer scientists characterize drug targets and measure drug activity. For instance, the CCLE collection was instrumental in pinpointing the gene PRMT5 as a promising target in certain brain, lung, pancreatic, ovarian, and blood cancers, and the gene WRN as a target in cancer cells lacking a key DNA proofreading mechanism.

A multi-center research team has now greatly augmented this cancer research resource by incorporating new cell lines and adding new data spanning the molecular spectrum from sequence to expression to protein. Writing in Nature, the team, led by core institute member William Sellers, institute member on leave Levi Garraway, and Broad alumni Mahmoud Ghandi and Franklin Huang, reports a major expansion of the CCLE dataset, which now includes:

  • RNA sequencing data (1,019 lines)
  • microRNA expression profiles (954 lines)
  • protein array data (899 lines)
  • genome-wide histone modifications (897 lines)
  • DNA methylation (843 lines)
  • whole genome sequencing (329 lines), and
  • whole exome sequencing (326 lines)

The new dataset, which is freely available online, also incorporates CRISPR and RNA interference gene dependency data from the Broad's Cancer Dependency Map (DepMap) team and drug sensitivity data from the Wellcome Trust Sanger Institute's Genomics of Drug Sensitivity in Cancer project.

"We suspect that there are ways of looking beyond pairwise correlations like expression and protein levels to identify states of cancer that only reveal themselves when you see all the data in aggregate," Sellers explained. "We hope that with all of the data available, the community will help draw those macro-level pictures, enabling improved drug discovery efforts broadly in industry and academia."

In a companion paper in Nature Medicine, another team led by Sellers, Chemical Biology and Therapeutics Science Program graduate student Haoxin Li, and institute scientist and Metabolomics Platform senior director Clary Clish also opened a new view into cancer biology by measuring the abundances of 225 metabolites in 928 of the CCLE lines. This is the first such systematic metabolomic survey of a cell line collection of this size and diversity.

"These data, along with statistical models, allow us to see otherwise-hidden connections between genetic and epigenetic errors in cancer cells and changes in those cells' metabolic profiles," Li said. "The data reveal metabolic dependencies that, for instance, point to opportunities to expand the use of the anti-cancer drug asparaginase, and to exploit levels of a metabolite called kynurenine as a prognostic biomarker for certain kinds of immunotherapy."

The CCLE collection provides the backbone for two large-scale cancer discovery efforts. One is the DepMap project, an effort being undertaken at the Broad Institute and at the Sanger Institute to systematically identify genetic dependencies (vulnerabilities that might serve as targets for designing new therapies or repurposing existing ones) across hundreds of cancer cell lines using RNA interference, CRISPR, and drug screens.

The second is PRISM, a system that uses genetically barcoded versions of the CCLE cell lines to identify biomarkers that could be used to predict tumors' responses to different drug compounds.

"Taken together, these datasets constitute a massive community resource for anyone in the cancer research field using cell line models," Sellers said. "One can't overestimate the data's power for discovery and for understanding cancer biology mechanisms across tumor types."

These two studies received support from the National Cancer Institute, Novartis, and other sources.

Paper(s) cited

Ghandi M, Huang F, et al. Next generation characterization and functional mapping of the Cancer Cell Line Encyclopedia. Nature. Online May 8, 2019. DOI: 10.1038/s41586-019-1186-3.

Li H, et al. The landscape of cancer cell line metabolism. Nature Medicine. Online May 8, 2019. DOI: 10.1038/s41591-019-0404-8.