Seeking common ground in cancer cell line data
The field of pharmacogenomics lies at the scalpel’s edge of personalized medicine, harnessing genomic tools to guide the use of drugs to treat disease. The idea is to marry precision with power — the right drug at the right time in the right patient. In cancer, researchers across the world have created two massive databases to help propel the biomedical community toward this goal. First published in Nature in 2012, these databases, the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC), combine detailed genomic data from hundreds of cancer cell lines with pharmacological data on how these cells respond to various anti-cancer drugs, which together can help researchers predict which cancer cells are vulnerable to which drugs.
In 2013, a report that also appeared in Nature used computational approaches to compare the data from these two independent efforts and found inconsistencies between them, raising questions about the validity of the datasets. Now, the CCLE and GDSC teams have come together to address these concerns by systematically comparing key aspects of the projects’ data, including pharmacological data and genomic predictors of drug sensitivity. The work reveals overall agreement between the two datasets and also provides a model for standardized approaches to large pharmacogenomic projects, which should facilitate data sharing and comparative studies. The new research appears in the November 16 advance online issue of Nature.
“Our findings confirm the tremendous value in both data sets,” said Nicolas Stransky, a lead author of the Nature study, who, together with Broad Institute colleagues Levi Garraway and Todd Golub, and collaborators from the Broad, Dana-Farber, and Novartis, helped pioneer the creation of the CCLE. “Not only are they powerful as individual resources, but there is remarkable strength in the fact that such large-scale data — produced independently — are largely in agreement.”
In the new CCLE-GDSC analysis, Stransky, now a senior scientist at Blueprint Medicines, and his colleagues set out to directly compare the pharmacological data for the two cell line projects, which together tested over one thousand cell lines for their sensitivities to more than one hundred drugs. Importantly, while the CCLE and GDSC collections have nearly 500 cell lines in common, only a subset of those were exposed to the same drugs in both projects (between 82 and 256 cell lines, depending on the drug).
The researchers focused their analysis specifically on these overlapping pharmacological profiles. They also adjusted their statistical analyses to account for the fact that, for any given drug, the vast majority of cell lines are insensitive to its effects and are thus unlikely to be a source of significant molecular or genetic insight.
“It is important to remember that there is inherent variability, from a biological standpoint, in these data because the projects were carried out independently — in different labs, with different growth conditions, and on cell lines that are identical but drawn from distinct samples,” said Stransky. “Despite these differences, we found overall agreement, not just in numerical values but in terms of the scientific conclusions you can draw from the data.”
One of the key strengths of resources such as the CCLE and GDSC is that they can reveal genomic features that serve as markers of cells’ sensitivity to drugs, providing a path to predict which cells will respond to a given drug. Stransky and his colleagues also compared these drug sensitivity predictors. Using multiple analytical approaches, they found that the data sets yield robust results, both separately and when taken as a whole. As the scientists describe in their paper: “Not only do the two sets of drug screening data exhibit broad convergence — they also provide examples of consilience: a phenomenon in which independent lines of experimental evidence, each with their own inherent limitations, arrive at fundamental scientific agreement.”
The Cancer Cell Line Encyclopedia Consortium and The Genomics of Drug Sensitivity in Cancer Consortium. Pharmacogenomic agreement between two cancer cell line datasets. Nature, November 16, 2015. DOI: 10.1038/nature15736