Scientists use genetics to dig into a tumor’s past

A study of tumor exomes reconstructs a timeline of mutations for certain cancer types, revealing insight into the order of genetic drivers of the disease. 

Silver DNA in the dirt being uncovered by a hand holding a brush
Credit: Ricardo Job-Reese, Broad Communications

For patients with some types of cancer, diagnosis can happen at an advanced stage. While a tumor grows unnoticed, it accumulates hundreds to thousands of mutations, making it difficult for scientists studying late-stage cancers to figure out which ones contributed to tumor growth at the earlier stages of cancer. Knowing more about the genetic events that take place during the progression to cancer can help scientists engineer more realistic cell and animal models of the disease and even develop better ways to detect and treat it early.

Now a team of cancer researchers at the Broad Institute of MIT and Harvard, Massachusetts General Hospital (MGH), and The Ohio State University has shown that, like archaeologists who inspect buried artifacts to reconstruct a society’s history, they can examine patterns in a tumor’s genetic landscape to uncover its past. They have built an analytical approach that allows researchers to piece together the mutational history of advanced tumors by analyzing the tumors’ exomes (the protein-coding portions of the genome).

The team validated and tested their method on data from two subtypes of head and neck cancer, one associated with human papillomavirus (HPV) exposure and one not. They identified certain driver events associated with early stages of disease that were not previously identified by other approaches, and other important molecular events that they linked to aggressive tumor growth. Deeper insight into a tumor’s past generated by this method could help guide new strategies for cancer screening, prevention, and precision therapeutics that focus on a patient’s particular tumor.

The technology, called PhylogicNDT, is described in Nature Cancer and is freely available to the scientific community.

“This method should be one of the tool sets in our toolbox any time we analyze cancer samples, alongside methods that look for mutational signatures and driver mutations,” said co-senior author Gad Getz, who is also director of cancer genome computational analysis and an institute member in the Cancer Program at the Broad, a professor of pathology at Harvard Medical School, and Paul C. Zamecnik Chair in Oncology at the MGH Cancer Center. 

“The ability to reconstruct the order of genetic events using exome data opens new avenues to analyze tumor types that really haven’t been studied in this way in any sort of detail,” said co-first author Ignaty Leshchiner, a member of the Getz lab who is now an associate professor in computational medicine at Boston University School of Medicine. “Our method holds potential to one day improve patient care by identifying early, influential mutations that may determine a patient’s prognosis or response to therapy.”

The work was also led by co-senior author James Rocco, chair of the Department of Otolaryngology-Head and Neck Surgery at The Ohio State University Comprehensive Cancer Center. Other researchers who contributed to the work include co-first authors Edmund Mroz of Ohio State and Justin Cha and Daniel Rosebrock at Broad.

Unearthing clues in the exome

To learn about a tumor’s history, scientists often compare its DNA with that from the precancerous lesion from which it originated. But for many cancer types, it is difficult to obtain samples from such lesions, either because the lesions are deep in the body or cannot be detected, or because it is not clear what should be sampled.

Getz and his colleagues hypothesized that they could instead infer the early genetic progression of these cancers by analyzing DNA from more mature tumors using clever computational strategies. They developed PhylogicNDT to look for patterns of misspellings and extra or missing bits of DNA in the tumor’s exome. 

The method relies, in part, on the tendency for cancer genomes to duplicate large chunks of DNA or even duplicate themselves entirely, yielding multiple copies of the genome that continue to generate mutations. Taking known rates of mutations into consideration, PhylogicNDT can analyze exome data and compare these duplicated portions of the tumor genome to then reconstruct the likeliest order of mutational events. 
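
The timing logic in the paragraph above can be illustrated with a minimal sketch. This is not PhylogicNDT's implementation. It is a simplified example that assumes a constant mutation rate per DNA copy, a single whole-genome doubling from two copies to four, and normal cells with two copies of each locus. The function names and parameters (estimate_multiplicity, doubling_time_fraction, vaf, purity) are hypothetical and chosen only for illustration. The idea is that a mutation acquired before the doubling is copied along with the genome and appears on two copies, while a mutation acquired afterward appears on only one.

```python
def estimate_multiplicity(vaf: float, purity: float, total_copies: int) -> int:
    """Estimate how many tumor copies of a locus carry a mutation.

    vaf: fraction of sequencing reads that show the mutation.
    purity: fraction of cells in the sample that are tumor cells.
    total_copies: tumor copy number at the locus.
    Assumes the remaining (normal) cells carry two unmutated copies.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies of the locus per cell across the sample.
    avg_copies = purity * total_copies + 2 * (1 - purity)
    return max(1, round(vaf * avg_copies / purity))


def doubling_time_fraction(n_two_copies: int, n_one_copy: int) -> float:
    """Estimate when a genome doubling happened, as a fraction of mutational time.

    n_two_copies: mutations present on two copies (acquired before doubling).
    n_one_copy: mutations present on one copy (acquired after doubling).
    Assumes a constant mutation rate per copy (an illustrative simplification).
    """
    if n_two_copies < 0 or n_one_copy < 0 or n_two_copies + n_one_copy == 0:
        raise ValueError("need non-negative counts and at least one mutation")
    # Before doubling, two copies accumulate mutations; after, four copies do.
    # Dividing by the number of copies gives elapsed time in per-copy units.
    time_before = n_two_copies / 2
    time_after = n_one_copy / 4
    return time_before / (time_before + time_after)


if __name__ == "__main__":
    # 30 mutations on two copies and 60 on one copy place the doubling halfway.
    print(doubling_time_fraction(30, 60))  # 0.5
```

In this toy model, a value near 0 means the doubling happened early in the tumor's mutational history and a value near 1 means it happened shortly before sampling. Converting such fractions into calendar years would require additional assumptions about mutation rates that are not shown here.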

The researchers used PhylogicNDT to study tumor DNA from several hundred people with HPV-negative head and neck squamous cell carcinoma (HNSCC), which is a subtype associated with tobacco and alcohol use. They generated a reconstruction of genetic events that was similar to data from a premalignant lesion-based model of the disease, validating their approach. They also identified additional driver mutations that could only be deduced using today’s advanced sequencing technologies. 

Tumor timeline

Having validated their method, the researchers then used it to analyze more than 100 HPV-positive HNSCC tumors, which are caused by HPV integrating its genetic material into the host genome. These tumors also grow faster than HPV-negative tumors and are typically diagnosed at a late stage, when recognizable premalignant tissue is no longer present.

The team found that the virus can integrate into the host genome years or even decades before a patient is diagnosed, and that it can keep integrating at different points in the tumor’s genome as the tumor grows. Their analysis also uncovered several of the same mutations found in HPV-negative tumors, in addition to some that are unique to the faster-growing type. 

In both HNSCC subtypes, the scientists observed instances where the genome doubled, producing four copies instead of two, many years before diagnosis. Surprisingly, they also saw some cases with three copies of the genome, in which one of the doubled copies was later deleted, and these tumors were more aggressive and more likely to resist treatment. 

“These insights allow us to tie information on mutation timing to tumor progression and survival differences,” said Leshchiner.

The researchers hope PhylogicNDT can help shed light on other cancer types that lack samples of early-stage tissue or, in the case of rare cancers, have few samples of any tissue available. In addition, these computational approaches could alleviate the need for exhaustive experimental studies in cells or animals that rely on trial and error to figure out which combinations of events can cause cancer. The method can also be applied to numerous existing datasets of cancer exome sequences to enrich what’s already known about those diseases.


The research was supported in part by the National Institute of Dental and Craniofacial Research and the Paul C. Zamecnik, MD, Chair in Oncology at Massachusetts General Hospital.

Paper(s) cited

Leshchiner I, Mroz EA, Cha J, et al. Inferring early genetic progression in cancers with unobtainable premalignant disease. Nature Cancer. Online April 20, 2023. DOI: 10.1038/s43018-023-00533-y.