Tumors contain genetically heterogeneous cancerous subpopulations that can differ in their metastatic potential and response to treatment. Our work over the past few years has focused on using computational and statistical methods to reconstruct the phylogeny and the full genotypes of these subpopulations using data from high-throughput sequencing of tumor samples.
Tumor subpopulations can be partially characterised by identifying tumor-associated somatic variants using short read sequencing. Subsequent inference of copy number variants or clustering of the variant allele frequencies (VAFs) can reveal the number of major subpopulations present in the tumor as well as the set of mutations which first appear in each subpopulation. Further analysis, and often different data, is needed to determine how the subpopulations relate to one another and whether they share any mutations. Ideally, this analysis would reconstruct the full genotypes of each subpopulation.
I will describe my lab’s efforts to recover these full genotypes by reconstructing the tumor’s evolutionary history. We do this by fitting subpopulation phylogenies to the VAFs. In some circumstances, a full reconstruction is possible but often multiple phylogenies are consistent with the data. We have developed a number of methods (PhyloSub, PhyloWGS, treeCRP, PhyloSpan) that use Bayesian inference in non-parametric models to distinguish ambiguous and unambiguous portions of the phylogeny thereby explicitly representing reconstruction uncertainty. Our methods consider both single nucleotide variants as well as copy number variations and adapt to data on pairs of mutations.