Biologically informed ML for cancer discovery

Haitham Elmarakeby, Van Allen Lab, Dana-Farber Cancer Institute; Broad Institute

Meeting: Biologically informed deep neural network for prostate cancer discovery

Despite advances in prostate cancer treatment, including androgen deprivation therapy, metastatic castration-resistant prostate cancer (mCRPC) remains largely incurable. Recent advances in collecting and sharing large quantities of genomic records from patients with primary and metastatic prostate cancer have not yet been matched by advances in computational models that shed light on the underlying biology of mCRPC. Here we developed a biologically informed deep learning model (P-NET) that can accurately identify advanced prostate cancer samples based on their genomic profiles. By using a sparse model architecture that encodes different biological entities, including genes, pathways, and biological processes, we were able to interpret the model in a way that typical deep learning models do not allow. In a systematic, unbiased way, P-NET recovered known biology of mCRPC via AR, TP53, RB1, and PTEN disruption, as well as less expected genes such as MDM4. We showed experimentally that MDM4 mediates enzalutamide resistance, suggesting that it may be a potential therapeutic target. We envision that our model will be helpful both in predicting clinical outcomes of cancer patients and in generating biological hypotheses to better understand the underlying cancer biology.
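As a rough illustration of what a sparse, biologically structured layer can look like, the sketch below connects gene-level inputs to pathway-level nodes only where an annotation exists. It is a minimal toy under stated assumptions, not the P-NET implementation: the gene-to-pathway memberships, the pathway names, the constant weights, and the tanh activation are all hypothetical choices made for the example.

```python
import math

# Hypothetical toy annotation: which genes belong to which pathway.
# These memberships are illustrative only and are not taken from P-NET.
GENES = ["AR", "TP53", "MDM4", "PTEN"]
PATHWAYS = {
    "pathway_A": ["AR"],
    "pathway_B": ["TP53", "MDM4"],
    "pathway_C": ["PTEN", "TP53"],
}


def build_mask(genes, pathways):
    """Return a genes x pathways 0/1 matrix; 1 where a gene is annotated to a pathway."""
    names = list(pathways)
    return [[1.0 if g in pathways[p] else 0.0 for p in names] for g in genes]


def sparse_layer(x, weights, mask):
    """Forward pass of a masked dense layer: only annotated connections contribute."""
    n_out = len(mask[0])
    out = []
    for j in range(n_out):
        z = sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
        out.append(math.tanh(z))  # activation choice is an assumption
    return out


mask = build_mask(GENES, PATHWAYS)
weights = [[0.5] * len(PATHWAYS) for _ in GENES]  # arbitrary placeholder weights
x = [1.0, 0.0, 1.0, 0.0]  # toy per-gene alteration indicators (AR and MDM4 altered)
activations = sparse_layer(x, weights, mask)
```

Because every nonzero connection corresponds to a named gene-pathway relationship, each pathway node's activation can be traced back to specific genes, which is the property that makes this kind of architecture easier to interpret than a fully connected network.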

Felix Dietlein, Van Allen Lab, Dana-Farber Cancer Institute; Broad Institute

Primer: Genomic tools for interpreting patterns of somatic driver and passenger mutations in cancer 

Most cancer genomes contain a large number of mutations. Most of these are passenger mutations without direct effects on tumor signaling. A few are driver mutations that change the function of proteins in tumor cells and allow those cells to proliferate faster than normal tissue. Thus, the better we understand the processes that shape mutations in cancer genomes, the more precisely we can tailor therapies to a patient’s individual genome. However, passenger mutations cannot be clearly distinguished from driver mutations based on an individual tumor genome. Hence, large datasets such as TCGA have been generated from the genomic sequencing of thousands of tumor patients to characterize the landscape of driver mutations in detail. While the most common driver events are well understood, large datasets and advanced computational tools are required to detect rare driver events. In this talk, I will give an overview of a tool I have worked on for interpreting driver mutations. I will explain why understanding passenger mutations, and mutation distribution patterns in general, is necessary to arrive at a clearer understanding of driver mutations.
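To make the point about cohort size concrete, the sketch below tests whether a gene is mutated in more patients than a background (passenger) rate would predict, using a one-sided binomial tail. This is a deliberately simplified illustration and not the tool discussed in the talk: it assumes a single uniform background rate, which is exactly the simplification that modeling mutation distribution patterns is meant to improve on, and all numbers are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical background: 1% of patients carry a passenger mutation in this gene.
background_rate = 0.01

# Small cohort: 3 of 100 patients mutated (3x the expected count).
p_small = binomial_tail(3, 100, background_rate)

# Large cohort: 25 of 1000 patients mutated (2.5x the expected count).
p_large = binomial_tail(25, 1000, background_rate)

print(f"small cohort p-value: {p_small:.4f}")
print(f"large cohort p-value: {p_large:.2e}")
```

Under these assumed numbers, a similar relative excess of mutations is not distinguishable from chance in the small cohort but is clearly significant in the large one, which is why rare driver events call for datasets spanning thousands of patients.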