Jessica Kissi, a junior double majoring in biochemistry and French and Francophone Studies at Bates College, used gene expression levels and a statistical model to predict the genetic dependencies of cancer cell lines.
In addition to gaining abundant programming skills and learning to communicate science effectively to various audiences, the Broad Summer Research Program (BSRP) exposed me to the most diverse and talented group of scientists I have encountered. Not only were scientists at the Broad diverse in thought, but they also brought a range of experiences to the table and represented many racial, ethnic, and religious backgrounds. The 2021 cohort especially showed me that science is not reserved for a specific group of people; regardless of a person's background, passion is what drives scientists, and this summer, even in the face of adversity, passion drove my experience. This experience has been the most enriching of my career in STEM thus far, and the skills I have gained will undoubtedly serve me well in achieving my long-term goals.

Cancer is one of the leading causes of death worldwide. Cancer cells cause harm through uncontrolled proliferation and by damaging other bodily systems, and this uncontrolled growth is closely tied to genetics. One particularly relevant class of genes is genetic dependencies: genes that cancer cells rely on to grow, proliferate, and invade. This study is part of a larger project that aims to predict cancer genetic dependencies accurately from omics features such as proteomics, metabolomics, and genomics. In this work, we used gene expression levels to build a statistical model that predicts cancer cell line genetic dependency scores. Gene expression data were retrieved from the Broad Institute's Cancer Cell Line Encyclopedia (CCLE), an initiative that has profiled over a thousand human cancer cell lines, and dependency data were obtained from the Broad Institute's Project Achilles. To assess performance, the actual dependency scores from Project Achilles were compared with the scores predicted by our models; a model was considered good if the Pearson correlation between the two exceeded 0.4.

The model was a multiple linear regression. The independent (predictor) variables were the expression levels of the 600 genes with the highest variance, and the dependent (response) variables were the dependency scores of KRAS and TP53 in lung cancer cell lines, each modeled separately. These genes were chosen because they are recognized as important genes in the proliferation of cancer cells across many cell lines. The model met the performance criterion (r > 0.4), suggesting that gene expression levels may be able to predict the dependencies of other genes across other cell lines. These findings, alongside others, could help guide the development of new therapeutics that knock out dependency genes, halting the proliferation of certain cancer cells.
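The modeling steps described above (select the 600 highest-variance genes, fit a multiple linear regression to a gene's dependency scores, then compare predicted and actual scores with Pearson correlation) can be sketched in Python with NumPy. This is a minimal illustration, not the project's code: it assumes expression and dependency data are already loaded as arrays aligned by cell line, and the function names, the synthetic data, and the train/held-out split are hypothetical choices made for the example.

```python
import numpy as np


def top_variance_genes(expr: np.ndarray, k: int = 600) -> np.ndarray:
    """Return column indices of the k genes with the highest variance.

    expr: (n_cell_lines, n_genes) expression matrix (assumed layout).
    """
    variances = expr.var(axis=0)
    # argsort is ascending: take the last k indices, highest variance first
    return np.argsort(variances)[-k:][::-1]


def fit_linear_regression(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept.

    Returns [intercept, coef_1, ..., coef_p]. If there are fewer cell lines
    than predictors, lstsq returns the minimum-norm solution, which fits the
    training lines exactly, so performance should be judged on held-out lines.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predicted dependency scores for the rows of X."""
    return coef[0] + X @ coef[1:]


def pearson_r(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between predicted and actual scores."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    # Synthetic stand-in data; sizes are illustrative, not CCLE dimensions.
    rng = np.random.default_rng(0)
    n_lines, n_genes, k = 200, 1000, 50
    scales = rng.uniform(0.2, 2.0, size=n_genes)  # genes differ in spread
    expr = rng.normal(size=(n_lines, n_genes)) * scales
    genes = top_variance_genes(expr, k)

    # Hypothetical dependency score driven by a few selected genes plus noise
    true_w = np.zeros(k)
    true_w[:5] = [0.8, -0.5, 0.4, 0.3, -0.2]
    dep = expr[:, genes] @ true_w + rng.normal(scale=0.5, size=n_lines)

    # Fit on some cell lines, evaluate on the rest (assumed evaluation scheme)
    train, test = np.arange(150), np.arange(150, n_lines)
    coef = fit_linear_regression(expr[train][:, genes], dep[train])
    r = pearson_r(predict(coef, expr[test][:, genes]), dep[test])
    print(f"held-out Pearson r = {r:.2f}; meets r > 0.4: {r > 0.4}")
```

One design note: with 600 predictors and a lung cancer subset that may contain fewer cell lines than that, plain least squares can fit the training data perfectly, so a held-out comparison (or a regularized variant such as ridge regression) gives a more honest estimate of whether the r > 0.4 criterion is met.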
Project: Predicting Cancer Cell Line Genetic Dependencies with Omics Features: Gene Expression Levels
Mentors: Alissa Campbell, Greka Lab
David Wu, Cancer Data Science