Luiz Mata Lopez
Luiz Mata Lopez is a junior studying computer science and mathematics at the University of Maryland, College Park.
The Broad Institute almost doesn’t feel real to me; it is special and unlike any place I've ever visited. I had the opportunity to collaborate with scientists from diverse backgrounds and learn about the research of other motivated students just like myself. As a computer science student, getting the chance to work in a multidisciplinary lab with cardiologists and seeing the impact of my work was very empowering. This was an unforgettable experience not only because of my exciting research but also because of the countless friends I’ve made for life, both in my lab and in the BSRP cohort. It is an honor to be a part of a community of individuals who support new and creative ideas that push for a better future. BSRP also did a fantastic job of teaching me essential skills in science communication and leadership that will help me communicate my ideas to others and achieve my career goals.
Morphological profiling of cells is commonly employed in drug discovery across many disease areas. However, there has been limited study of its application specifically to the development of therapeutics for heart diseases. Cardiomyocytes are particularly challenging to work with, and technical variation arising from differences in experimental procedures can lead to batch effects that obscure the biological signal in the data. Researchers use single-cell RNA sequencing batch correction methods to remove batch effects from data, but these tools were not originally designed for images and may make incorrect assumptions. Systematically choosing the best batch correction tool is crucial to successful downstream analyses and requires in-depth knowledge of the data.
Here, we present a generalizable pipeline to benchmark batch correction methods on CellProfiler imaging-based morphological data. The pipeline performs quality control and then simultaneously evaluates 4 batch correction methods on 8 different metrics, measuring data integration and biological signal preservation. Finally, the program visualizes a summary of the metrics and ranks the best methods.
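The final ranking step can be illustrated with a minimal sketch. It is not the pipeline's actual code: the method names, metric names, scores, and the 0.6/0.4 weighting between biological signal preservation and data integration are all hypothetical placeholders, and every metric is assumed to be oriented so that higher is better.

```python
from statistics import mean


def min_max_scale(values):
    """Rescale a list of scores to [0, 1]; ties across all methods map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_methods(scores, integration_metrics, bio_metrics, bio_weight=0.6):
    """Rank methods by a weighted mean of scaled integration and bio scores.

    scores: {method_name: {metric_name: value}}, higher is better (assumption).
    bio_weight: hypothetical weight on biological signal preservation.
    """
    methods = sorted(scores)
    scaled = {m: {} for m in methods}
    # Scale each metric across methods so metrics are comparable.
    for metric in integration_metrics + bio_metrics:
        column = min_max_scale([scores[m][metric] for m in methods])
        for m, v in zip(methods, column):
            scaled[m][metric] = v
    overall = {}
    for m in methods:
        integration = mean(scaled[m][k] for k in integration_metrics)
        bio = mean(scaled[m][k] for k in bio_metrics)
        overall[m] = (1 - bio_weight) * integration + bio_weight * bio
    # Best method first.
    return sorted(overall.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder scores for illustration only; these are not results from the project.
example_scores = {
    "method_a": {"batch_mixing": 0.8, "batch_silhouette": 0.7,
                 "label_silhouette": 0.6, "knn_label_accuracy": 0.9},
    "method_b": {"batch_mixing": 0.9, "batch_silhouette": 0.9,
                 "label_silhouette": 0.2, "knn_label_accuracy": 0.3},
    "uncorrected": {"batch_mixing": 0.1, "batch_silhouette": 0.2,
                    "label_silhouette": 0.7, "knn_label_accuracy": 0.8},
}

ranking = rank_methods(
    example_scores,
    integration_metrics=["batch_mixing", "batch_silhouette"],
    bio_metrics=["label_silhouette", "knn_label_accuracy"],
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Scaling each metric before averaging keeps a metric with a wide numeric range from dominating the summary, and scoring the two metric groups separately makes it visible when a method mixes batches well only by erasing biological signal, as the hypothetical `method_b` does here.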
Our findings show that, for a cardiomyocyte imaging dataset, the best-performing method is Harmony, which is based on a Gaussian mixture model. Many of the methods we evaluated performed worse, likely because they make incorrect assumptions about the data distributions. Hence, future work should explore the assumptions made by these methods and modifications of the models to better fit morphological profiling data.
Project: Evaluating Batch Correction Methods For Cardiomyocyte Morphological Profiling
Mentors: Carmen Diaz Verdugo, Stephen Fleming, Cardiovascular Disease Initiative