Lexington High School
Infectious Disease and Microbiome Program
Cholera affects millions of people every year and results in tens of thousands of deaths. Current cholera vaccines offer only partial protection, and that protection lasts only a few years. Developing a more effective vaccine requires a deep understanding of the mechanisms of host-cholera interaction, which in turn requires knowing which specific genes are linked to cholera. A previous genome-wide association study (GWAS) found many genes that could be linked to cholera, but none had p-values small enough to be considered statistically significant. Katherine improved the statistical significance of the GWAS results by applying machine learning to the problem of identifying cholera-linked genes. First, she tested several machine learning algorithms, including random forest and logistic regression, and found that logistic regression converged the fastest. She then applied this algorithm to a simulated dataset and found that it identified several cholera-linked genes with statistical significance. Her group will continue to build on Katherine's work to reduce the number of false positives and false negatives predicted by her algorithm.
“My research at the Broad has introduced me to computational biology, which I previously knew little about,” Katherine said. “I’m now considering a related major in college, such as computer science and molecular biology.” Katherine’s favorite part of working at the Broad was the people. She found everyone welcoming and approachable. “It’s not hard to find someone to talk to about your research or to ask for help, and their enthusiasm for their work is contagious.”