Cambridge Rindge and Latin School
Cambridge, MA

Daniel King
Stanley Center for Psychiatric Research

“I initially disliked science class since all we did was learn and we ignored the discovering part of the whole process,” said Gonzalo.  Discovery turned out to be a huge part of Gonzalo’s summer.  The purpose of a genome-wide association study (GWAS) is to find correlations between genotypes and phenotypes across a large population.  One confounding variable in these studies is relatedness: two people who are related to each other will have correlated genotypes and tend to have correlated phenotypes, which can produce spurious associations.  Thus, being able to infer the relatedness of two individuals based solely on their genomic data is of critical importance.  Gonzalo compared three computational approaches for determining relatedness: KING, which does not work on recently admixed populations; PC-Relate using raw principal components (PCs), which works on general populations but may misclassify large families; and PC-Relate bootstrapped with KING, which he hypothesized would work on general populations even in the presence of large families.  Unexpectedly, Gonzalo discovered that bootstrapped PC-Relate gives results identical to those of PC-Relate using raw PCs, calling into question some previous studies on these algorithms.  Despite the unexpected results, Gonzalo left the program in high spirits: “In these 6 weeks, I got to do many things and I think that my favorite part was the ability to connect with a lot of people,” he said. “Also, this summer experience has significantly influenced my decision to pursue a science major related to data science. Before the program, I did not think of data science as a possible career path; however, the program has opened my eyes to this field.”