The completion of the Human Genome Project marked a tremendous advancement in basic science and molecular medicine. The completed human reference genome has enabled numerous genome-wide association studies (GWAS), which in turn have identified tens of thousands of single nucleotide polymorphisms (SNPs) that are correlated with the presence of complex disease. Several features of genes and genomes, including sequence conservation and predictions of protein deleteriousness, have been used to prioritize likely causative SNPs for further interrogation, but efficiently prioritizing SNPs remains an open problem. Identifying influential SNPs that reside in the non-coding portion of the genome is particularly challenging because there are few ways to interpret non-coding sequence. Given that the majority of disease-causing SNPs are believed to fall in the non-coding genome, this problem is of particular importance. Because non-coding sequence is known to influence the regulation of gene expression, we hypothesize that gene expression can be an informative feature in prioritizing SNPs. With recent advances in RNA sequencing (RNA-seq), gene expression data are now available for hundreds of humans across many cell types. In addition, the Broad Institute has generated RNA-seq data for nine mammals across 12 tissue types, which allows us to characterize evolutionary patterns of gene expression. With these newly available data, we can analyze how disease-causing and non-disease-causing SNPs relate to the expression features of nearby genes and discover features that are informative for prioritizing SNPs. We assessed the levels of SNPs in expression-conserved and expression-divergent genes by mapping their frequency distributions, thereby testing whether conservation of expression is an adequate feature for prioritizing likely causative SNPs.
Our results indicate that the lower rate of disease-associated SNPs in the introns of expression-conserved genes may be an informative feature for prioritizing GWAS SNPs. Ultimately, this research may provide greater insight into SNPs causative of complex disease and further the development of molecular interventions to correct these genetic errors.
PROJECT: Identifying gene expression features to distinguish SNPs causative of disease.
The cutting-edge, fast-paced environment at the Broad has equipped me with critical skills to achieve success, instilling in me the will to overcome failures and a strong sense of motivation and optimism. Every day, my abilities were tested, my curiosity met with challenging questions, and my thirst for solving problems satisfied. Although I had research experience prior to arriving at the Broad, SRPG was undoubtedly one of the most rewarding experiences I could have had as a young scientist.