Esteban Abeyta, a senior biochemistry major at the University of New Mexico, used random DNA to train models that decode cis-regulatory logic and predict gene expression.
The Broad offers a work environment that is energizing, supportive, and cooperative. The high level of collaboration among researchers fosters an environment that continues to tackle problems on the edge of science. Here, I received meaningful advice from top researchers and students, built strong relationships with mentors, and performed research that gave me the opportunity to learn skills and attain a depth of knowledge I hadn’t expected.

An understanding of how transcription factors (TFs) read DNA sequences to control gene expression remains elusive. Because training data for learning models of cis-regulatory logic are scarce, many attempts at understanding and modeling this complex process have met with limited success. An accurate data-driven model could be a critical tool for predicting the effects of genetic variants associated with human diseases. To understand this process, we measured the gene expression generated by more than 100 million 80 bp random promoter sequences using an established gigantic parallel reporter assay (GPRA). In this assay, the promoters control the expression of yellow fluorescent protein (YFP) in S. cerevisiae. Fluorescence-activated cell sorting in combination with high-throughput sequencing was used to measure expression. TF-nucleosome interactions were investigated using M.SssI footprinting, in which accessible CpGs are methylated and then read out by bisulfite sequencing. Our ‘billboard’ model of transcription, trained on 80 bp promoters, explains up to 93% of the expression variation in test data. We identified specific TFs that remodel chromatin and investigated how these TFs reposition nucleosomes to affect gene expression. By measuring the expression of millions of 240 bp promoter sequences, we hope the model will capture even more complex regulatory features, such as those arising from TF-nucleosome and TF-TF interactions.
Random promoter sequences contain many TF binding sites and produce diverse expression levels in our YFP reporter scaffold. Initial data from random promoter sequences explain much of how TFs and chromatin work together to regulate gene expression.
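The claim that random sequences carry binding sites by chance can be illustrated with a back-of-the-envelope calculation. The sketch below is not the project's model or pipeline. It assumes uniformly random bases and exact matches to a single fixed motif, which is a simplification because real TFs tolerate mismatches and so find sites more often. The motif strings and function names are hypothetical, chosen only for illustration.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(seq: str, motif: str) -> int:
    """Count exact motif matches on either strand, overlaps included."""
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


def expected_sites(seq_len: int, motif: str) -> float:
    """Expected exact matches in uniform random DNA (assumption: each
    base is drawn independently with probability 1/4)."""
    k = len(motif)
    positions = seq_len - k + 1
    if positions <= 0:
        return 0.0
    # A palindromic motif equals its reverse complement, so it is
    # counted once per position rather than twice.
    strands = len({motif, reverse_complement(motif)})
    return strands * positions * 0.25 ** k


def simulate_mean_sites(motif: str, seq_len: int = 80,
                        n_seqs: int = 10_000, seed: int = 0) -> float:
    """Average site count over simulated random promoters."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_seqs):
        seq = "".join(rng.choices("ACGT", k=seq_len))
        total += count_sites(seq, motif)
    return total / n_seqs


if __name__ == "__main__":
    motif = "TGACTCA"  # hypothetical 7 bp motif, for illustration only
    print("expected per 80 bp promoter:", expected_sites(80, motif))
    print("simulated mean:", simulate_mean_sites(motif))
```

Under these assumptions a single 7 bp motif is rare in any one 80 bp promoter, but across many TFs, degenerate motifs, and millions of sequences, chance sites become abundant, which is the property the random-promoter design relies on.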
Project: Decoding cis-regulatory logic using random promoters to predict gene expression
Mentors: Carl de Boer and Eeshit Dhaval Vaishnav, Regev Lab