Five Questions for Manolis Kellis
If you were to plot out the locations of the approximately 23,000 genes that make us human, our genome would look like a vast desert, dotted with rare gene oases. The ENCODE Project aims to map these supposed genetic wastelands, which, upon closer examination, harbor critical genomic machinery. These signals are encoded in diverse functional genomic elements that determine, among their many other functions, how and when genes are turned on and turned off. In 2007, researchers completed the pilot phase of the project, mapping one percent of the human genome. This week, researchers from the model organism ENCyclopedia of DNA Elements Consortium (modENCODE) have published papers that mark the completion of the next phase of the project; they have cataloged all of the functional elements found in the genomes of the fruit fly, Drosophila melanogaster, and the roundworm, Caenorhabditis elegans, two critical animal model organisms. (Read the NIH press release here.)
Manolis Kellis is an associate member at the Broad Institute and has led the integrative analysis effort for the complete fruit fly genome. A recipient of this year’s Presidential Early Career Awards for Scientists and Engineers, he worked closely with collaborators from Berkeley, University of Chicago, Duke, and Harvard on this project.
Q1: Why did the ENCODE Project set out to build a comprehensive catalog of functional elements in the fruit fly rather than immediately tackling the whole human genome?
MK: The fly and the worm are the two models that have time and time again revolutionized the way we understand biology and animal development. Part of the motivation for modENCODE is genome size; because the genome of the fly is nearly 25 times smaller than the human genome, you can carry out genome-wide experiments at a much lower cost. We can also benefit from the tremendous amount of knowledge that is available for Drosophila development and biology. This gives us the ability to test our predictions experimentally. In the human genome, we certainly have tremendous genetic resources, but we don’t have the ability to take an element and go and test it to see how it behaves in vivo [in the organism].
The chromatin fiber that makes up our chromosomes is composed of nucleosome units, each consisting of DNA wrapped around eight histone proteins. The tails of these histone proteins can undergo numerous post-translational modifications, encoding distinct chromatin states, represented here by different colored nucleosomes.
Image courtesy of Broad Communications
Q2: How does this build upon your previous research in which you found chromatin states or unique signatures in the genome that pinpointed the locations of functional elements?
MK: Chromatin states are one of the pillars upon which we have relied as we start thinking about a systematic encyclopedia of DNA elements. The vast majority of these elements are not protein coding. Many are RNA transcripts and many are not transcribed at all. To expand our annotations beyond the genes, chromatin states provide a general methodology that has been key in many of our recent studies.
Q3: What are some of the major findings from this paper?
MK: Our first challenge was to expand our annotations from the roughly 20 percent of the genome associated with protein-coding exons to the remaining 80 percent whose function is unknown. As we started combining different kinds of information from non-coding transcripts, binding sites for regulatory proteins, and chromatin states, the fraction of the genome associated with functional elements increased dramatically, now hitting nearly 80 percent.
Another major finding was the vast amount of overlap in the binding of different transcription factors [elements that can turn on and off multiple genes]. The reason for this overlap appears to be chromatin: specific chromatin states are highly enriched for these HOT (high-occupancy targets) regions, and sequence motifs for the bound factors are sometimes depleted. Our findings suggest an interplay between binding and chromatin, and also DNA replication, in these HOT regions.
Q4: The stated goal of ENCODE is to annotate functional elements, and yet your paper speaks of regulatory networks. How is that?
MK: Piecing together the binding of transcription factors reveals local interconnections between regulators and their targets, which we can assemble into cellular control circuits, or regulatory networks, that have very intriguing properties. These networks are hierarchical, with “master” regulators involved in lots of decisions, and contain recurrent building blocks of feedback and cooperation. They also allow us to predict the function and the expression of target genes from the combined properties of their regulators.
Q5: Why did you look across time points and cell types?
MK: The genome is not static; it’s dynamic. Over time, you can see how genes turn on and off across different developmental stages, in response to changes in the regulators that target them. By studying these coordinated changes, we found combinations of transcription factors acting together to decide the expression of their common targets, and used this information to build predictive models. And predictive models allow us to plan interventions, and ask: how do I turn off the expression of that gene? Going back to human disease, many disease-associated variants fall in non-coding regions with possible regulatory roles. We can now use our predictive models about the regulatory roles of each region, and the regulators that are binding there, to find what’s likely to be affected by mutations, and what regulators we should be targeting.