Missing links of the transcriptome
Despite the many discoveries of the genomic era, much of the human genome remains unexplored. Only 5% of our DNA is thought to be functional, with the 20,000 or so protein-coding genes accounting for just one-fifth of that and the rest still unknown. Researchers at the Broad Institute of MIT and Harvard and Beth Israel Deaconess Medical Center have now discovered the identity of some of those unknown players using a new technique that looks for unusual signatures in the genome. The results of this work, appearing in the February 1 advance online issue of Nature, introduce a vast new class of genomic characters and spark further exploration of their roles in human biology and disease.
In certain marks along the genome, Broad associate member and co-senior author John Rinn and his fellow researchers saw hints that protein-coding genes might not tell the whole story. DNA wraps around partner proteins to form a structure called “chromatin,” and by virtue of certain chemical groups attached to chromatin in specific patterns, cells can figure out which bits of DNA should be read or “transcribed.” In earlier work to map where these marks appear, the researchers noticed that actively transcribed regions have a unique pattern of chemical modification. Upon further examination, they found more instances of this “chromatin signature” than expected.
“There were more of these marks than can be explained by protein-coding genes, so we wondered what the extra ones were marking,” said Mitch Guttman, first author on the new study and a graduate student at MIT working with his advisor, Broad Institute founding director Eric Lander, who is also a co-senior author of the study.
The team suspected that some of the extra marks could signal a special kind of RNA that is more than a mere messenger. A large portion of the genome’s DNA is transcribed into RNA, but only a small number of those transcripts go on to make protein. Some genes encode functional RNAs that are never translated into proteins, including a handful of classical examples known for decades and some recently discovered classes of tiny RNAs, such as microRNAs.
Among the remaining transcripts are many long RNAs, whose importance scientists have debated for years. Guttman and his colleagues, however, suspected that at least some could have important roles in the cell other than RNA’s “textbook” role in making proteins. A handful of such “long intervening noncoding RNAs,” or lincRNAs, had already been found: XIST, for example, helps females manage their double dose of X chromosome genes, and HOTAIR regulates the activity of other key genes.
Some said the few known lincRNAs were exceptions to the rule, rather than critical components, but Guttman, Lander, and Rinn, who is also an assistant professor at Harvard Medical School and Beth Israel Deaconess Medical Center, thought otherwise. If those extra chromatin signatures they noticed in the genome were actually marking lincRNAs, then perhaps these new players could account for some of the mystery RNA transcripts in cells and join protein-coding genes in the ranks of the genome’s functional elements. To test this hypothesis, the team first set out to pinpoint potential lincRNA sites in the genome, using the chromatin signature as a guide.
“We extended the clear chromatin signature of protein-coding genes to the dark matter of the genome, these vast spaces in between the textbook genes,” said Rinn. The team discovered an astounding 1,586 places in the genome that looked like protein-coding genes, but weren’t. They then confirmed that these regions qualified for the title of “lincRNA” by showing that they produced large RNA transcripts and were incapable of coding for protein. Importantly, the spots were also conserved across species, a telltale sign of function. “Because they’re highly conserved, it suggests that they play an important role in mammals, or else we would have lost them long ago as we evolved,” said Guttman. “But ultimately, we need to know what they’re doing and how they’re doing it.”
To uncover the functions of these new lincRNAs, the team used a method they call “guilt by association.” “By finding which protein-coding genes they hang out with, we can guess the lincRNAs probably have a similar role,” said Rinn. The group analyzed genetic material from mouse cells and, using Gene Set Enrichment Analysis, an analysis tool developed by the Broad’s Computational Biology and Bioinformatics Group, clustered lincRNAs and protein-coding genes into groups with likely similar functions.
The results suggest that common functions among the noncoding RNAs include cell proliferation and regulation of the cell cycle, strong indications that they could play a role in cancer. Digging a little deeper, the team found that certain lincRNAs were inactivated in cells lacking the p53 tumor suppressor gene. The experiment confirmed the role the team had predicted for these lincRNAs: the p53-mediated DNA damage response. Interestingly, many of the lincRNAs directly regulated by p53 also lie near its known targets in the genome, suggesting that lincRNAs could be key elements in major cellular processes.
Other potential lincRNA functions include embryonic stem cell pluripotency and immune surveillance, although exactly how the lincRNAs influence those biological pathways remains to be uncovered. One clue might come from where the lincRNAs reside in the genome. Rinn and his team wondered if they are distributed randomly or if their location is somehow important. “It turns out that overwhelmingly, these things sit next to transcription factors,” said Rinn, “suggesting that they play a role in regulating how genes are turned on and off.” Rinn and his fellow researchers suspect that a uniform mechanism could be at work, a kind of buddy system through which lincRNAs and transcription factors work as a team to control the activity of other genes.
Although the work provides important hints at what lincRNAs are doing in the cell, further study is needed to tease apart the true functions. “We first needed a map of where the lincRNAs were, and a method to predict what they do,” said Rinn. “Now we have a game plan to go in and dissect their functions.” With collaborators in the Broad’s RNAi Platform, Rinn and his colleagues plan to block the lincRNAs in cellular models to discover downstream effects and reveal the biological mechanisms through which they act.
The research represents an exciting step towards a complete picture of all the genome’s functional parts. “We’ve known that the human genome still has many tricks up its sleeve,” said Lander. “But, it is astounding to realize that there is a huge class of RNA-based genes that we have almost entirely missed until now.” Guttman added, “This new class of large RNAs really opens up a new avenue to understanding the genome.” Although the discovery is the culmination of a thorough research effort by the team, Rinn looks forward to the challenging work still to come. “We’ve opened up the wardrobe and entered into the mysterious world of noncoding RNA,” said Rinn. “Now we need to find the mechanisms by which these creatures act.”