Epigenomics approach illuminates the dark corners of the genome

In the human genome, as in the universe, only a tiny fraction is readily understood. The rest - more than 98 percent - is largely shrouded in mystery. Just as physicists probe the universe in search of dark matter and dark energy, epigeneticists are exploring the parts of the genome outside of our genes for evidence that can shed light on new corners. By reaching beyond the protein-coding regions that account for only 1.5 percent of the genome, researchers are extending the interpretable part of the genome and revealing regions that may play important roles in how genes are expressed.

In a recent paper in Nature Biotechnology, Broad researchers described a new computational technique that they used in human T cells, which play a role in the immune system, to predict the function of some of the non-coding parts of the genome. They were able to do this by examining a complex known as chromatin, which is composed of DNA tightly wrapped around histone proteins. The researchers looked at patterns of chromatin modifications to begin assigning meaning to previously uncharted areas of the genome.

"This expands the fraction of the genome that we can potentially annotate and the implications are tremendous, especially for disease mapping," said Manolis Kellis, senior author of the paper and an associate member at the Broad Institute.

In studies of genetic variation in humans, researchers have found some DNA differences that appear to be tied to diseases, but don't occur within our genes. These single nucleotide polymorphisms (SNPs) - differences in a single letter of DNA such as an A instead of a C - frequently fall into the vast non-coding regions between our genes.

"We show an example of a disease-associated SNP that falls into an area that's 40,000 base pairs from the end of the nearest gene," said Jason Ernst, first author of the paper and a postdoctoral fellow at MIT. "But when you overlay the chromatin data, the SNP falls into a very interesting chromatin state."

"All of these non-coding SNPs that appeared to be sitting in the middle of nowhere can now be potentially associated with specific chromatin patterns," said Kellis. Those patterns in turn appear to frequently control whether a neighboring (or far away) gene will be read and translated into protein or skipped over.

Chromatin makes up chromosomes, and changes to chromatin's structure can prevent or allow certain regions of the genetic code to be read and therefore expressed. There are over 100 ways that chromatin can be modified, and researchers have hypothesized that specific combinations of changes to chromatin may lead to different biological ends. Kellis and Ernst set out to find such distinct and biologically meaningful chromatin patterns, which they call "chromatin states."

"The big surprise came in the large number of distinct chromatin states that we could recognize using combinations of chromatin marks, and the specific biological roles they are associated with", said Kellis.

Ernst and Kellis were able to find 51 unique types of chromatin state, or "51 flavors of chromatin" as Kellis called them. Each of these states was associated with a particular function, such as suppressing or increasing the activity of a gene or a class of genes.

For instance, the authors found 11 different "flavors" of promoter regions, which sit directly in front of a gene that will be copied. Genes involved in response to DNA damage are linked to one of these states, while genes involved in embryonic development are linked to another, and genes involved in RNA processing are tied to yet another state.

"We were surprised to find how much information hides in the combinatorial patterns of chromatin marks," says Ernst. "Instead of simply ON or OFF information, we found that we could recognize different functional classes of genes solely based on their chromatin patterns."

Other chromatin patterns seemed to indicate repressed states, where regions of the genome are being actively turned off, and yet other classes were linked to the beginning and even the termination of transcription, or the copying of DNA into RNA.

The authors also found several chromatin states likely to represent different classes of enhancer regions, which can increase gene expression while sitting far away from their target genes. These regions have been very difficult to pinpoint in the past, but by looking at patterns in chromatin and surrounding clues, the researchers were able to systematically comb through the genome to find them.

"This is a really powerful way to get at the regulatory portion of the genome, which is perhaps the most important but has also been the most elusive," said Kellis.

Kellis and Ernst grouped these 51 flavors of chromatin into five broad categories: promoter, transcribed, active intergenic, repressed, and repetitive regions. Within these broad categories, chromatin states differed from one another in subtle but biologically significant ways, raising the question of how many flavors of chromatin are likely to exist in nature.

"It really depends on your desired level of resolution," said Kellis. While the five broad classes could be readily recognized with few chromatin marks, the biological distinctions between some states only became visible when many chromatin modifications were studied. "Studies of many more marks and many more cell types will be needed to resolve the true biological importance of every chromatin state."

To find these states, the researchers let chromatin guide them. They deliberately studied chromatin in its context - looking not only at the chromatin modifications at any one location, but also drawing on information from neighboring regions of the genome. As patterns emerged, they confirmed the potential functions of the genetic elements by looking at other genomic features, such as whether the regions are evolutionarily conserved in other mammalian species and whether they showed signs of "DNase hypersensitivity," which means that they are more easily cut by the DNase enzyme and thus more accessible to transcriptional regulators.
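The article does not spell out the algorithm, but the published paper describes a multivariate hidden Markov model. The Python sketch below is only a toy illustration of that general idea, not the authors' implementation: each genomic bin records whether each mark is present, and the most likely state for a bin depends on its neighbors as well as its own marks. The three state names, the choice of two marks, and every probability are assumptions invented for this example.

```python
import math

# Toy model: all names and numbers below are invented for illustration.
STATES = ["promoter-like", "transcribed-like", "background"]
MARKS = ["H3K4me3", "H3K36me3"]

# Assumed P(mark present | state), treated as independent per mark.
EMIT = {
    "promoter-like":    {"H3K4me3": 0.9,  "H3K36me3": 0.1},
    "transcribed-like": {"H3K4me3": 0.1,  "H3K36me3": 0.8},
    "background":       {"H3K4me3": 0.05, "H3K36me3": 0.05},
}
STAY = 0.8  # assumed probability that the next bin keeps the same state


def log_emit(state, obs):
    """Log-probability of one bin's mark pattern (tuple of 0/1) in a state."""
    total = 0.0
    for mark, present in zip(MARKS, obs):
        p = EMIT[state][mark]
        total += math.log(p if present else 1.0 - p)
    return total


def log_trans(prev, cur):
    """Log-probability of moving from state `prev` to state `cur`."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(observations):
    """Return the most likely state path for a list of per-bin mark tuples."""
    # Uniform prior over states for the first bin.
    score = {s: math.log(1.0 / len(STATES)) + log_emit(s, observations[0])
             for s in STATES}
    back = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_trans(p, cur))
            new_score[cur] = (score[best_prev] + log_trans(best_prev, cur)
                              + log_emit(cur, obs))
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# A bin with no marks looks like background on its own...
print(viterbi([(0, 0)]))
# ...but is read as transcribed when its neighbors carry the second mark.
print(viterbi([(0, 1), (0, 1), (0, 0), (0, 1), (0, 1)]))
```

With these made-up parameters, the unmarked bin is labeled "background" in isolation but "transcribed-like" when flanked by transcribed bins, which is the kind of contextual information the paragraph above describes.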

"This technique gave us a way of viewing major chromatin patterns in a systematic way," Ernst said. "We then put together an extensive pipeline to correlate these states with almost every type of genomic and functional feature we could think of, providing an independent validation that the states are in fact biologically meaningful, and allowing us to characterize that meaning for each state."

Kellis and Ernst found that the key to making sense of patterns of chromatin changes was to look at these changes as both additive and combinatorial. They found that studying chromatin changes in isolation could not capture the whole picture - some changes are linked to one another and others act in concert to influence the way genes are expressed.
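A small numerical illustration, using made-up data, shows why marks studied one at a time can miss the picture. In the Python sketch below, two hypothetical sets of regions have identical per-mark frequencies, yet in one set the marks always co-occur and in the other they never do. The mark names and the data are assumptions for this example only.

```python
from collections import Counter

MARKS = ("mark_A", "mark_B")  # hypothetical mark names

# Each region is a tuple of 0/1 flags, one per mark (invented data).
co_occurring = [(1, 1), (0, 0), (1, 1), (0, 0)]   # marks appear together
exclusive = [(1, 0), (0, 1), (1, 0), (0, 1)]      # marks never overlap


def per_mark_frequency(regions):
    """Fraction of regions carrying each mark, considered in isolation."""
    n = len(regions)
    return tuple(sum(r[i] for r in regions) / n for i in range(len(MARKS)))


def combination_counts(regions):
    """Count how often each joint pattern of marks occurs."""
    return Counter(regions)


# Looked at one mark at a time, the two sets are indistinguishable.
print(per_mark_frequency(co_occurring), per_mark_frequency(exclusive))
# Looked at as combinations, they are clearly different.
print(combination_counts(co_occurring))
print(combination_counts(exclusive))
```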

Surprisingly, although the researchers only looked at chromatin states in T cells, they were able to pick up on processes that are involved in a variety of cell types. The reason for some of these states in T cells remains a mystery - perhaps they are "memories" from the states the T cell passed through as it developed from a less specialized cell type, or they may indicate that the cell is poised to transform into more specialized cell types or that it maintains some reprogramming potential.

As more cell types are studied, Kellis and Ernst's methodology could be very powerful for understanding the coordinated dynamics of chromatin changes associated with differentiation, and also disease. This may be especially relevant for cancer and other diseases associated with regulatory changes.

Kellis hopes that as additional chromatin maps become available for many cell types, such approaches will help focus on SNPs tied to certain diseases and map them to specific chromatin states. More generally, they could help shed light on the relative contributions of genetic diversity and epigenomic changes in disease susceptibility and disease onset. "As we start surveying more and more cell types, we should be able to focus on particular diseases," he said. "It's an exciting prospect for the future."
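Mapping a SNP to a chromatin state amounts to an interval lookup once a chromatin map exists. The Python sketch below assumes a hypothetical annotation for a single chromosome, given as sorted, non-overlapping segments; the coordinates, state labels, and SNP position are invented for illustration and are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical chromatin-state segments on one chromosome:
# (start, end, label), half-open [start, end), sorted and non-overlapping.
SEGMENTS = [
    (0, 2_000, "promoter"),
    (2_000, 30_000, "transcribed"),
    (30_000, 70_000, "repressed"),
    (70_000, 72_000, "candidate enhancer"),
    (72_000, 100_000, "repetitive"),
]
STARTS = [start for start, _, _ in SEGMENTS]


def state_at(position):
    """Return the state label covering `position`, or None if uncovered."""
    i = bisect_right(STARTS, position) - 1  # last segment starting <= position
    if i < 0:
        return None
    start, end, label = SEGMENTS[i]
    return label if start <= position < end else None


# Invented example: a SNP 40,000 bp past a gene assumed to end at 30,000.
snp_position = 30_000 + 40_000
print(state_at(snp_position))
```

Binary search keeps each lookup fast even when a genome-wide map contains a very large number of segments.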

Paper(s) cited

Ernst and Kellis. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature Biotechnology. 25 July 2010. doi:10.1038/nbt.1662