Producing the original sequence of the human genome was a landmark achievement. Yet it posed a new challenge: the genome sequence was an endless stream of As, Cs, Gs, and Ts, devoid of functional interpretation. Beyond protein-coding genes, which constitute only 1% of the genome, we were unable to decode the functions of the remaining sequences.
Epigenomic maps provide an opportunity to identify and understand the sequences, interacting proteins, and chromosomal structures that act throughout the other 99% of the genome to control gene activity. We use DNA sequencing-based mapping technologies such as ChIP-Seq, Bisulfite-Seq, MINT-Chip, RNA-Seq, and chromatin conformation capture assays to annotate the genome and advance our understanding of how it functions (see Technologies for more details).
Genome annotations also allow us to examine the consistent structural features of the genome and provide a framework for interpreting mutations that contribute to disease, including cancer.
A major goal of the program is to generate reference epigenomic data as part of collaborative, international projects. For an idea of scale, the Epigenomics Program creates over 1,000 epigenomic maps per year. These data are utilized by thousands of researchers worldwide to better annotate and understand the human genome.
Epigenome mapping projects
ENCODE 4: The NIH Encyclopedia of DNA Elements (ENCODE) project is now in its fourth iteration and 10th year. We collaborate with scientists around the world to create reference maps of the epigenome and understand the regulatory potential of the DNA between protein-coding genes. For ENCODE 4, the Epigenomics Program, together with BTL, forms one of five epigenome mapping centers located in the USA. We also serve as a major data coordination center. ENCODE 4 will now also incorporate rare cell populations from human organoids, as well as disease samples. All data are provided to the community as open access.
Our core labs - led by Chuck Epstein, Noam Shoresh, and Andi Gnirke - work together to optimize technologies for reproducible, large-scale epigenomic research. We adapt methods to enable rigorous quality control and, where feasible, high-throughput robotic operation. We devise computational pipelines to support automated processing and sharing of data.
Our current production-scale capabilities include:
See our Technologies page for information about other technologies utilized by the program.
Past mapping projects
NIH Roadmap: This five-year program mapped the epigenomic landscape of many cell types, resulting in a public reference map against which researchers can compare the aberrant epigenomic characteristics associated with particular diseases. Data are also available via this resource.