Stretched out end to end, the human genome – a sequence of 3 billion chemical letters inscribed in a molecule called DNA – is over 2 meters long. Famously, short stretches of DNA fold into a double helix, which winds around histone proteins to form the 10-nm fiber. But what about longer pieces? Does the genome’s fold influence function? How does the information contained in such an ultra-dense packing even remain accessible?
In this talk, I describe our work developing ‘Hi-C’ (Lieberman-Aiden et al., Science, 2009; Aiden, Science, 2011) and, more recently, ‘in situ Hi-C’ (Rao & Huntley et al., Cell, 2014), which use proximity ligation to transform pairs of physically adjacent DNA loci into chimeric DNA sequences. Sequencing a library of such chimeras makes it possible to create genome-wide maps of physical contacts between pairs of loci, revealing features of genome folding in 3D.
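To make the idea of a contact map concrete, here is a minimal Python sketch of how sequenced chimeras could be tallied into a matrix of contacts between fixed-size genomic bins. It is an illustration only, not the pipeline used in the cited papers: the function name `build_contact_map`, the bin size, and the toy read pairs are all hypothetical, and real analyses also involve alignment, filtering, and normalization steps that are omitted here.

```python
def build_contact_map(pairs, bin_size, n_bins):
    """Count contacts between fixed-size bins on a single chromosome.

    pairs: iterable of (pos_a, pos_b) base-pair coordinates, one tuple per
           chimeric read (hypothetical input format for this sketch).
    Returns a symmetric n_bins x n_bins matrix as a list of lists.
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        if i >= n_bins or j >= n_bins:
            continue  # ignore reads that fall outside the mapped region
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the map symmetric
    return matrix


# Toy data (invented for illustration): four read pairs on a 50 kb region.
toy_pairs = [(1_200, 48_000), (3_500, 47_100), (22_000, 23_500), (49_000, 2_000)]
contact_map = build_contact_map(toy_pairs, bin_size=10_000, n_bins=5)
```

In this toy map, three of the four read pairs join the first and last bins, so that off-diagonal entry stands out, which is the kind of enrichment a loop between two distant loci would be expected to produce.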
Next, I will describe recent work using in situ Hi-C to construct haploid and diploid maps of nine cell types. The densest, in human lymphoblastoid cells, contains 4.9 billion contacts, achieving 1 kb resolution. We find that genomes are partitioned into contact domains (median length, 185 kb), which are associated with distinct patterns of histone marks and segregate into six subcompartments. We identify ∼10,000 loops. These loops frequently link promoters and enhancers, correlate with gene activation, and show conservation across cell types and species. Loop anchors typically occur at domain boundaries and bind the protein CTCF. The CTCF motifs at loop anchors occur predominantly (>90%) in a convergent orientation, with the asymmetric motifs “facing” one another.
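The convergent-orientation rule can be stated compactly in code. The sketch below assumes each loop is summarized by the strand of the CTCF motif at its upstream and downstream anchors, with "+" meaning the motif points toward higher coordinates; the function names and the toy loop list are hypothetical and are not data from the study.

```python
def classify_orientation(upstream_strand, downstream_strand):
    """Label a pair of anchor motifs by their relative orientation."""
    pair = (upstream_strand, downstream_strand)
    if pair == ("+", "-"):
        return "convergent"  # motifs face one another across the loop
    if pair == ("-", "+"):
        return "divergent"   # motifs point away from one another
    if pair in (("+", "+"), ("-", "-")):
        return "tandem"      # motifs point in the same direction
    raise ValueError(f"unexpected strand pair: {pair}")


def convergent_fraction(loops):
    """Fraction of loops whose anchor motifs are convergent."""
    labels = [classify_orientation(up, down) for up, down in loops]
    return labels.count("convergent") / len(labels)


# Toy loops (invented): (upstream strand, downstream strand) per loop.
toy_loops = [("+", "-"), ("+", "-"), ("+", "+"), ("-", "+")]
fraction = convergent_fraction(toy_loops)
```

If the four pairings were equally likely, only about a quarter of loops would be convergent, which is why a fraction above 90% is so striking.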
I will then discuss the biophysical mechanism that underlies chromatin looping. Specifically, our data are consistent with the formation of loops by extrusion (Sanborn & Rao et al., PNAS, 2015). In fact, in many cases, the local structure of Hi-C maps may be predicted in silico based on patterns of CTCF binding and an extrusion-based model.
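A deliberately simplified sketch shows how an extrusion-style rule can turn CTCF positions and orientations into predicted loops. This is a deterministic toy, not the simulation from the cited paper: it assumes an extruding complex loads at one site, that its two sides slide outward in opposite directions, and that each side halts at the first CTCF motif pointing back toward the loop interior. The function name `extrude_loop` and the site list are hypothetical.

```python
def extrude_loop(load_site, ctcf_sites, chrom_length):
    """Predict loop anchors for a complex loaded at load_site.

    ctcf_sites: list of (position, strand) tuples; "+" motifs point toward
                higher coordinates, "-" motifs toward lower coordinates.
    Assumption: the left-moving side stops only at "+" motifs and the
    right-moving side only at "-" motifs; otherwise it runs to the end.
    """
    left_stops = [pos for pos, strand in ctcf_sites
                  if strand == "+" and pos <= load_site]
    right_stops = [pos for pos, strand in ctcf_sites
                   if strand == "-" and pos >= load_site]
    left_anchor = max(left_stops, default=0)
    right_anchor = min(right_stops, default=chrom_length)
    return left_anchor, right_anchor


# Toy CTCF landscape (invented) on a region of length 1000.
toy_sites = [(100, "+"), (300, "-"), (400, "+"), (700, "-"), (800, "+")]
loop_a = extrude_loop(200, toy_sites, chrom_length=1000)
loop_b = extrude_loop(500, toy_sites, chrom_length=1000)
loop_c = extrude_loop(350, toy_sites, chrom_length=1000)
```

Under these assumptions every predicted loop has a forward motif at its upstream anchor and a reverse motif at its downstream anchor, so the convergent rule falls out of the model rather than being imposed separately. Loading between the motifs at 300 and 400 yields the larger loop from 100 to 700, because both sides pass motifs facing the wrong way.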
Finally, I will show that by modifying CTCF motifs using CRISPR, we can reliably add, move, and delete loops and domains. Thus, it is possible not only to “read” the genome’s 3D architecture, but also to “write” it.