Tethered to the genome

Haley Bridger, July 24th, 2012
Image courtesy of ARTappler/iStockphoto

Matthew Freedman remembers the moment as clear as day. “I was sitting in the Massachusetts General Hospital cafeteria with David Altshuler talking about human genetics, and it just hit me,” Matthew recalls. “This is what I want to do.”

Matthew became the first postdoctoral researcher to work in the lab of Broad Institute core member David Altshuler in 2000, just as the HapMap project – an effort to map where the genetic similarities and differences among human beings reside – was being launched. Matthew and his colleagues could see that large-scale studies connecting disease risk to the genome were on the horizon. For many years, scientists had been able to pinpoint disease-causing genes for Mendelian disorders – diseases like Huntington’s disease, cystic fibrosis, and sickle cell anemia – caused by disruptions to a single gene. With the new maps, complex traits, like height and weight, and complex diseases, like diabetes, heart disease, and even cancer, could be probed for their numerous genetic roots too.

But many of the results that emerged from these studies proved puzzling. In Mendelian disorders, changes tend to fall within genes. In complex diseases, by contrast, many changes fell into the non-protein-coding portions of the genome – stretches of DNA that do not code for proteins. A review article from 2010 suggests that approximately 80 percent of common variants associated with disease risk do not land in the protein-coding portions of genes, but rather in the regions between genes or in the parts of the DNA code that get cut out before proteins are made.

“We can decipher what a DNA change is doing if it lands within the protein-coding portion of a gene because we have the genetic code as a handbook,” says Matthew, who is now an associate member of the Broad Institute and an associate physician at Dana-Farber Cancer Institute. “But for the non-protein coding genome, we’re still trying to figure out the consequences of DNA alterations.”

In a recent PNAS paper, Matthew and his colleagues describe a relatively new strategy to assign meaning to the genetic changes that fall outside of genes. Based on previous studies, the researchers suspected that some of these changes occur in regulatory elements, machinery that influences how much RNA gets created. The researchers therefore directly measured levels of RNA, the intermediary between DNA and protein. Using results from studies of prostate cancer, they connected DNA changes in non-protein-coding regions to changes in RNA transcript levels, which pointed them to three genes that contribute to the disease.

“How do you start tethering the non-protein coding risk alleles to their genes? That’s the question we decided to start working on,” Matthew says. “We ended up using a lot of different tools to decipher this.”

Many other groups are tackling this question too. Matthew’s ongoing work dovetails with several major efforts currently underway at the Broad Institute, all of which aim to shed light on these mysterious regions of the genome. Already, researchers have been able to show that many stretches are not simply “junk DNA,” but rather contain regulatory elements that influence when and how much protein is made in different cells, tissues, and parts of the body. Researchers involved in the ENCODE project, including Broad associate members Manolis Kellis and Brad Bernstein, are charting these functional elements and making their growing compendium available to the public. Simultaneously, Broad researchers involved in the Genotype-Tissue Expression (GTEx) project – led by Kristin Ardlie and Wendy Winckler – are measuring how DNA changes influence gene expression in more than 250 tissue types. In addition, researchers working on The Cancer Genome Atlas (TCGA) are mapping genetic changes in more than 20 types of cancer.

Matthew and his colleagues are drawing upon all of these resources as they continue their efforts to rope in results that map to non-protein coding regions of the genome.

“We’re heavily using ENCODE and TCGA data, and I’m sure we’ll use GTEx too,” says Matthew. “They’ve been a treasure trove for us.”