Protein coding just tip of genome function

About 5 percent of the mammalian genome is conserved by evolution. Surprisingly, less than a third of that consists of protein-coding genes. The rest of this sequence, known as conserved noncoding sequences (CNCs), is of growing interest to scientists, both as a clue to how genomes work and because variations within CNCs may contribute to human disease. Are CNCs actually functional bits of sequence and, if so, what do they do?

A team of scientists led by Joel Hirschhorn, assistant professor of genetics and pediatrics at Children's Hospital Boston, Harvard Medical School, and associate member of the Broad Institute, has now established that these CNCs are indeed functional in humans, and that they should be examined for genetic changes in disease studies even though the specific functions of these CNCs remain unknown. Their work, a collaboration with Emmanouil Dermitzakis and his colleagues at the Sanger Institute, is published in the December 25 online edition of Nature Genetics.

To demonstrate that CNCs are in fact evolutionarily conserved genome sequences rather than just mutational "cold spots," the researchers took advantage of both the recently released HapMap data on human sequence variants and the chimpanzee genome. The chimp genome helped define the ancestral sequence at each variant, and the HapMap allowed them to determine the frequencies of new variant alleles. A mutational cold spot would produce fewer new variants but would not hold down the frequencies of those that do arise. If the CNCs being examined were under strong purifying selection, however, new alleles within CNCs should be less common in the population than new alleles in the rest of the genome. This was, in fact, what the researchers observed, and they subsequently verified it by resequencing about 100 CNCs in two different populations.
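The logic of that comparison can be illustrated with a minimal Python sketch. It is not the researchers' actual pipeline: the function names, the data layout and the allele counts below are all invented for illustration. The sketch treats the chimp allele at each variant as ancestral, counts every other allele as new (derived), and compares the average new-allele frequency between two sets of variants.

```python
def derived_allele_frequency(allele_counts, ancestral_allele):
    """Fraction of sampled chromosomes that do NOT carry the ancestral allele.

    allele_counts: dict mapping allele -> number of chromosomes observed.
    ancestral_allele: the allele seen in the outgroup (here, chimp).
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no chromosomes sampled at this variant")
    derived = total - allele_counts.get(ancestral_allele, 0)
    return derived / total


def mean_derived_frequency(variants):
    """Average derived allele frequency over (allele_counts, ancestral) pairs."""
    if not variants:
        raise ValueError("empty variant set")
    freqs = [derived_allele_frequency(counts, anc) for counts, anc in variants]
    return sum(freqs) / len(freqs)


# Hypothetical toy data: (allele counts in 120 chromosomes, chimp allele).
cnc_variants = [
    ({"A": 114, "G": 6}, "A"),
    ({"C": 108, "T": 12}, "C"),
]
other_variants = [
    ({"A": 84, "G": 36}, "A"),
    ({"C": 30, "T": 90}, "C"),
]

cnc_mean = mean_derived_frequency(cnc_variants)
other_mean = mean_derived_frequency(other_variants)

# Under purifying selection, new alleles in CNCs are expected to be rarer.
print(f"mean new-allele frequency in CNCs:  {cnc_mean:.3f}")
print(f"mean new-allele frequency elsewhere: {other_mean:.3f}")
```

A real analysis would compare full frequency distributions across many thousands of variants and test the difference statistically; comparing two means here only shows the direction of the expected effect.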

CNCs comprise a variety of genome pieces, including intronic sequences (the areas between a gene's exons, or coding sequences), promoter elements, untranslated regions and sequences far removed from any known gene. The precise functions of these various CNCs are still largely unknown, but this does not make them irrelevant: The fact that they are jealously guarded by evolution, combined with their relative abundance compared to protein-coding sequence, argues strongly that genetic variation in these CNCs plays a critical role in disease. The tools at hand, such as the HapMap, should make screening CNCs in addition to protein-coding regions standard practice in future disease gene studies, even before the function of each is determined.