Identification of the functional elements in the human genome, both coding and non-coding, is a key foundation for biomedical research. One of the most powerful ways to discover these elements is through cross-species comparisons with other mammalian genomes: in effect, deciphering evolution's laboratory notebook, which records the results of 100 million years of evolution.
The mammalian genome project is an NIH-funded effort to expand the current genome coverage of the mammals (human, chimpanzee, mouse, dog, opossum) by sequencing 24 additional mammals to low coverage (2x). The goal is to create low-coverage genome assemblies and align the resulting sequence to the human genome to permit comparative genomic analysis.
The Broad Institute has sequenced 15 mammals, while two other centers have sequenced the other 9 mammals. We have developed algorithms to identify regions of sequence similarity across species, which have persisted through evolution and are indicative of genomic functionality. These regions include genes and smaller regulatory elements, such as transcription factor binding sites, which play key roles in determining the activation of genes and pathways in different cellular contexts.
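The idea of finding regions that have persisted across species can be illustrated with a minimal sketch. This is not the project's actual algorithm: it assumes a toy, already-aligned set of sequences, scores each alignment column by the fraction of bases that agree, and reports windows whose mean score passes a threshold. The function names, the window size, and the threshold are all hypothetical choices made for illustration.

```python
def column_identity(column):
    """Fraction of non-gap bases in a column that match the most common base."""
    bases = [b for b in column if b != "-"]
    if not bases:
        return 0.0
    most_common = max(set(bases), key=bases.count)
    return bases.count(most_common) / len(bases)


def conserved_windows(alignment, window=5, threshold=0.9):
    """Return merged half-open (start, end) intervals of conserved windows.

    `alignment` is a list of equal-length strings (one row per species).
    A window counts as conserved when its mean column identity is at
    least `threshold`. Overlapping or adjacent windows are merged.
    """
    length = len(alignment[0])
    scores = [column_identity([seq[i] for seq in alignment]) for i in range(length)]
    regions = []
    for start in range(length - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((start, end))
    return regions


# Toy alignment (made-up sequences): the first five columns are identical
# in all three rows, the remaining columns each contain one mismatch.
toy_alignment = [
    "ACGTACGTAA",
    "ACGTACCTTA",
    "ACGTAGGATC",
]
strict_regions = conserved_windows(toy_alignment, window=5, threshold=1.0)
```

Real constraint detection works on genome-scale multiple alignments and models the expected substitution rate along the evolutionary tree; the sliding-window identity score here only conveys the general shape of the computation.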
The mammals receiving low-coverage sequence were chosen primarily to maximize the total branch length of the evolutionary tree. Emphasis was also placed on organisms that represent the diversity of the mammalian tree and, where possible, are biologically useful models.
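Choosing species to maximize total branch length can be sketched as a greedy selection over a rooted tree. The sketch below is an illustration under stated assumptions, not the procedure the project used: the tree, the taxon names, and the branch lengths are invented, and the function `greedy_select` is hypothetical. At each step it picks the candidate whose path to the root adds the most branch length not already covered by previously selected species.

```python
def greedy_select(parent, candidates, already_sequenced, k):
    """Greedily pick k candidates that add the most new branch length.

    `parent` maps each non-root node to (parent_node, branch_length).
    Each branch is identified by the node at its lower (child) end.
    """
    def path_to_root(leaf):
        branches = []
        node = leaf
        while node in parent:
            branches.append(node)
            node = parent[node][0]
        return branches

    covered = set()
    for leaf in already_sequenced:
        covered.update(path_to_root(leaf))

    chosen = []
    for _ in range(k):
        best, best_gain = None, -1.0
        for leaf in candidates:
            if leaf in chosen or leaf in already_sequenced:
                continue
            gain = sum(parent[b][1] for b in path_to_root(leaf) if b not in covered)
            if gain > best_gain:
                best, best_gain = leaf, gain
        if best is None:
            break
        chosen.append(best)
        covered.update(path_to_root(best))
    return chosen


# Invented toy tree: branch lengths are illustrative, not measured values.
toy_parent = {
    "clade_x": ("root", 0.1),
    "clade_y": ("root", 0.1),
    "taxon_ref": ("clade_x", 0.02),    # stands in for an already-sequenced genome
    "taxon_close": ("clade_x", 0.02),  # close relative: adds little new branch length
    "taxon_a": ("clade_y", 0.3),
    "taxon_b": ("clade_y", 0.25),
    "taxon_c": ("root", 0.5),
}
picked = greedy_select(
    toy_parent,
    candidates=["taxon_close", "taxon_a", "taxon_b", "taxon_c"],
    already_sequenced=["taxon_ref"],
    k=2,
)
```

In this toy case the close relative of the already-sequenced taxon is never chosen, because nearly all of its path to the root is already covered. The secondary criteria mentioned above (representing mammalian diversity, usefulness as a biological model) are not captured by this sketch.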
Although low-coverage genome analyses are effective for identifying features of the human genome shared across most mammals, we recognize their inherent limitations. We have obtained higher-quality sequence data (6-7x coverage) for a limited set (8 of the 24) of the mammals originally chosen for low coverage, which will significantly aid the annotation and understanding of the human genome.
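One way to see the difference between 2x and 6-7x coverage is the idealized Lander-Waterman (Poisson) model, in which reads land uniformly at random and the expected fraction of bases covered at least once is 1 - e^(-c) for coverage c. The short sketch below applies that formula; it is a simplification that ignores repeats, cloning bias, and read length, and the function name is a hypothetical choice for illustration.

```python
import math


def expected_fraction_covered(coverage):
    """Expected fraction of the genome hit by at least one read.

    Assumes reads are placed uniformly at random (idealized Poisson model).
    """
    return 1.0 - math.exp(-coverage)


for c in (2, 6, 7):
    print(f"{c}x coverage: {expected_fraction_covered(c):.4f} of bases expected to be covered")
```

Under this model roughly 86% of bases are expected to be covered at 2x, and more than 99% at 6-7x, which is one reason deeper coverage yields more complete assemblies.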
The publication of the initial analysis of this dataset can be found here.
For constraint elements and other datasets related to this project, files are available here.
Rhesus Macaque (Macaca mulatta)
Cow (Bos taurus)
Dog, Domestic (Canis familiaris)
Guinea Pig (Cavia porcellus)
Sloth, Two-toed (Choloepus hoffmanni)
Nine-banded Armadillo (Dasypus novemcinctus)
Kangaroo Rat (Dipodomys ordii)
Tenrec (Echinops telfairi)
Horse (Equus caballus)
Hedgehog, European (Erinaceus europaeus)
Cat, Domestic (Felis catus)
Human (Homo sapiens)
Elephant, African Savannah (Loxodonta africana)
Mouse Lemur (Microcebus murinus)
Mouse (Mus musculus)
Little Brown Bat (Microbat) (Myotis lucifugus)
Pika (Ochotona princeps)
Rabbit (Oryctolagus cuniculus)
Bushbaby (Northern Greater Galago) (Otolemur garnettii)
Chimpanzee (Pan troglodytes)
Hyrax, Rock (Procavia capensis)
Fruit Bat (Megabat, Flying Fox) (Pteropus vampyrus)
Rat (Rattus norvegicus)
Shrew, Common (Sorex araneus)
Squirrel, Thirteen-lined Ground (Spermophilus tridecemlineatus)
Tarsier (Tarsius syrichta)
Tree Shrew (Tupaia belangeri)
Dolphin, Bottlenosed (Tursiops truncatus)
Alpaca (Vicugna pacos)