Horse genome sequence and analysis published in Science

By Nicole Davis, Broad Communications, November 5th, 2009
Twilight, a Thoroughbred horse from Cornell University.
Image courtesy of Doug Antczak, Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University

An international team of researchers has decoded the genome of the domestic horse, Equus caballus, revealing a genome structure with remarkable similarities to that of humans and more than one million genetic differences across a variety of horse breeds. In addition to shedding light on a key part of the mammalian branch of the evolutionary tree, the work provides a critical starting point for mapping disease genes in horses.

"Horses and humans suffer from similar illnesses, so identifying the genetic culprits in horses promises to deepen our knowledge of disease in both organisms," said senior author Kerstin Lindblad-Toh, scientific director of vertebrate genome biology at the Broad Institute of MIT and Harvard and a professor of comparative genomics at Uppsala University in Sweden. "The horse genome sequence is a key enabling resource toward this goal."

For centuries, horses have been close human companions. The animals were first domesticated 4,000 to 6,000 years ago and were harnessed primarily for power and transportation. Over time, as machines have become the chief sources of agricultural and industrial muscle, those roles have shifted to mainly sports and recreational activities.

Predating this coexistence, humans and horses share an evolutionary history that has implications for the health of both species. Like other mammals, the two species share much of the same DNA. Moreover, horses suffer from more than 90 hereditary diseases that show similarities to those in humans. Recognizing the need for genomic tools to foster biomedical research on horses as well as humans, a research consortium led by scientists at the Broad Institute of MIT and Harvard launched a project three years ago to decode the horse's genetic blueprint. The effort built on the Horse Genome Project, a ten-year collaboration among an international group of scientists to exploit genomic technologies for the benefit of equine health.

"We are especially grateful to our collaborators in the horse genetics community who participated in this project," said Lindblad-Toh. "We really could not have done this work without them."

To generate a high-quality genome sequence, the researchers analyzed DNA from an adult female Thoroughbred named Twilight. The horse's DNA was decoded using conventional capillary DNA sequencing technology (known as Sanger sequencing) to reveal a genome that is roughly 2.7 billion letters, or nucleotides, in size - slightly larger than the genome of the domestic dog, and smaller than both the human and cow genomes.

A remarkable feature of the horse genome is the small number of chromosomal rearrangements that have occurred in horses relative to humans. During the course of evolution, parts of chromosomes can get shuffled to other locations in the genome, or they can remain in their original ancestral order, like beads on a string - a situation known as "synteny."  More than half of the horse chromosomes show synteny with a single human chromosome. This is in contrast to dogs, where the figure is less than one-third.

Another intriguing result to emerge from the horse genome analysis pertains to chromosomes and something called the "centromere." If you imagine chromosomes as X-shaped, centromeres are the central constrictions where the arms of the "X" come together.

More than just a nexus, centromeres ensure that cells inherit copies of each chromosome during cellular division. Despite this essential role, relatively little is known about them. It is clear that they contain highly repetitive DNA sequences, but what is less clear is which comes first, the centromere or its repeats.

Lindblad-Toh and her colleagues, including Elena Giulotto of Pavia University in Italy, were surprised to uncover a region on horse chromosome 11 that contains a developing centromere, already functional, but frozen in a young state. Analyses of this budding centromere revealed no repetitive DNA, suggesting that centromeres appear first and their repeats appear later.

"We don't know a lot about centromeres, particularly because they have proven so difficult to analyze by DNA sequencing," said first author Claire Wade, a former researcher at the Broad Institute and the Center for Human Genetic Research at Massachusetts General Hospital who is now a professor at the University of Sydney in Australia. "This result helps address some important questions about how centromeres evolve."


An Appaloosa horse.
Image courtesy of iStockphoto

In addition to sequencing the genome of a Thoroughbred horse, the researchers also examined DNA from a variety of other horse breeds, including the American quarter horse, Andalusian, Arabian, Belgian draft horse, Hanoverian, Hokkaido, Icelandic horse, Norwegian fjord horse, and Standardbred breeds. The team surveyed the extent of genetic variation both within and across breeds to create a catalog of more than one million single-letter genetic differences (called "single nucleotide polymorphisms" or SNPs).

In a first proof-of-principle of the power of trait mapping in horses, the researchers harnessed the SNP catalog to localize the candidate mutation in the Leopard Complex or "Appaloosa spotting," in which horses' coats are mottled with striking patches of white, either with or without colored spots. Horses carrying this trait often suffer from a form of night blindness, a disorder that also afflicts humans. The researchers narrowed the list of genetic suspects in horses to 42 associated SNPs, including two candidate mutations residing near a gene involved in pigmentation.

"This demonstrates the utility of the horse for disease gene mapping," said Wade. "By making these resources freely available to the scientific community, we hope that many new results will flow from them in the coming years."

Other Broad Institute researchers who contributed to the Science study include: the Broad Institute's Genome Sequencing Platform and Whole Genome Assembly Team, Tara Biagi, Sarah Fryc, Manuel Garber, Sante Gnerre, Eric Lander, Evan Mauceli, Rob Onofrio, Ted Sharpe, Snaevar Sigurdsson, Jared White, and Mike Zody. 

The research was funded by the National Human Genome Research Institute as well as the Dorothy Russell Havemeyer Foundation, the Volkswagen Foundation, the Morris Animal Foundation and the Programmi di Ricerca Scientifica di Rilevante Interesse Nazionale.
