Spotlight: Drawing the tree of life

Thucydides, often regarded as one of the world's first historians, aptly described the difficulty that lies in reconstructing the past. In the opening paragraph of his book, History of the Peloponnesian War, he wrote: "The events of remote antiquity, and even those that more immediately precede the war, could not from lapse of time be clearly ascertained."

Photo courtesy of Tom Lloyd, iStockphoto

The quotation, though penned some 2,500 years ago, highlights the challenge facing every historian — to accurately reconstruct episodes of the past for which only scattered fragments of information remain. Modern-day biologists, whose work often adopts a historical perspective, face this challenge too. They seek to retrace the evolutionary pathways that created all of today’s living organisms by studying DNA, nature's own — and sometimes spotty — historical record.

The first person to realize that organisms bear the indelible stamp of their origin was none other than Charles Darwin. In July 1837, only a month after he began the first of his famous notebooks devoted to the Origin of Species, he scribbled a crude but unmistakable evolutionary tree. The drawing, with the most ancient species at the bottom and their descendants branching off irregularly along the trunk, captured two key insights. First, each living species is not created de novo but is related to all other organisms through common ancestry. Second, the genealogical relationships among them (called "phylogenies") can be visualized in the form of a great tree of life.

Now, a century and a half later, a complete and accurate tree of life remains an elusive goal. Systematists, the archaeologists of DNA, have tried to flesh out evolutionary relationships by comparing just one gene, or perhaps a handful, across a group of organisms. But how reliable are phylogenies based on single genes?

Together with researchers in Sean Carroll's laboratory at the University of Wisconsin, we explored this question using genomic sequences from eight species of yeast, three of which were produced by the Broad’s Fungal Genome Initiative. We collected more than 100 genes interspersed throughout the yeasts' genomes and compared the evolutionary trees obtained from each of the genes. The analysis revealed that single-gene and few-gene datasets have a significant probability of generating inaccurate and conflicting evolutionary trees. By contrast, datasets composed of much larger sets of genes yielded a single, fully resolved phylogeny with maximum statistical support.
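As a rough illustration of what comparing gene trees involves, the Python sketch below tallies how often each grouping of species appears across a set of gene trees. The four species (A to D) and the three gene trees are made up for illustration, and `clades` and `clade_support` are hypothetical helper names; this is not the software or the statistical method used in the study, only a minimal picture of how conflict among single-gene trees can be counted.

```python
from collections import Counter

# Hypothetical toy data: three gene trees for four made-up species (A-D),
# written as nested tuples. Real analyses infer such trees from sequences.
GENE_TREES = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),  # this gene disagrees about A's closest relative
]


def clades(tree):
    """Return every clade in a rooted tree as a frozenset of species names."""
    found = set()

    def collect(node):
        if isinstance(node, str):  # a leaf: a single species
            return frozenset([node])
        members = frozenset().union(*(collect(child) for child in node))
        found.add(members)  # record the group descending from this node
        return members

    collect(tree)
    return found


def clade_support(gene_trees):
    """Return the fraction of gene trees that contain each clade."""
    counts = Counter()
    for tree in gene_trees:
        counts.update(clades(tree))
    return {clade: n / len(gene_trees) for clade, n in counts.items()}


if __name__ == "__main__":
    support = clade_support(GENE_TREES)
    for clade, fraction in sorted(support.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(clade), round(fraction, 2))
```

In this toy set, two of the three gene trees group A with B while the third groups A with C, so neither grouping appears in every tree. A grouping found in every gene tree would score 1.0, whereas conflicting single-gene trees spread their support across incompatible groupings.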

Antonis Rokas
Photo by Maria Nemchuk

Because systematists are constrained by the limited data available, they often have to strike a balance between the number of species they study and the number of genes they use to reconstruct those species’ evolutionary history. To examine this trade-off, we expanded our dataset to include six additional yeast species, allowing us to investigate the relative contributions of gene number and species number to phylogenetic accuracy. Importantly, we found that no matter how many species were used, increasing the number of genes studied was a prerequisite for a more accurate phylogeny.

The results from these two studies indicate that more data could resolve many difficult phylogenetic problems. So, we decided to test this hypothesis on a branch of the tree of life that has proved particularly challenging — that of the animal kingdom. We devised novel experimental protocols to systematically amplify large numbers of genes from any animal, applied them to several animal species, and combined them with bioinformatic data from additional species. Despite the large amount of data analyzed, we found that many of the phylogenetic relationships among animals simply could not be resolved.

Not to be discouraged, we decided to test our methods using genomic data from animals' closest relatives at the kingdom level — the fungi. Thanks in part to sequencing done here at the Broad, we had access to an abundance of genomes throughout the fungal tree. We sampled exactly the same genes from fungi that we had from animals and tested whether the lack of resolution in the animal tree was due to the choice of genes or to the branching pattern specific to the animal phylogeny.

Importantly, we found that the genes robustly resolved phylogenetic relationships within fungi, suggesting that the amount of data we had for animals was potentially adequate to resolve relationships among them, even though it did not. We wondered whether one explanation for this lack of resolution might lie in the different shapes that the evolutionary trees of fungi and animals have taken on in the course of evolution. For instance, it is well recognized that some evolutionary genealogies look less like trees and more like bushes, which can pose special problems. Through our work, we found that the resolution of the animal phylogeny is dramatically affected by its “bushiness”: how closely spaced its branches are and how frequently lone branches appear. In fact, this bushiness raises concerns about whether conventional molecular analyses will be sufficient to trace the evolutionary genealogy of certain groups of organisms, such as animals, whose origins lie several hundred million years in the past.

While DNA sequence information may not always suffice, other genome features, such as large-scale DNA rearrangements, offer powerful alternatives for addressing such phylogenetic riddles. The use of these rare changes is feasible only in a genomic context but can yield remarkably precise evolutionary trees. Working in the Broad's Microbial Analysis Group, we are developing computational methods to find such rare events in genomic data and use them to explore evolutionary relationships.

The impact of genomics on the grand quest for a complete phylogenetic encyclopedia is just beginning. Of course, the fraction of species for which genome-scale data are available is truly minuscule: there are about 2 million known species of organisms and another 10,000 are discovered each year. Comparative genomics, by vastly increasing the molecular data available for a small but critical number of species, is bound to play a key role in efforts to assemble a comprehensive tree of life.

Thus, it seems that some pieces are finally falling into place — Thucydides would be proud.

Further reading:

Rokas A. (2006) Genomics and the tree of life. Science; 313:1897-1899. DOI:10.1126/science.1134490.

Rokas A, Carroll SB. (2006) Bushes in the Tree of Life. PLoS Biology; 4:e352. DOI:10.1371/journal.pbio.0040352.

Rokas A, Krueger D, Carroll SB. (2005) Animal evolution and the molecular signature of radiations compressed in time. Science; 310:1933-1938. DOI:10.1126/science.1116759.

Rokas A, Williams BL, King N, Carroll SB. (2003) Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature; 425:798-804. DOI:10.1038/nature02053.