A slice of Satsuma

By Haley Bridger

November 29, 2011

Before arriving at a conference in Santa Cruz last year, Broad researcher Federica Di Palma had not realized that the computational tool developed by others at the Broad and relished by her research group had such a following outside of the institute. Federica and her fellow members of the Broad’s Vertebrate Biology Group had been among the first scientists to put the alignment tool – known as Satsuma – to use, but they were certainly not the only ones. At the conference in Santa Cruz for the Genome 10K Project (an effort to sequence the genomes of 10,000 animals), researchers sequencing and comparing a variety of organisms came up to Federica and fellow researcher Kerstin Lindblad-Toh to introduce themselves as fans of Satsuma.

“People are generating sequences for model species that are very unconventional and they want to be able to compare them to previously sequenced species,” says Federica. “They’re using Satsuma because it’s fast, it’s sensitive, and it can tackle portions of the genome that before you weren’t able to see.”

Satsuma was developed by Manfred Grabherr, a former member of the Broad’s Vertebrate Biology Group, in response to a tricky problem that scientists encounter when comparing genomes. At the time, Manfred and his colleagues were trying to compare the newly sequenced lizard genome to – well, just about anything. The lizard’s closest sequenced relative is the chicken, but the two species parted ways 250 million years ago. The computational tools Manfred was trying to use were not up to the task of comparing them.

Lining up genomes of closely related organisms – say, human and chimp – is relatively simple. Their genomes have a lot in common because there’s been relatively little time since the species split for changes to accumulate in their respective genomes. But when comparing two distantly related organisms, most software tools don’t know where to begin. Conventional tools look for an identical stretch of DNA called a “seed” (for example, the same string of 12 base pairs, like ACTGAGCCTCCG, in both genomes). But the more millennia that separate two species, the more genetic differences have cropped up, and the chances of finding a common seed become vanishingly small.
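To make the seed problem concrete, here is a minimal sketch in Python. It is only an illustration of exact seed matching in general, not the code of any particular alignment tool; the function name `shared_seeds` and the toy sequences are invented for this example.

```python
def shared_seeds(a, b, k=12):
    """Return every length-k stretch that appears, letter for letter, in both sequences."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return kmers_a & kmers_b


# Hypothetical toy sequences (not real genomic data).
species_a = "TTACTGAGCCTCCGAA"
close_relative = "GGACTGAGCCTCCGTT"    # shares the 12-letter stretch ACTGAGCCTCCG
distant_relative = "GGACTGTGCCTACGTT"  # same stretch, but with two letters changed

print(shared_seeds(species_a, close_relative))    # one shared seed
print(shared_seeds(species_a, distant_relative))  # no shared seed at all
```

Two changed letters out of 12 are enough to erase the seed entirely, even though the two stretches are still obviously related to a human eye.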

In the midst of lizard-genome-induced frustration, Manfred decided he needed to take a new tack. If the conventional tools couldn’t align his genomes, he and his colleagues would create a tool that could.

“We said let’s turn this completely around,” Manfred recalls. “Let’s not make any assumptions. Let’s not do what other people have done. We want to do it differently.”

Manfred and his colleagues threw out the idea of starting with a seed. They turned to an entirely different field for inspiration: audio processing. Before joining the Broad as a computational biologist, Manfred worked on voice recognition software (think of the predecessors to Siri). Different accents and pronunciations pose the same kinds of problems to audio processors as the changes found throughout the genomes of distantly related organisms do to alignment tools.

“It’s a very similar problem,” Manfred says. “If you think about the way people talk, everyone pronounces words differently but you can still recognize them as such. But a computer has a much harder time doing that.”

Instead of beginning with a seed, Satsuma begins comparing genomes the way that children begin a game of Battleship. It compares single letters until it finds a “hit” – something similar across genomes. It then searches the neighborhood around the hit, extending and extending. Satsuma tackles different segments of the genome in parallel, speeding up its search.
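The Battleship analogy can be sketched in a few lines of Python. This is a toy illustration of the "find a hit, then extend" idea as described above, not Satsuma's actual implementation; the function name `extend_hit` and the `max_mismatches` parameter are assumptions made for this example.

```python
def extend_hit(a, b, i, j, max_mismatches=2):
    """Grow a region around a single-letter hit a[i] == b[j].

    The region is extended to the right and then to the left, tolerating
    up to max_mismatches differing letters in total.
    Returns (start_in_a, start_in_b, length).
    """
    if a[i] != b[j]:
        raise ValueError("positions i and j must be a hit (same letter)")
    left = right = mismatches = 0
    # Extend to the right of the hit.
    while i + right + 1 < len(a) and j + right + 1 < len(b):
        if a[i + right + 1] != b[j + right + 1]:
            if mismatches == max_mismatches:
                break
            mismatches += 1
        right += 1
    # Extend to the left of the hit.
    while i - left - 1 >= 0 and j - left - 1 >= 0:
        if a[i - left - 1] != b[j - left - 1]:
            if mismatches == max_mismatches:
                break
            mismatches += 1
        left += 1
    return i - left, j - left, left + right + 1


# Two hypothetical 12-letter stretches that differ at two positions.
print(extend_hit("ACTGAGCCTCCG", "ACTGTGCCTACG", 0, 0))
print(extend_hit("ACTGAGCCTCCG", "ACTGTGCCTACG", 0, 0, max_mismatches=0))
```

With two mismatches tolerated, the region grows to cover all 12 letters; with none tolerated, it stops after four, which is roughly the situation a strict seed-based search finds itself in.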

Research scientist Jessica Alföldi had just started working at the Broad when Manfred was developing the program that would become Satsuma. When Manfred explained that unlike other programs, his did not use seeds, Jessica suggested he name it after the first naturally seedless fruit: Satsuma. The name stuck.

Satsuma worked fast and effectively to line up the lizard and chicken genomes, and then the lizard and human genomes, lizard and opossum genomes, and more, helping researchers better understand elements of each genome and the evolutionary connections among species. Other researchers are also using Satsuma to compare the partially sequenced genome of one species (for example, ferret) to the fully sequenced genome of a not-too-distantly related species (for example, dog).

Jessica and Federica are currently using Satsuma to investigate a very unusual animal called the coelacanth, which Jessica describes as the closest thing to the first fish to come up on land. Coelacanths are lobe-finned fish and diverged from land vertebrates about 400 million years ago. They were thought to have been extinct for 70 million years, but a living one was discovered in 1938 off the coast of South Africa. Despite the huge evolutionary distance, the Vertebrate Biology Group has been able to compare the coelacanth (which is still around today) to more familiar four-legged animals. Federica says that without Satsuma, this work and many other genomic comparisons would not have been possible.

“We don’t have a mandate to develop these tools, but we do so when there is nothing else,” she says. “How else would you be able to look at genomes that are 250 million years apart or 400 million years apart if you don’t have the right tool? It also empowers the community to do so much more – communities that wouldn’t have been able to answer questions that are relevant to their model organisms.”

In addition to Jessica, Federica, and Manfred, others in the Vertebrate Biology Group were involved in Satsuma’s origin and evolution, including Pamela Russell, who helped with math and evaluations, and Miriah Meyer, who designed software for visually comparing genomes (see more about MizBee here). Satsuma and other tools developed by the Vertebrate Biology Group are freely available here.