Broad Institute software tool helps scientists, others visualize whole genomes
Reinhard Engels and David DeCaprio
The Broad Institute has created and released for free a software tool to help researchers and students visualize and manipulate entire genomes.
Argo, the genome browser for the Calhoun annotation system, is a Java application developed by Broad colleagues for visualizing and manually annotating whole genomes. It provides an easy-to-read display of sequence and annotation tracks, customization options and an interactive zoom from megabase to nucleotide resolution.
Initially developed for Broad Institute researchers working on distinguishing genes within the human genome, the tool was released free of charge.
"Currently available browsers did not meet the visualization needs of people who work with sequence data every day, especially those people doing manual annotation," said Broad annotation team leader David DeCaprio. "Other browsers aren't as easy to use, or as powerful at visualization. We wanted to set a new standard."
The tool will be especially useful to biologists and others in bioinformatics who don't have a computer science background, DeCaprio said. The browser lets users build and edit gene models and immediately see the effects of their changes on mRNA and protein sequences.
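To make the idea concrete, here is a minimal Python sketch of why editing a gene model changes the predicted protein. It is not Argo's code or data model: the toy sequence, the `splice` and `translate` helper names, and the 0-based, end-exclusive exon coordinates are all assumptions chosen for illustration, and only the forward strand and the standard genetic code are considered.

```python
from itertools import product

# Standard genetic code, listed in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical 20-base sequence: exon, intron, exon.
GENOME = "ATGGCC" + "GTAAGTAG" + "AAATGA"


def splice(genome, exons):
    """Join exon intervals (0-based, end-exclusive) into one coding sequence."""
    return "".join(genome[start:end] for start, end in sorted(exons))


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" for unrecognized codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


original_model = [(0, 6), (14, 20)]
edited_model = [(0, 6), (13, 20)]  # second exon now starts one base earlier

print(translate(splice(GENOME, original_model)))  # MAK
print(translate(splice(GENOME, edited_model)))    # MAEM
```

Moving a single exon boundary by one base shifts the reading frame, so every downstream codon changes. This is the kind of effect an annotator would want to see immediately after an edit.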
"The genome is a simple concept — in the case of the human genome, 3 billion As, Cs, Gs, and Ts in a row — and a bunch of regions that programs have identified as interesting. The problem is, programs have identified so many overlapping regions of interest, it's difficult for biologists to make sense of them by just looking at the coordinates," said Reinhard Engels, a software engineer at the Broad.
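The difficulty Engels describes can be illustrated with a small Python sketch that takes a list of annotated regions and reports which ones overlap. The feature names and coordinates below are invented for the example, and the `overlapping_pairs` function is a hypothetical helper, not part of Argo.

```python
def overlapping_pairs(features):
    """Return name pairs of features whose intervals overlap.

    Each feature is (name, start, end) with 0-based, end-exclusive coordinates.
    """
    ordered = sorted(features, key=lambda f: f[1])  # sort by start coordinate
    pairs = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later feature can overlap name_a
            pairs.append((name_a, name_b))
    return pairs


# Invented regions of interest on one stretch of sequence.
features = [
    ("gene_prediction", 100, 900),
    ("repeat", 850, 1200),
    ("snp", 400, 401),
    ("blast_hit", 1000, 1500),
]

for pair in overlapping_pairs(features):
    print(pair)
```

Even with four features, three overlaps have to be worked out from the numbers alone. With thousands of features per chromosome, a drawn track display becomes far easier to read than a coordinate list.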
In addition to protein features, Argo shows repeats, single nucleotide polymorphisms (SNPs), non-coding RNA, syntenic regions, and protein similarity.
"We are using Argo for the annotation of the Neurospora crassa genome. It has extensive capabilities that enable effective curation. Importantly, Argo's developers are providing this tool freely to the academic community and are responsive to the needs of its users," said Matthew Sachs, associate professor in the Department of Environment and Biomolecular Systems at Oregon Health and Science University. Neurospora crassa, or red bread mold, is widely used in genetic studies.
The browser helps users "read" the genome because the places where working sequences stop and start are flagged, and it contains footnotes on what the sequence may be responsible for in the organism. "It's like having an annotated version of Shakespeare where you see not just one interpretation of a quote, but all the prominent scholars' comments," Engels said.
Because Argo displays genomic data with a spatial perspective that makes it easy to appreciate the overall structure of the gene, it could help teach students the basics of gene structure and how the outputs of various kinds of genome-analysis software differ from one another, its developers said.
A variety of analysis algorithms within Argo let users rapidly compare and evaluate the effects of altering analysis parameters. Using the displayed data, a bench scientist working with an unannotated stretch of DNA can easily create plausible models of gene structure. The browser also helps accelerate manual annotation.
"Argo provides an extremely intuitive, user-friendly, and versatile platform for the visualization and manipulation of complex features across a genome. Its ease-of-use greatly facilitates the process of genomic investigation for the regular biologist," said Patrick Tan, M.D., who is a researcher at Singapore's National Cancer Center and the Genome Institute in Singapore.
Argo also improves data access by making it easier to retrieve data automatically from genomic websites and to load data directly from the output of popular analysis programs.
"Though other genome browsers with similar feature sets exist, we believe Argo provides a more flexible and intuitive user interface," DeCaprio said.
The browser is available for download at http://www.broad.mit.edu/annotation/argo.
Additional members of the Argo team include: Sarah Calvo, Mark Borowsky, Tim Elkins, Chinnappa Kodira, Randy Milbert, Sinead O'Leary, Seth Purcell, Shunguang Wang, Charlie Whittaker, Yuhong Wu, James Galagan, and Jill Mesirov.
The Broad Institute is known officially as The Eli and Edythe L. Broad Institute. It is a research collaboration of the Massachusetts Institute of Technology, Harvard University and its hospitals, and the Whitehead Institute for Biomedical Research. The Broad's mission is to create comprehensive tools for genomic medicine, make them freely available to scientists worldwide, and pioneer their application to understand and treat disease.