Computational Research and Development
Welcome to the Computational Research and Development Group (CRD) homepage. We are part of the Genome Sequencing and Analysis Program at the Broad Institute. This site presents the research tools that we have developed, and that we support, for working with genome sequence data. If you have any trouble using them, please check our general help first, then write to us at crdhelp@broadinstitute.org.
These tools are the computational part of the solution to two general problems: how to find the sequence of a genome, and how to find the differences between the sequences of two genomes.
Do you want the sequence of a genome?
We have been sequencing and assembling genomes since before 2000. Our goal is to produce high-quality draft sequences at low cost. To do this, we look for the optimal deployment of cutting-edge sequencing technologies and assembly algorithms. Here we describe the two laboratory and computational strategies that we are presently pursuing, and then the software packages for assembly that we have developed.
METHODS FOR VERY SHORT READS.
We have developed and tested a method for assembling very short (~30 base) paired reads using the ALLPATHS algorithm. This method requires high coverage from two libraries, one from fragments of size 3-4 kb, and one from shorter fragments.
METHODS FOR SHORT READS.
Because very short reads may be inadequate for large genomes, and because next-generation sequencing technologies are starting to produce longer reads, we are exploring a strategy that applies the ALLPATHS algorithm to the following data:
- 100 base reads from 180 bp ± 10% fragments, 45x coverage
- 100 base reads from ~3000 bp ± 10% fragments, 45x coverage
- additional sequence from longer fragments for large genomes.
Such data are starting to become available on the Illumina platform. At 90x total coverage (45x from each of the two libraries above), the cost is roughly six-fold lower than that of the best available alternative, which is 15x coverage on the 454 platform. However, costs may change rapidly and we will update the strategy accordingly.
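As a rough illustration of the arithmetic behind these figures, the Python sketch below computes the fragment size range and the number of reads each library would need to reach its target coverage. It uses the standard definition of coverage (number of reads times read length, divided by genome size). The `Library` class, the helper names, and the 5 Mb genome size are hypothetical and chosen only for the example. The optional third library of longer fragments is left out because no figures are given for it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    """One sequencing library, described by the figures listed above."""
    name: str
    read_length: int      # bases per read
    fragment_size: int    # nominal fragment size in bp
    tolerance: float      # fractional spread, e.g. 0.10 for +/- 10%
    coverage: int         # target sequence coverage (x)

    def fragment_range(self):
        """Return the (low, high) fragment sizes implied by the tolerance."""
        low = round(self.fragment_size * (1 - self.tolerance))
        high = round(self.fragment_size * (1 + self.tolerance))
        return low, high

    def reads_needed(self, genome_size):
        """Reads required so that reads * read_length / genome_size >= coverage."""
        return math.ceil(self.coverage * genome_size / self.read_length)


# The two libraries from the list above.
LIBRARIES = [
    Library("short fragments", read_length=100, fragment_size=180,
            tolerance=0.10, coverage=45),
    Library("long fragments", read_length=100, fragment_size=3000,
            tolerance=0.10, coverage=45),
]


def total_coverage(libraries):
    """Sum the target coverage over all libraries."""
    return sum(lib.coverage for lib in libraries)


# Hypothetical genome size (5 Mb), an assumption for this example only.
EXAMPLE_GENOME_SIZE = 5_000_000

for lib in LIBRARIES:
    low, high = lib.fragment_range()
    print(f"{lib.name}: fragments {low}-{high} bp, "
          f"{lib.reads_needed(EXAMPLE_GENOME_SIZE)} reads for {lib.coverage}x")
print(f"total coverage: {total_coverage(LIBRARIES)}x")
```

The number of reads scales linearly with genome size, so the same calculation can be repeated for any genome by changing `EXAMPLE_GENOME_SIZE`.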
Do you want the differences between genomes?
We have developed three programs for finding the differences between genomes:
General Guidelines
General guidelines on system and software requirements, and instructions for building our software, may be found here.