Genome Assembly Services

Input Requirement(s)
Illumina or Pacific Biosciences sequencing data (generated at Broad)

Batch Size(s): 4 or 8 for PacBio HGAP assembly; 4 for Eukaryotic ALLPATHS-LG assembly; 24 or 96 for Prokaryotic ALLPATHS-LG assembly

Description
We provide whole-genome assemblies for genomes of virtually any size or complexity, from either Illumina or Pacific Biosciences (PacBio) data. At present, we assemble PacBio data using the HGAP algorithm and Illumina data using ALLPATHS-LG. Both algorithms produce consensus sequence with very high base accuracy (99.9%+) and contiguity high enough for gene calling. For the highest level of contiguity, ideal for near-finished reference genomes and structural variation detection, we recommend PacBio data from long-insert (>10 kb) libraries, assembled with HGAP. Illumina-based genomes are a more cost-effective solution, providing high-quality draft assemblies with good contiguity. For these we generate two libraries (fragment and jumping) and assemble them with ALLPATHS-LG to produce maximally contiguous genomes. Illumina-based ALLPATHS-LG assemblies can also be performed using fragment libraries only (without accompanying jumping library data), but this necessarily yields a less contiguous assembly, though one still suitable for gene calling. For prokaryotic genomes, all assemblies include automated de novo gene annotation using the Prodigal algorithm. Please note that for any of our Genome Assembly Services, all sequence data must be generated at the Broad Institute.
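
As a quick reference, the pairing of sequencing platform, assembler, and permitted batch size described above can be sketched as a small lookup. This is only an illustration of the rules stated on this page: the function name choose_assembler, the dictionary layout, and the "any" organism key are hypothetical and are not part of any Broad software.

```python
# Illustrative sketch only: names and structure are hypothetical.
# The assembler and batch-size values are taken from the service description.
from typing import NamedTuple, Tuple


class AssemblyOption(NamedTuple):
    assembler: str
    batch_sizes: Tuple[int, ...]


OPTIONS = {
    # PacBio data is assembled with HGAP regardless of organism type.
    ("pacbio", "any"): AssemblyOption("HGAP", (4, 8)),
    # Illumina data is assembled with ALLPATHS-LG; batch size depends on organism.
    ("illumina", "eukaryote"): AssemblyOption("ALLPATHS-LG", (4,)),
    ("illumina", "prokaryote"): AssemblyOption("ALLPATHS-LG", (24, 96)),
}


def choose_assembler(platform: str, organism: str, batch_size: int) -> str:
    """Return the assembler name, or raise ValueError for an unsupported request."""
    platform_key = platform.lower()
    organism_key = "any" if platform_key == "pacbio" else organism.lower()
    option = OPTIONS.get((platform_key, organism_key))
    if option is None:
        raise ValueError(f"No assembly service for {platform} / {organism}")
    if batch_size not in option.batch_sizes:
        raise ValueError(
            f"Batch size {batch_size} not offered; choose from {option.batch_sizes}"
        )
    return option.assembler


if __name__ == "__main__":
    print(choose_assembler("Illumina", "prokaryote", 24))
```

A request such as 8 prokaryotic Illumina samples would raise a ValueError in this sketch, since only batches of 24 or 96 are listed for that service.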