Finished bacterial genomes from shotgun sequence data using ALLPATHS-LG

Our manuscript "Finished bacterial genomes from shotgun sequence data" was just posted as an accepted preprint in Genome Research. "By applying a new laboratory design and new assembly algorithm to sixteen samples, we demonstrate that assemblies exceeding finished quality can…

Multi-threading improved

As of revision 42305, ALLPATHS-LG sets the environment variable MALLOC_PER_THREAD to 1, which causes each thread to do its own memory allocation. This yields a speed-up of several percent and, in some cases, a much greater improvement. We anticipate…
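ALLPATHS-LG sets this variable itself, so nothing needs to be done to benefit from it. For readers curious what "setting it for a process" amounts to, here is a minimal Python sketch, assuming a wrapper script launches a multi-threaded program. The C library generally reads its malloc settings when a process starts, so the variable has to be in the child's environment at launch rather than set from inside the running program. `run_with_per_thread_malloc` is a hypothetical helper, and whether the variable has any effect depends on the C library version and build.

```python
import os
import subprocess
import sys


def run_with_per_thread_malloc(cmd):
    """Run cmd with MALLOC_PER_THREAD=1 added to a copy of the current environment.

    The parent's os.environ is left untouched; only the child sees the setting.
    """
    env = dict(os.environ)
    # Must be present at launch: the allocator reads its settings at startup.
    env["MALLOC_PER_THREAD"] = "1"
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Demo: a child Python process reports the value it sees in its environment.
    result = run_with_per_thread_malloc(
        [sys.executable, "-c", "import os; print(os.environ.get('MALLOC_PER_THREAD'))"]
    )
    print(result.stdout.strip())  # → 1
```

Copying the environment instead of modifying `os.environ` keeps the setting scoped to the one child process.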

Improvements to ALLPATHS-LG hybrid assembly algorithm

Release 41343 includes many changes to the hybrid (Illumina + PacBio) assembly algorithm in ALLPATHS-LG, which result in substantially improved assemblies of bacterial genomes. This completes our work on this problem for now; however, we will make changes as needed…

A test for your system

We've started a testing page, which currently has a single speed test that we have used as a 'sanity check' for the health of a machine. This test is important: it is the first thing to try if ALLPATHS-LG seems to…

ALLPATHS-LG available on Blacklight at PSC

The Pittsburgh Supercomputing Center has installed and tested ALLPATHS-LG on Blacklight, a gigantic shared-memory computer. This could be particularly valuable for small labs that would otherwise not have access to a large-memory machine. We thank Philip Blood at PSC and…

Accuracy of ALLPATHS-LG on small genomes further improved

As of revision 38293, we've made a series of accuracy improvements to ALLPATHS-LG. Some of these improvements are specific to hybrid (Illumina + PacBio) datasets. Others are not, but are currently invoked only if the assembly size is under 10…

ALLPATHS-LG hybrid assembly accuracy improved

As of revision 38088, for ALLPATHS-LG assembly of ‘hybrid’ Illumina/PacBio datasets, we’ve added a new step that uses the Illumina data to ‘clean’ PacBio patches. This step removes nearly all of the errors introduced by the patches.
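The cleaning step itself is not described here, so the following is only a generic illustration of the underlying idea, not the ALLPATHS-LG method: short, accurate reads can vouch for a long-read patch wherever the patch's k-mers also occur in the short reads, and positions that no well-supported k-mer covers are candidates for error. The function names, `k`, and the `min_count` threshold are hypothetical, and this sketch only detects suspect positions; it does not correct them.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the short (Illumina-like) reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def unsupported_positions(patch, counts, k, min_count=2):
    """Return positions in patch not covered by any k-mer seen >= min_count times."""
    covered = [False] * len(patch)
    for i in range(len(patch) - k + 1):
        if counts[patch[i:i + k]] >= min_count:
            for j in range(i, i + k):
                covered[j] = True
    return [i for i, ok in enumerate(covered) if not ok]


if __name__ == "__main__":
    # Two overlapping short reads, each sequenced twice, from "AACCGGTTACGT".
    reads = ["AACCGGTT", "AACCGGTT", "CGGTTACGT", "CGGTTACGT"]
    counts = kmer_counts(reads, k=4)
    # A patch with one substitution (T -> A) at position 6.
    print(unsupported_positions("AACCGGATACGT", counts, k=4))  # → [6]
```

Requiring each k-mer to appear at least `min_count` times is a simple guard against a single erroneous short read vouching for an error in the patch.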

ALLPATHS-LG much faster with Transparent Huge Pages

Starting with version 2.6.38, the Linux kernel allows for Transparent Huge Pages (THP), a mechanism whereby memory pages of size 2MB are used in 'situations where they would be useful', rather than the standard 4KB pages. We have found that…
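To see whether THP is active on a machine, one can read the kernel's sysfs setting, where the active mode is shown in brackets (for example `always [madvise] never`). The Python sketch below assumes the mainline path `/sys/kernel/mm/transparent_hugepage/enabled`; some vendor kernels use a different location, and the helper names are hypothetical.

```python
from pathlib import Path

# Mainline location; vendor kernels may place this file elsewhere.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the bracketed (active) mode from a line like 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def current_thp_mode(path=THP_PATH):
    """Read the active THP mode, or None if the file cannot be read (e.g. no THP support)."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```

For scale: mapping 1 GiB of memory takes 262,144 pages of 4KB but only 512 pages of 2MB, so a large-memory program touches far fewer page-table entries when huge pages are in use.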

Stringency of long read patches increased

ALLPATHS-LG patches gaps in scaffolds using long reads from PacBio. As of revision 37660, we have tightened up the join process so that long reads having longer overlaps with their neighboring contigs are given more weight. This leads to…
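As a rough illustration of overlap-based weighting (a hypothetical rule, not the one ALLPATHS-LG uses), each candidate gap fill below is credited with the shorter of its two anchoring overlaps, so a read is only as trustworthy as its weaker anchor. `PatchCandidate`, `weight`, `choose_fill`, and `min_overlap` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PatchCandidate:
    fill: str           # sequence the long read proposes for the gap
    left_overlap: int   # bases aligned to the contig on the left
    right_overlap: int  # bases aligned to the contig on the right


def weight(c):
    """Hypothetical rule: credit a read by its weaker (shorter) anchoring overlap."""
    return min(c.left_overlap, c.right_overlap)


def choose_fill(candidates, min_overlap=0):
    """Return the fill with the largest total weight, or None if no read qualifies."""
    totals = defaultdict(int)
    for c in candidates:
        w = weight(c)
        if w >= min_overlap:
            totals[c.fill] += w
    if not totals:
        return None
    return max(totals, key=totals.get)


if __name__ == "__main__":
    reads = [PatchCandidate("ACGT", 500, 400)] + [PatchCandidate("AGGT", 100, 100)] * 3
    print(choose_fill(reads))  # → ACGT
```

In the example, one well-anchored read (weight 400) outweighs three weakly anchored ones (total weight 300), which is the effect of giving longer overlaps more weight rather than counting reads equally.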

PacBio/Illumina gap patching in ALLPATHS-LG now faster

As of revision 37655, the algorithm that computes the consensus of PacBio reads that patch a gap between 'Illumina' contigs is ~100 times faster. For bacterial genomes, this step previously took several hours. The speedup was obtained by replacing an…

