Fragment library size distribution plots

Each DISCOVAR de novo assembly now comes with a plot like this:

frags.dist

The plot shows the observed size distribution of the fragments defined by the input read pairs, and is written to the file frags.dist.png. These plots can be highly diagnostic. They are available from revision 51298 onwards. The raw data are in the file frags.dist.
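To make concrete what such a plot summarizes, here is a minimal Python sketch that bins a list of fragment sizes into a histogram and reports the median. It does not read frags.dist, whose format is not described here. The function name `size_histogram`, the 50 bp bin width, and the example sizes are all assumptions made for illustration, not part of DISCOVAR.

```python
from collections import Counter
from statistics import median


def size_histogram(sizes, bin_width=50):
    """Count fragments per bin; each key is the lower edge of a bin in bp.

    bin_width is an arbitrary choice for this sketch.
    """
    counts = Counter((s // bin_width) * bin_width for s in sizes)
    return dict(sorted(counts.items()))


# Made-up fragment sizes in bp, not real DISCOVAR output.
example_sizes = [380, 410, 425, 450, 455, 470, 490, 520, 610, 900]

for lower_edge, count in size_histogram(example_sizes).items():
    # Crude text rendering of the distribution, one '#' per fragment.
    print(f"{lower_edge:>5}-{lower_edge + 49:<5} {'#' * count}")
print("median:", median(example_sizes))
```

A long tail of large fragments, like the single 900 bp entry in this toy data, is the kind of feature such a distribution makes visible.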

Clarification of DISCOVAR input requirements

DISCOVAR takes as input read pairs from fragments of size 400-500 bp, with some larger and some smaller. The blog and manual contained outdated references to fragments of size 700 bp; these have now been removed. Note that the protocol yields a wide size distribution, including some large fragments.
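As a rough way to compare a library against the 400-500 bp range stated above, the following Python sketch computes the fraction of fragments that fall inside it. The function `fraction_in_range` and the example sizes are hypothetical. DISCOVAR does not define a pass/fail cutoff here, so the sketch only reports the fraction.

```python
def fraction_in_range(sizes, low=400, high=500):
    """Fraction of fragments with low <= size <= high (sizes in bp)."""
    if not sizes:
        return 0.0
    return sum(low <= s <= high for s in sizes) / len(sizes)


# Made-up fragment sizes in bp: most near 400-500, some smaller and larger.
library_sizes = [380, 410, 425, 450, 455, 470, 490, 520, 610, 900]
print(f"{fraction_in_range(library_sizes):.0%} of fragments are 400-500 bp")
```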

Cleaner assemblies

Revision 46631 contains a number of algorithmic improvements. In particular, there are now fewer ‘false’ bubbles in assemblies. These arise particularly in assemblies at high (> 60x) coverage, as one might have, e.g., for a bacterium. Such bubbles have substantial support on both branches, but examining the quality score distributions of the read bases associated with each branch allows DISCOVAR to kill off one of them.
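The idea of using base qualities to choose between two well-supported branches can be sketched in Python as follows. This is a toy illustration, not DISCOVAR's actual algorithm: it compares only the median Phred score on each branch. The function `pick_branch_to_drop` and the threshold `min_gap` are assumptions introduced for this example.

```python
from statistics import median


def pick_branch_to_drop(quals_a, quals_b, min_gap=10):
    """Return 'a' or 'b' for the branch whose supporting bases look worse,
    or None if the two branches are too similar to call.

    quals_a, quals_b: Phred quality scores of the read bases supporting
    each branch of the bubble.
    min_gap: hypothetical threshold on the difference between the medians;
    it is not a DISCOVAR parameter.
    """
    if not quals_a or not quals_b:
        return None
    med_a, med_b = median(quals_a), median(quals_b)
    if med_a - med_b >= min_gap:
        return "b"  # branch b is supported by markedly lower-quality bases
    if med_b - med_a >= min_gap:
        return "a"  # branch a is supported by markedly lower-quality bases
    return None


# Made-up scores: both branches have six supporting bases, but the bases
# on branch b are of much lower quality.
branch_a = [38, 40, 37, 39, 40, 36]
branch_b = [12, 15, 9, 20, 14, 11]
print(pick_branch_to_drop(branch_a, branch_b))
```

Read count alone cannot separate the two branches in this example, since both have the same number of supporting bases. Only the quality scores distinguish them.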