Our manuscript “Comprehensive variation discovery in single human genomes” is now available as an advance online publication from Nature Genetics. This paper describes our assembly and variant calling algorithm DISCOVAR, which is able to find many novel variants missing from standard call sets. DISCOVAR is available for download now, and we encourage you to try it out. For de novo assembly without variant calling, see our other package: DISCOVAR de novo.
You can now limit the maximum number of threads DISCOVAR de novo uses with the new option NUM_THREADS (release 51183). This is useful if you have to share your hardware, or if your system administrator has limited the number of threads a single process can use. It can also be a good idea to restrict the number of threads if your hardware has many cores (>50), as parallelization efficiency can start to drop due to locking and cache coherency issues.
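As a rough illustration of picking a value for this option, the Python sketch below caps the thread count at whichever is smaller: a limit you choose or the number of cores the machine reports. The helper name `thread_argument` is hypothetical, and the `NUM_THREADS=<n>` argument form is an assumption here; check the manual for the exact syntax.

```python
import os


def thread_argument(cap: int) -> str:
    """Build a NUM_THREADS argument that never exceeds `cap` or the core count.

    Hypothetical helper: the KEY=value form of the argument is assumed,
    not taken from the DISCOVAR de novo manual.
    """
    if cap < 1:
        raise ValueError("cap must be at least 1")
    # os.cpu_count() may return None when the count cannot be determined.
    available = os.cpu_count() or 1
    return f"NUM_THREADS={min(cap, available)}"


# Example: on a shared 64-core server, ask for at most 32 threads.
print(thread_argument(32))
```

The cap of 32 is only an example; a sensible value depends on how many other jobs share the machine.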
We have fixed the bug in DISCOVAR de novo cited in the last blog message. Please download and use the new version (50964) from our ftp site.
The latest release (50893) of DISCOVAR de novo now supports BAM files directly, and no longer requires SAMtools to be installed. This change has the added benefit of halving the time required to import data from a BAM file, potentially saving hours on a human-sized genome. Note that the original variant calling version of DISCOVAR still requires SAMtools in order to work.
We’ve made some minor changes to DISCOVAR de novo output files to make them easier to use. See the manual or the Edge, lines and scaffolds primer for more details.
The assembly graph can be large, complex and unwieldy, so DISCOVAR de novo does not generate a viewable graph directly. Instead we have developed an interactive tool that allows you to explore your assembly by creating smaller viewable graphs of the regions you are interested in. This new tool, called NhoodInfo, is now part of the DISCOVAR package, as of release 50612. It is also the engine behind our online demo, so you can try it out right now without having to create an assembly of your own. Full instructions on using NhoodInfo are included in the DISCOVAR package.
A DISCOVAR de novo assembly is a graph. A typical assembly consists almost entirely of linear stretches, which we call ‘lines’. Lines provide a rich data type that captures polymorphism and other important features. Further, with some loss of information, these lines may be ‘flattened’ into standard contigs. We have added a tutorial explaining how these data types are provided as part of the DISCOVAR output. We would also like to hear your thoughts on the utility of these output types and on others that might be useful to you.
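To make the idea of flattening concrete, here is a minimal Python sketch. It is not DISCOVAR's actual data structure or algorithm: it assumes a line can be modelled as a list of cells, where each cell holds one or more alternative sequences (more than one where the genome is polymorphic), and it flattens by keeping the first alternative in each cell. The names `Cell`, `Line` and `flatten` are hypothetical, and overlaps between adjacent graph edges are ignored for simplicity.

```python
from typing import List

# Assumed model, for illustration only:
# a cell is a list of alternative sequences for one stretch of a line,
# and a line is an ordered list of cells.
Cell = List[str]
Line = List[Cell]


def flatten(line: Line) -> str:
    """Collapse a line into a single contig by keeping one branch per cell.

    The first alternative is chosen arbitrarily; the other branches
    (the polymorphism the line captured) are discarded.
    """
    return "".join(cell[0] for cell in line if cell)


def discarded_alternatives(line: Line) -> int:
    """Count the alternative branches that flattening throws away."""
    return sum(len(cell) - 1 for cell in line if cell)


# A toy line: unique sequence, then a two-way bubble (A vs G), then unique sequence.
toy_line: Line = [["ACGT"], ["A", "G"], ["TTCA"]]
print(flatten(toy_line))
print(discarded_alternatives(toy_line))
```

The second function shows where the "loss of information" comes from: every cell with more than one alternative contributes branches that do not appear in the flattened contig.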
We are pleased to announce the release of our new de novo assembler suitable for large genomes up to human size. This is an early release and should be considered experimental, but is fully functioning. Download it now.
Our new assembler, called DISCOVAR de novo (experimental), uses the same cheap data that the original DISCOVAR release does: 250 base paired-end PCR-free Illumina reads. No other libraries are required. On a 48-core server with 0.5 TB of memory, assembling a human genome takes only 36 hours and produces an assembly with a contig N50 of ~100 kb.
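For readers unfamiliar with the N50 statistic quoted above, the following Python sketch computes it from a list of contig lengths using the usual definition: the largest length L such that contigs of length at least L together cover at least half of the total assembly. The function name `n50` and the example lengths are ours, chosen for illustration; they are not DISCOVAR output.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


# Made-up contig lengths (in bases), for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))  # prints 8
```

In the example, the total length is 54; the three longest contigs (10, 9, 8) sum to 27, which reaches half of the total, so the N50 is 8.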
We are actively developing DISCOVAR de novo, so check back often for updates.
DISCOVAR can now be freely used without restriction in both non-academic and academic settings under the terms of our new license. We still encourage users to register with us if they find DISCOVAR useful.