We are pleased to announce the release of our new de novo assembler suitable for large genomes up to human size. This is an early release and should be considered experimental, but is fully functioning. Download it now.
Our new assembler, called DISCOVAR de novo (experimental), uses the same cheap data that the original DISCOVAR release does: 250 base paired-end PCR-free Illumina reads. No other libraries are required. Assembling a human genome takes only 36 hours on a 48-core, 0.5 TB server and produces an assembly with a contig N50 of ~100 kb.
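For readers unfamiliar with the contig N50 figure quoted above, here is a minimal sketch of how that statistic is conventionally computed from a list of contig lengths. The function name `contig_n50` is hypothetical and is not part of DISCOVAR; it only illustrates the standard definition (the contig length at which contigs of that size or larger contain at least half of the assembled bases).

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Illustrative helper only (not a DISCOVAR function): N50 is the length L
    such that contigs of length >= L together hold at least half of all
    assembled bases.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one contig length is required")

    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings us to half the total.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy lengths, not real assembly data.
    print(contig_n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

An N50 of ~100 kb therefore means that about half of the assembled sequence lies in contigs of roughly 100 kb or longer.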
We are actively developing DISCOVAR de novo, so check back often for updates.
DISCOVAR can now be freely used without restriction in both non-academic and academic settings under the terms of our new license. We still encourage users to register with us if they find DISCOVAR useful.
We’ve just added some examples to our online demo to help you explore our DISCOVAR de novo assembly of NA12878. Select a region of interest from the new drop-down menu and it will be displayed below. Alternatively, enter the coordinates of your favorite region of the genome if you want to explore on your own.
Want a sneak preview of what we’ve been working on lately? Then check out this online demo that lets you explore a de novo human assembly produced by our new assembler DISCOVAR de novo.
Developed over the past 6 months, the new DISCOVAR de novo algorithm will be released later this summer. Unlike DISCOVAR, it can assemble large genomes de novo. It is also much faster, but still takes the same low-cost single-library input data that DISCOVAR does.
Whilst we prepare DISCOVAR de novo for release, take a look at the online demo we’ve set up. Here you can explore and visualize an assembly of the human cell line NA12878. You can enter any coordinates on the human reference sequence GRCh38, and the demo will show you the part of the assembly that aligns there. Using this tool, large structural variation events can be directly visualized, and simple SNPs appear as short bubbles.
Please check it out and let us know what you think via the forum.
We are now posting detailed instructions for generating libraries appropriate for use with DISCOVAR. Instructions for generating 250 base reads on the HiSeq 2500 will be posted as soon as we have a version that we’re sure is portable.
You can now download DISCOVAR-compatible, high-coverage, 250 base PCR-free reads for the trio of NA12878 (daughter), NA12891 (father) and NA12892 (mother) from the 1000 Genomes Data Coordination Center. With these data you will be able to run DISCOVAR to call variants on any region of the genome for the trio. Please be sure to read the release and publication policies governing these data.
Many people have asked if they can use their existing Illumina datasets with DISCOVAR, specifically datasets that don’t meet the recommendations of ~60x coverage by 250 base paired reads from a ~700 bp PCR-free fragment library. We investigated and made some minor changes to the algorithm, embodied in release 46382 onwards, and it is now possible to use shorter reads from PCR libraries, with some caveats. We have successfully tested DISCOVAR on 100 base reads from a ~180 bp PCR fragment library, obtaining reasonable results, though inferior to those generated from the recommended data. For more information, please see our FAQ.
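To check how an existing dataset compares with the ~60x recommendation, the usual back-of-envelope estimate is coverage = (read pairs × 2 × read length) / genome size. The sketch below is illustrative only: the function names are hypothetical, and the ~3.1 Gb human genome size is an approximation assumed for the example, not a DISCOVAR parameter.

```python
import math


def estimated_coverage(read_pairs, read_length, genome_size):
    """Approximate fold coverage from paired-end reads (two reads per pair)."""
    return 2 * read_pairs * read_length / genome_size


def read_pairs_needed(target_coverage, read_length, genome_size):
    """Smallest number of read pairs that reaches the target fold coverage."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))


if __name__ == "__main__":
    HUMAN_GENOME_BASES = 3_100_000_000  # assumed approximate size

    # Read pairs needed for ~60x with 250 base reads.
    pairs = read_pairs_needed(60, 250, HUMAN_GENOME_BASES)
    print(pairs)

    # Coverage that the same number of pairs gives with 100 base reads.
    print(estimated_coverage(pairs, 100, HUMAN_GENOME_BASES))
```

Under these assumptions, a dataset with the same number of read pairs but 100 base reads delivers well under half the recommended coverage, which is one reason to expect weaker results from shorter-read data.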
We are pleased to announce that DISCOVAR is now available to download.
DISCOVAR is a variant caller and genome assembler from the Broad Institute. It uses the latest low cost sequencing data, and can generate highly accurate variant calls for individual humans, or assemble small genomes de novo (with support for large genomes to follow). We expect it will be particularly valuable for understanding human Mendelian disease, but equally suited to investigating the biology of other organisms.
Find out more about DISCOVAR, and please check out the FAQ.