Using shorter Illumina reads

Many people have asked if they can use their existing Illumina datasets with DISCOVAR, that is, datasets that don’t meet the recommendations of ~60x coverage by 250 base paired reads from a ~450 bp PCR-free fragment library. We investigated and made some minor changes to the algorithm, embodied in release 46382 onwards, and it is now possible to use shorter reads from PCR libraries, with some caveats. We have successfully tested DISCOVAR on 100 base reads from a ~180 bp PCR fragment library, obtaining reasonable results, though inferior to those generated from the recommended data. For more information, please see our FAQ.
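As a rough way to check an existing dataset against the recommendation above, depth of coverage can be estimated as total sequenced bases divided by genome size. The sketch below is illustrative only and is not part of DISCOVAR: the function names, the ~3 Gb human genome size, and the read counts in the usage lines are assumptions, while the 60x and 250-base thresholds are taken from the recommendation quoted above.

```python
def estimate_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Estimate mean depth of coverage as total bases / genome size.

    num_reads counts individual reads, so each mate of a pair counts once.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def meets_recommendation(
    num_reads: int,
    read_length: int,
    genome_size: int,
    min_coverage: float = 60.0,   # from the ~60x recommendation
    min_read_length: int = 250,   # from the 250 base read recommendation
) -> bool:
    """Return True if read length and estimated coverage both reach the thresholds."""
    coverage = estimate_coverage(num_reads, read_length, genome_size)
    return read_length >= min_read_length and coverage >= min_coverage


# Hypothetical dataset: 720 million reads of 250 bases on an assumed ~3 Gb genome.
print(estimate_coverage(720_000_000, 250, 3_000_000_000))
print(meets_recommendation(720_000_000, 250, 3_000_000_000))
```

This checks only read length and depth. It says nothing about fragment size or whether the library is PCR-free, which the recommendation also covers.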

Choosing the right hardware

Would you like to help us benchmark servers?

We are contemplating server purchases and would like to get the most bang for our buck. We imagine that some of you are in the same situation. Therefore, to share intelligence, we are creating a table that shows DISCOVAR performance stats, along with server configuration information. Please take a look at the current benchmark table, which we will continue to update as we get more results. Better yet, why not participate by benchmarking your systems and sharing the results with us?

Assessing assembly methods

For purposes of assessing our assemblies and variant calls, we generated some NA12878 clone reference sequences.  We believe that these data will be of interest to the community and have therefore decided to make them available to all. These clone sequences and the raw data used to generate them can be found on our FTP site.

The sequences were obtained by randomly selecting ~100 clones from an NA12878 Fosmid library.  Two pools of ~50 each were created, then sequenced by MiSeq (250 bases) and PacBio (~3000 bases).  There are also some jumps.

We completely assembled 103 clones, without ambiguity, in some cases with manual intervention.  Cloning vector has been removed.  There are a small number of additional clones in the pools, not included in the assemblies, including a few that had low coverage, some EBV, and some centromeric sequence.

This is version 1.0 of the set.  We believe that the error rate on the clones is very low; however, we are carrying out laboratory validation and will roll out updated versions as the results come back.

This work is supported by NHGRI grants.

New DISCOVAR release

A new release (r46399) of DISCOVAR is now available. It contains the following changes:

- More robust SAMtools version checking.
- Improvements to .variant file format.
- The MALLOC_PER_THREAD = 1 environment setting is no longer mandatory. However, setting it may still give a significant performance boost.
- Various bug fixes.
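For anyone launching jobs from a script, the MALLOC_PER_THREAD setting mentioned in the list above can be applied to a single child process without changing the rest of the session. This is a minimal sketch, not an official launcher: the helper name is hypothetical, and the command in the usage line is a stand-in that only echoes the variable, so substitute your own command line.

```python
import os
import subprocess
import sys


def run_with_malloc_per_thread(command, value="1"):
    """Run `command` with MALLOC_PER_THREAD set in the child's environment only."""
    env = dict(os.environ)             # copy, so the parent environment is untouched
    env["MALLOC_PER_THREAD"] = value   # variable name taken from the release notes
    return subprocess.run(
        command, env=env, check=True, capture_output=True, text=True
    )


# Stand-in child process that prints the variable it received.
result = run_with_malloc_per_thread(
    [sys.executable, "-c", "import os; print(os.environ['MALLOC_PER_THREAD'])"]
)
print(result.stdout.strip())
```

Passing a copied environment keeps the setting scoped to the one run, which makes it easy to compare timings with and without it.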

Thanks to all the users who have brought these problems to our attention.

DISCOVAR has been released!

We are pleased to announce that DISCOVAR is now available to download.

DISCOVAR is a variant caller and genome assembler from the Broad Institute. It uses the latest low-cost sequencing data, and can generate highly accurate variant calls for individual humans, or assemble small genomes de novo (with support for large genomes to follow). We expect it will be particularly valuable for understanding human Mendelian disease, but it is equally suited to investigating the biology of other organisms.

Find out more about DISCOVAR, and please check out the FAQ and help too.