DISCOVAR now generates variant lists in the Variant Call Format (VCF). This is the standard used by the community and is supported by many tools. Whilst the VCF file contains all events found by DISCOVAR, the complementary .variant file may contain additional information not easily represented in VCF. We encourage our users to look at both. The VCF should be filtered prior to use, and we have provided a tool and instructions on how to do this.
To facilitate calling variants using DISCOVAR on large genomes, we have created a tool to merge VCF files generated for overlapping regions. Simply run DISCOVAR on each region in turn (or in parallel to speed things up), then merge the VCF files that are produced. We currently recommend using a 50 kb region size, with a 10 kb overlap.
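The region tiling described above can be sketched in a few lines of Python. This is only an illustration, not part of DISCOVAR: the function name `tile_regions`, the 1-based inclusive coordinates, and the reading of "50 kb region size, with a 10 kb overlap" as 50 kb windows that each share 10 kb with the next one are all assumptions made for the example.

```python
def tile_regions(chrom, length, size=50_000, overlap=10_000):
    """Return (chrom, start, end) windows covering positions 1..length.

    Coordinates are 1-based and inclusive (an assumption for this sketch).
    Consecutive windows share `overlap` bases; the last window is
    truncated at the end of the sequence.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")

    step = size - overlap  # distance between consecutive window starts
    regions = []
    start = 1
    while True:
        end = min(start + size - 1, length)
        regions.append((chrom, start, end))
        if end == length:  # the sequence is fully covered
            break
        start += step
    return regions


if __name__ == "__main__":
    # Hypothetical 120 kb sequence named "chr1".
    for chrom, start, end in tile_regions("chr1", 120_000):
        print(f"{chrom}:{start}-{end}")
```

Each window could then be passed to a separate DISCOVAR run, and the resulting VCF files merged afterwards; the exact region syntax DISCOVAR expects should be taken from the manual rather than from this sketch.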
For more information on the VCF output, filtering and merging, please refer to our manual.
Many people have asked if they can use their existing Illumina datasets with DISCOVAR, that is, datasets that don’t meet the recommendations of ~60x coverage by 250-base paired reads from a ~700 bp PCR-free fragment library. We investigated and made some minor changes to the algorithm, embodied in release 46382 onwards, and it is now possible to use shorter reads from PCR libraries, with some caveats. We have successfully tested DISCOVAR on 100-base reads from a ~180 bp PCR fragment library, obtaining reasonable results, though inferior to those generated from the recommended data. For more information, please see our FAQ.
A new release (r46399) of DISCOVAR is now available. It contains the following changes:
- More robust SAMtools version checking.
- Improvements to .variant file format.
- MALLOC_PER_THREAD=1 environment setting no longer mandatory. However, setting it may still give a significant performance boost.
- Various bug fixes.
Thanks to all the users who have brought these problems to our attention.
We are pleased to announce that DISCOVAR is now available to download.
DISCOVAR is a variant caller and genome assembler from the Broad Institute. It uses the latest low cost sequencing data, and can generate highly accurate variant calls for individual humans, or assemble small genomes de novo (with support for large genomes to follow). We expect it will be particularly valuable for understanding human Mendelian disease, but equally suited to investigating the biology of other organisms.
Find out more about DISCOVAR, and please check out the FAQ.