Choosing the right hardware

Would you like to help us benchmark servers?

We are considering server purchases and would like to get the most bang for our buck. We imagine that some of you are in the same situation. To pool what we learn, we are creating a table that shows performance statistics alongside server configuration information. We are using our new tool, DISCOVAR, as the basis for this test, but the results should still be of interest to ALLPATHS-LG users.

Please take a look at the current benchmark table, which we will continue to update as we get more results. Better yet, why not participate by benchmarking your systems and sharing the results with us?

Posted in Misc

DISCOVAR is here!

The ALLPATHS-LG team have been busy lately working on a new project: DISCOVAR.

DISCOVAR is both a genome assembler and a variant caller. It requires only a single Illumina fragment library to run, leading to cheaper genome assemblies and low-cost variant calls. Currently it can assemble small genomes, but we are working hard to add support for large genomes too. However, it can be used as a highly accurate variant caller on a genome of any size, making it particularly valuable for understanding human Mendelian diseases. Find out more on the DISCOVAR blog.

DISCOVAR does not replace ALLPATHS-LG, and indeed DISCOVAR is presently unable to assemble large genomes.

Posted in News

Mouse 129 assembly released

The ALLPATHS-LG assembly of Mouse 129 (Mus musculus strain 129Sv/Jae) is now available via NCBI and the Broad Institute ftp site. The basic statistics and location are given below:

Contig N50 (kb) | Scaffold N50 (Mb) | NCBI Accession # | Broad FTP
15.3 | 6.9 | AHBB00000000 | mm129svJae1.0
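
For readers less familiar with the N50 figures reported in these assembly tables, here is a minimal sketch of how the statistic is commonly computed: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the example lengths are illustrative assumptions, not ALLPATHS-LG code or data from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up contig lengths, for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

The same function applies to scaffolds; only the input lengths change.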

Please read our release terms and conditions.

Posted in Assemblies

Golden Hamster assembly released

The ALLPATHS-LG assembly of Golden Hamster (Mesocricetus auratus) is now available via NCBI and the Broad Institute ftp site. The basic statistics and location are given below:

Contig N50 (kb) | Scaffold N50 (Mb) | NCBI Accession # | Broad FTP
22.5 | 12.8 | APMT00000000 | MesAur1.0

Please read our release terms and conditions.

Posted in Assemblies

Weddell Seal assembly released

The ALLPATHS-LG assembly of Weddell Seal (Leptonychotes weddellii) is now available via NCBI and the Broad Institute ftp site. The basic statistics and location are given below:

Contig N50 (kb) | Scaffold N50 (Mb) | NCBI Accession # | Broad FTP
23.7 | 0.9 | APMU00000000 | LepWed1.0

Please read our release terms and conditions.

Posted in Assemblies

Assessing assembly methods

To assess our assembly methods, we generated some NA12878 clone reference sequences. We believe that these data will be of interest to the community and have therefore decided to make them available to all. These clone sequences and the raw data used to generate them can be found on our FTP site.

The sequences were obtained by randomly selecting ~100 clones from an NA12878 Fosmid library. Two pools of ~50 clones each were created, then sequenced by MiSeq (250 bases) and PacBio (~3000 bases). The data also include some jumping-library reads (jumps).

We completely and unambiguously assembled 103 clones, in some cases with manual intervention. Cloning vector sequence has been removed. The pools contain a small number of additional clones that are not included in the assemblies, including a few that had low coverage, some EBV, and some centromeric sequence.

This is version 1.0 of the set. We believe that the error rate on the clones is very low; however, we are carrying out laboratory validation and will roll out updated versions as the results come back.

This work is supported by NHGRI grants.

Posted in Misc, News

GCC 4.7.0 or higher now required to build ALLPATHS-LG

As of release 44849, GCC 4.7.0 (or higher) is now required to build ALLPATHS-LG.

We have made this transition in order to benefit from the many new features afforded by the C++11 standard. If you are unable to access a recent enough version of the compiler at this time, please continue to use earlier releases of ALLPATHS-LG, which still support GCC 4.4.0 or higher.
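
As a convenience, the version requirement above can be checked before attempting a build. The following is a minimal sketch, not part of ALLPATHS-LG: the helper names `parse_version` and `gcc_is_new_enough` are hypothetical, and the only fact taken from this announcement is the 4.7.0 minimum.

```python
import re

# Minimum compiler version stated in the announcement above.
MINIMUM_GCC = (4, 7, 0)


def parse_version(text):
    """Turn a version string such as '4.7.2' or '13' into a 3-tuple of ints.

    Missing minor or patch components are treated as 0.
    """
    match = re.match(r"\s*(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("not a version string: %r" % text)
    return tuple(int(part) if part else 0 for part in match.groups())


def gcc_is_new_enough(version_text, minimum=MINIMUM_GCC):
    """Return True if the reported GCC version meets the minimum."""
    return parse_version(version_text) >= minimum


# Illustrative version strings only; substitute your compiler's output.
for reported in ("4.4.7", "4.7.0", "4.8.2"):
    print(reported, gcc_is_new_enough(reported))
```

The version string itself can be obtained by running `gcc -dumpversion`. Some newer GCC builds may print only the major version, which is why missing components are padded with zeros here.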

Posted in News, Release

FASTG specification released

The FASTG Format Specification Working Group is pleased to announce version 1.0 of the FASTG specification.

FASTG is a format for faithfully representing genome assemblies in the face of allelic polymorphism and assembly uncertainty. Currently, genome assemblies are represented linearly, as sequences of bases recorded in FASTA files. Since chromosomes are in fact linear or circular, this makes sense so long as one has complete knowledge of the genome. However, many genomes contain polymorphisms that cannot be represented in a simple linear sequence, and almost all assemblies contain errors and omissions, which can result in incorrect biological inferences.

The FASTG format aims to address this problem using a flexible graph-based approach to encode any variability in the sequence, along with metadata to score and annotate the source of those variations. Assembly graphs in FASTG can be easily translated into linear FASTA sequences to support current analysis tools for read mapping, annotation, visualization, etc., but our hope is to develop a next generation of assembly and genome analysis algorithms that can work with the graph structure directly.

For the complete specification and additional information on FASTG, please visit:

http://fastg.sourceforge.net

If you would like to discuss this further, please subscribe to the assemblathon-file-format mailing list:

http://assemblathon.org/pages/mailing-list

The immediate plans are to enlist help to develop a reference library and command-line suite for parsing, transforming, and querying assemblies in FASTG format, similar to the widely used SAM/SAMtools suite.
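
To illustrate the general idea of translating an assembly graph into linear sequences, here is a toy sketch. It does not use FASTG syntax or any FASTG library: the dictionary layout, the edge names, and the `linear_paths` function are all hypothetical, and the graph is assumed to be acyclic. It models a single two-allele polymorphism as a "bubble" between two shared flanking sequences and enumerates the linear sequences the graph can spell.

```python
# Hypothetical toy graph: edge name -> (sequence, names of successor edges).
# This is NOT the FASTG format, only an in-memory illustration.
TOY_GRAPH = {
    "left": ("ACGT", ["alleleA", "alleleB"]),
    "alleleA": ("G", ["right"]),
    "alleleB": ("T", ["right"]),
    "right": ("TTCA", []),
}


def linear_paths(graph, start):
    """Return every linear sequence spelled by a path beginning at `start`.

    Assumes the graph has no cycles; a cyclic graph would recurse forever.
    """
    sequence, successors = graph[start]
    if not successors:
        return [sequence]
    paths = []
    for successor in successors:
        for tail in linear_paths(graph, successor):
            paths.append(sequence + tail)
    return paths


for number, sequence in enumerate(linear_paths(TOY_GRAPH, "left"), start=1):
    # Print each alternative as a FASTA-style record.
    print(">path%d" % number)
    print(sequence)
```

Each printed record corresponds to one choice of allele; a linear FASTA file can hold only one of them at a time, whereas the graph keeps both.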

Posted in News

Prairie Vole assembly released

The ALLPATHS-LG assembly of Prairie Vole (Microtus ochrogaster) is now available via NCBI and the Broad Institute ftp site. The basic statistics and location are given below:

Contig N50 (kb) | Scaffold N50 (Mb) | NCBI Accession # | Broad FTP
21.2 | 61.8 | AHZW00000000 | MicOch1.0

Please read our release terms and conditions.

Posted in Assemblies

Lesser Hedgehog Tenrec assembly released

The ALLPATHS-LG assembly of the Lesser Hedgehog Tenrec (Echinops telfairi) is now available via NCBI and the Broad Institute ftp site. The basic statistics and location are given below:

Contig N50 (kb) | Scaffold N50 (Mb) | NCBI Accession # | Broad FTP
20.4 | 45.8 | AAIY00000000 | EchTel2.0

Please read our release terms and conditions.

Posted in Assemblies