Assessing assembly methods

To assess our assemblies and variant calls, we generated a set of NA12878 clone reference sequences.  We believe that these data will be of interest to the community and are therefore making them available to all.  The clone sequences and the raw data used to generate them can be found on our FTP site.

The sequences were obtained by randomly selecting ~100 clones from an NA12878 Fosmid library.  Two pools of ~50 clones each were created, and each pool was sequenced by MiSeq (250-base reads) and PacBio (~3000-base reads).  Some jump reads are also available.

We completely assembled 103 clones, without ambiguity, in some cases with manual intervention.  Cloning vector sequence has been removed.  A small number of additional clones are present in the pools but are not included in the assemblies; these include a few that had low coverage, some EBV, and some centromeric sequence.
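
As a quick sanity check after downloading the clone assemblies, one could count the records and summarize their lengths.  The following is a minimal sketch, assuming the assemblies are distributed as a FASTA file; the file name clones.fasta and the function names are hypothetical and are not part of the release.

```python
import io
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record name: sequence}."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def summarize(records: Dict[str, str]) -> Dict[str, int]:
    """Return the record count and basic length statistics."""
    lengths = [len(seq) for seq in records.values()]
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
    }


# Tiny in-memory example.  For real data, pass an open file handle instead,
# e.g. open("clones.fasta") (hypothetical file name).
example = io.StringIO(">clone_1\nACGT\nACG\n>clone_2\nGGCC\n")
stats = summarize(read_fasta(example))
print(stats)
```

If each assembled clone is stored as a single record, the count reported for the real data would be expected to match the 103 clones described above.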

This is version 1.0 of the set.  We believe that the error rate on the clones is very low; however, we are carrying out laboratory validation and will release updated versions as the results come back.

This work is supported by NHGRI grants.