Contig accuracy twice as good now

Over the last two months, since we wrote the ALLPATHS-LG paper, the accuracy of contigs generated by the algorithm has improved substantially. Here’s a table of results for a test assembly of the mouse chr 1 region from 10 to 30 Mb:

date        revision   base errors   misassembly   ambiguous
                       (per 10^4)    rate (%)      bases (%)
11-1-2010   35038      1.74          2.69          0.050
1-7-2011    35511      0.79          1.41          0.078
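
As a quick check on the "twice as good" claim, the improvement can be computed directly from the two table rows. This is only a sketch: the dictionary keys are illustrative names, and it assumes the "rate (%)" column is the misassembly rate.

```python
# Values copied from the two table rows. Key names are illustrative only;
# "misassembly_rate_pct" assumes the "rate (%)" column refers to misassemblies.
rev_35038 = {
    "base_errors_per_1e4": 1.74,
    "misassembly_rate_pct": 2.69,
    "ambiguous_bases_pct": 0.050,
}
rev_35511 = {
    "base_errors_per_1e4": 0.79,
    "misassembly_rate_pct": 1.41,
    "ambiguous_bases_pct": 0.078,
}


def fold_change(before, after):
    """Return how many times larger `before` is than `after`."""
    return before / after


for key in rev_35038:
    ratio = fold_change(rev_35038[key], rev_35511[key])
    print(f"{key}: {ratio:.2f}x")
```

A ratio above 1 means the newer revision has the lower value. Both error columns come out at roughly 2, while the ambiguous-base column comes out below 1 because that quantity went up.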

This is based on an analysis of ~1 kb chunks, as in the paper.  It’s an all-in analysis that captures all errors in contigs, each categorized as either a base error or a misassembly.  Some errors are eliminated by encoding in the assembly that we don’t know the exact answer. For example, …CCTAAAAAAAAAA{,A,AA}GTC… has a run of between 10 and 12 As, and this gets counted as right if the true answer is one of those. However, there is no free lunch: we also count these ‘ambiguous’ bases, as shown in the last column of the table. Their total has increased, but is still under 1 per 1000, and most are concentrated in simple sequence repeats.
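
To make the brace notation concrete, here is a minimal sketch of how such an ambiguous stretch could be expanded and compared against a known answer. It is not the ALLPATHS-LG evaluation code. The function names `expand_ambiguous` and `counts_as_right` are hypothetical, and the sketch assumes groups are written as comma-separated alternatives inside braces, without nesting.

```python
import itertools
import re


def expand_ambiguous(seq):
    """Return every concrete sequence encoded by a string with {x,y,z} groups.

    Assumes groups are not nested; an empty alternative (as in "{,A,AA}")
    means "nothing inserted here".
    """
    # Splitting on a capturing group keeps the brace groups in the result.
    parts = re.split(r"(\{[^{}]*\})", seq)
    options = []
    for part in parts:
        if part.startswith("{") and part.endswith("}"):
            options.append(part[1:-1].split(","))
        else:
            options.append([part])
    return ["".join(choice) for choice in itertools.product(*options)]


def counts_as_right(contig, truth):
    """True if the true sequence is one of the alternatives the contig allows."""
    return truth in expand_ambiguous(contig)


# The fragment from the text, without the surrounding ellipses.
contig = "CCTAAAAAAAAAA{,A,AA}GTC"
for candidate in expand_ambiguous(contig):
    print(candidate, candidate.count("A"))
```

Under these assumptions the fragment expands to three sequences with runs of 10, 11, and 12 As. A true sequence with 13 As would not match any of them and would still count as an error.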
