We think that the ideal way to develop and test genome assembly methodology (both laboratory and computational) is to gather a set of ‘control’ genomes that can be sequenced and assembled over and over as the technologies improve.
This approach has two requirements: (1) a renewable source of DNA and (2) a very high quality reference sequence, based on the same DNA sample.
Currently, five genomes meet these requirements reasonably well. We have sequenced each of them and deposited the reads in the Short Read Archive:
(1) human cell line GM12878
(2) mouse strain C57BL/6J
(3) E. coli K-12 MG1655
(4) Rhodobacter sphaeroides 2.4.1 (high GC bacterium)
(5) Plasmodium falciparum 3D7 (very low GC).
It would be valuable to diversify and expand this collection!