Sequencing & Assembly

This genome was sequenced using a combination of 454 and Sanger whole-genome shotgun methods:

Sanger Sequencing

  1. DNA is shattered into 40 kb fragments
  2. Each fragment is inserted into a vector and cloned
  3. The two ends of the fragment are sequenced, creating paired reads
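
To illustrate step 3, here is a minimal Python sketch of how a pair of end reads relates to its fragment. The function names, the `read_len` parameter, and the convention that the second read is reported as the reverse complement of the fragment's far end are assumptions made for this example; they are not taken from the sequencing pipeline described here.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Simulate reading both ends of a cloned fragment.

    The forward read is the first `read_len` bases of the fragment.
    The reverse read is taken from the opposite end and is reported on
    the opposite strand (assumed convention for this sketch).
    """
    if read_len <= 0 or read_len > len(fragment):
        raise ValueError("read_len must be between 1 and the fragment length")
    forward = fragment[:read_len]
    reverse = reverse_complement(fragment[-read_len:])
    return forward, reverse


if __name__ == "__main__":
    # Toy fragment; a real Sanger insert here would be about 40 kb long.
    print(paired_end_reads("AACCGGTTAC", 3))
```

Because both reads come from the same fragment, their approximate separation and relative orientation are known, which is the information the assembler relies on later.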

454 Sequencing

  1. DNA is shattered into small fragments (~0.6 kb or ~3 kb)
  2. The 0.6 kb fragments are tailed with 454 sequencing adapters
  3. The 3 kb fragments are circularized on a biotinylated linker; the circles are sheared, and the fragments containing the biotinylated linker are retrieved and tailed with 454 sequencing adapters
  4. Adapter-tailed fragments are sequenced from one end, creating fragment reads or, for the linker-containing fragments, paired reads

Arachne Assembly

  1. The assembly process uses the paired reads to identify contiguous stretches of sequence (contigs)
  2. Contigs are ordered and linked together into larger supercontigs by using paired reads lying in different contigs
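
The linking idea in step 2 can be sketched in a few lines of Python. This is only an illustration and not Arachne's actual algorithm: it groups contigs that share enough read pairs, but does not order or orient them. The function name `group_contigs`, the `min_links` threshold, and the input format are all hypothetical.

```python
from collections import Counter


def group_contigs(contigs, pair_links, min_links=2):
    """Group contigs into candidate supercontigs.

    contigs    : list of contig identifiers
    pair_links : iterable of (contig_a, contig_b) tuples, one per read
                 pair whose two reads were placed in different contigs
    min_links  : minimum number of supporting pairs needed to join two
                 contigs (assumed threshold for this sketch)
    """
    parent = {c: c for c in contigs}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Count supporting pairs per unordered contig pair.
    counts = Counter(tuple(sorted(p)) for p in pair_links if p[0] != p[1])

    for (a, b), n in counts.items():
        if n >= min_links:
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c3", "c2"), ("c3", "c4")]
    print(group_contigs(["c1", "c2", "c3", "c4"], links))
```

Requiring more than one supporting pair is a common guard against joining contigs on the basis of a single misplaced read; the value 2 here is arbitrary.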

For more information on the Arachne assembler, see: http://www.broad.mit.edu/wga/

Supercontig/Contig Numbering

  • Supercontig and contig numbers are preceded by the version of the assembly. For example:
    • Contig 1.25 - refers to contig number 25 within assembly 1.
    • Supercontig 1.2 - refers to supercontig number 2 within assembly 1.
  • Supercontigs are numbered in order of decreasing length. For example, supercontig 1.1 is the largest, and supercontig 1.2 is the next largest.
  • Contigs within supercontigs are ordered positionally. For example, supercontig 1.1 contains contigs 1, 2, 3, ... (in that order).
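
The numbering rules above can be sketched in Python as follows. The label format accepted by `parse_label` ("Contig 1.25", "Supercontig 1.2") follows the examples given here; the function names, the assumption that the assembly version is an integer, and the internal names in the usage example are hypothetical.

```python
import re
from typing import NamedTuple


class SeqId(NamedTuple):
    kind: str      # "contig" or "supercontig"
    assembly: int  # assembly version (assumed to be an integer)
    number: int    # number within that assembly


_LABEL = re.compile(r"^(Supercontig|Contig)\s+(\d+)\.(\d+)$", re.IGNORECASE)


def parse_label(label: str) -> SeqId:
    """Split a label such as 'Contig 1.25' into its parts."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    return SeqId(match.group(1).lower(), int(match.group(2)), int(match.group(3)))


def number_supercontigs(assembly: int, lengths: dict) -> dict:
    """Assign supercontig labels in order of decreasing length.

    lengths maps an internal name (hypothetical) to a supercontig
    length in bases; the longest supercontig receives number 1.
    """
    ranked = sorted(lengths, key=lengths.get, reverse=True)
    return {
        name: f"Supercontig {assembly}.{i}"
        for i, name in enumerate(ranked, start=1)
    }


if __name__ == "__main__":
    print(parse_label("Contig 1.25"))
    print(number_supercontigs(1, {"a": 500, "b": 1200, "c": 800}))
```

Since the numbers are derived from lengths within a single assembly, a label only has meaning together with its assembly version.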

There is no correspondence between contig or supercontig numbers in different assemblies.