Assembly and Finishing

Methodology Overview

Whole Genome Shotgun Sequencing and Assembly

The Mycobacterium tuberculosis genomes were sequenced using the Whole Genome Shotgun methodology, whereby:
  1. Mycobacterium tuberculosis DNA is randomly sheared into small fragments.
  2. Fragments in narrow size ranges around 4 Kb and 10 Kb are inserted into vectors to create 4 Kb and 10 Kb plasmids.
  3. Both ends of each cloned fragment are sequenced, producing paired reads.
  4. The assembly process uses the paired reads to identify contiguous stretches of sequence (contigs).
  5. Contigs are ordered and linked together into larger supercontigs by using paired reads whose two ends lie in different contigs.
[Figure: assembly diagram]
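
Step 5 can be illustrated with a minimal sketch. It is not the assembler used for these genomes. It only shows the idea of grouping contigs that are connected by read pairs, and it ignores contig order, orientation, and insert size. The function name link_contigs, the min_links threshold, and the toy contig names are all hypothetical.

```python
from collections import Counter


def link_contigs(read_pairs, min_links=2):
    """Group contigs into supercontig candidates using spanning read pairs.

    read_pairs: iterable of (contig_of_read_1, contig_of_read_2) tuples.
    min_links:  hypothetical threshold; two contigs are joined only when
                at least this many read pairs connect them.
    """
    # Count read pairs whose two ends land in different contigs.
    links = Counter()
    contigs = set()
    for a, b in read_pairs:
        contigs.update((a, b))
        if a != b:
            links[frozenset((a, b))] += 1

    # Union-find: every contig starts in its own group.
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Merge the groups of any two contigs with enough supporting links.
    for pair, count in links.items():
        if count >= min_links:
            a, b = pair
            parent[find(a)] = find(b)

    # Collect the members of each group and return them in a stable order.
    groups = {}
    for c in contigs:
        groups.setdefault(find(c), set()).add(c)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy input: c1 and c2 share two spanning pairs, c2 and c3 only one.
toy_pairs = [("c1", "c1"), ("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c4", "c4")]
toy_groups = link_contigs(toy_pairs)
```

Requiring more than one spanning pair before joining two contigs is a common guard against chimeric clones, but the threshold used here is illustrative only.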


Strain                   Supercontigs  Contigs             Coverage              Contig NC50  Supercontig NC50
M. tuberculosis C        4             160 (4,276,200 bp)  6.7x (5.7x, Q>20)     47.4 Kb      3.0 Mb
M. tuberculosis F11      1             40 (4,405,268 bp)   22.27x                182.2 Kb     4.4 Mb
M. tuberculosis Haarlem  8             65 (4,347,292 bp)   14.5x (11.15x, Q>20)  111.6 Kb     2.46 Mb
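
The page does not define "NC50". Assuming it denotes the standard N50 statistic (the length L such that sequences of length L or longer contain at least half of all assembled bases), it could be computed as in the sketch below. The function name n50 and the example lengths are hypothetical and are not taken from these assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or supercontig lengths.

    Assumes the standard definition: walk the lengths from longest to
    shortest and return the length at which the running sum first
    reaches half of the total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy lengths in bp, not real assembly data.
toy_n50 = n50([100, 50, 25, 25])
```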

F11 Genome Finishing

The Mycobacterium tuberculosis F11 genome has been finished to high quality. The sequence is contiguous with no gaps or N's. All sequence gaps and low quality areas have been addressed with finishing reads. All consensus bases are represented by phred 30 quality data for at least one read and any areas lacking double strand or multiclone coverage are annotated. The finished F11 genome sequence has passed internal QC procedures including review of all joins and manual edits.

Supercontig/Contig Numbering

  • Supercontig and contig numbers are preceded by the version of the assembly. For example:
    • Contig 1.2 - refers to contig number 2 within assembly 1.
    • Supercontig 1.1 - refers to supercontig number 1 within assembly 1.
  • Supercontigs are numbered in order of decreasing length. See the Assembly Structure page for a list of all supercontigs with their lengths and contained contigs.
  • Contigs within a supercontig are ordered by position.

    There is no correspondence between contig or supercontig numbers in different assemblies.
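
The numbering scheme above can be parsed mechanically. The following sketch assumes labels are written exactly as in the examples ("Contig 1.2", "Supercontig 1.1"). The names AssemblyId and parse_id are hypothetical helpers, not part of any published tool.

```python
import re
from typing import NamedTuple


class AssemblyId(NamedTuple):
    kind: str      # "contig" or "supercontig"
    assembly: int  # assembly version, the part before the dot
    number: int    # contig or supercontig number within that assembly


# Matches e.g. "Contig 1.2" or "Supercontig 1.1" (case-insensitive).
_ID_PATTERN = re.compile(r"^(Supercontig|Contig)\s+(\d+)\.(\d+)$", re.IGNORECASE)


def parse_id(label):
    """Split a label such as 'Contig 1.2' into kind, assembly, and number."""
    match = _ID_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    kind, assembly, number = match.groups()
    return AssemblyId(kind.lower(), int(assembly), int(number))
```

Because numbers are not comparable across assemblies, any lookup built on this should key on the full (kind, assembly, number) tuple rather than on the number alone.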

Library clones

Mycobacterium tuberculosis C

Type     Clone ends generated  Clone ends mapped to assembly
Plasmid  50,688                39,696

Mycobacterium tuberculosis F11

Type            Clone ends generated  Clone ends mapped to assembly
Plasmid         91,941                86,882
Fosmid Library  100,896               41,580
Totals          192,837               128,432

Mycobacterium tuberculosis Haarlem

Type     Clone ends generated  Clone ends mapped to assembly
Plasmid  103,680               94,212
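
To compare libraries, the fraction of clone ends that mapped to each assembly can be derived from the per-library counts above. This is a small arithmetic sketch. The dictionary layout and the function name mapped_fraction are hypothetical, and only the per-library rows are used.

```python
# (strain, library type) -> (clone ends generated, clone ends mapped to assembly)
LIBRARIES = {
    ("C", "Plasmid"): (50_688, 39_696),
    ("F11", "Plasmid"): (91_941, 86_882),
    ("F11", "Fosmid"): (100_896, 41_580),
    ("Haarlem", "Plasmid"): (103_680, 94_212),
}


def mapped_fraction(generated, mapped):
    """Return the fraction of generated clone ends that mapped to the assembly."""
    return mapped / generated


# Print one line per library with the mapped percentage.
for (strain, library), (generated, mapped) in LIBRARIES.items():
    print(f"{strain:8s} {library:8s} {mapped_fraction(generated, mapped):.1%}")
```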