A contig is a contiguous sequence of bases that has been constructed by aligning reads and building a consensus. Contigs are strung together, with gaps in between, to create a supercontig.


  • Like a read, a contig contains a sequence of bases with a quality score for each base. In the Arachne code package, these sequences are typically represented by basevector objects.
  • A contig records the location and orientation of every read that was used to build its consensus, represented as a set of ReadLocation objects.
  • Contigs in an assembly are assigned an integer id to distinguish them from one another. In I/O, contigs are often named in accordance with their id: contig id 42 is "c42". The module ReindexSupers reorganizes contig ids as follows: Starting with 0, the contigs on the largest supercontig are indexed in order; then the contigs on the second largest supercontig are indexed; and so forth.
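
The renumbering rule described for ReindexSupers can be sketched as follows. This is a minimal Python illustration, not Arachne's actual code: the names reindex_contigs and contig_name, the list and dictionary inputs, and the choice to measure a supercontig's size as the sum of its contig lengths (gaps ignored) are all assumptions made for the example.

```python
def reindex_contigs(supers, contig_lengths):
    """Return a mapping from old contig id to new contig id.

    supers: list of supercontigs, each a list of old contig ids in
            their order along the supercontig (hypothetical layout).
    contig_lengths: dict mapping old contig id to its length in bases.
    """
    def super_size(contig_ids):
        # Assumption: "largest" means the greatest total contig length.
        return sum(contig_lengths[c] for c in contig_ids)

    # Largest supercontig first; the sort is stable, so supercontigs
    # of equal size keep their input order.
    ordered = sorted(supers, key=super_size, reverse=True)

    new_id = {}
    for contig_ids in ordered:
        for old in contig_ids:
            # Next unused index, starting from 0.
            new_id[old] = len(new_id)
    return new_id


def contig_name(contig_id):
    """Format a contig id the way contigs are named in I/O, e.g. c42."""
    return f"c{contig_id}"


# Two supercontigs: the second is larger, so its contigs are indexed first.
supers = [[7, 3], [5, 9, 1]]
lengths = {7: 100, 3: 200, 5: 500, 9: 50, 1: 50}
mapping = reindex_contigs(supers, lengths)
names = [contig_name(mapping[old]) for old in (5, 9, 1, 7, 3)]
```

In this example the contigs of the larger supercontig (old ids 5, 9, 1) receive the new ids 0, 1, 2, and the contigs of the smaller one (old ids 7, 3) receive 3 and 4.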


A bare or unpopulated contig is a contig with no associated ReadLocations.
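
To make the definition concrete, here is a small Python sketch of a contig record. It is only a stand-in for the Arachne C++ classes: the field names, the use of a string for bases and a list for quality scores, and the properties length and is_bare are hypothetical choices for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReadLocation:
    """Where one read sits on a contig (hypothetical fields)."""
    read_id: int
    start: int      # offset of the read on the contig, in bases
    forward: bool   # orientation of the read relative to the contig


@dataclass
class Contig:
    contig_id: int
    bases: str                   # consensus sequence
    quals: list                  # one quality score per base
    read_locations: list = field(default_factory=list)

    def __post_init__(self):
        # Every base must have exactly one quality score.
        if len(self.bases) != len(self.quals):
            raise ValueError("bases and quals must have the same length")

    @property
    def length(self):
        """Number of bases in the contig."""
        return len(self.bases)

    @property
    def is_bare(self):
        """True when no read locations are associated with the contig."""
        return not self.read_locations


contig = Contig(contig_id=42, bases="ACGT", quals=[30, 30, 20, 10])
bare_before = contig.is_bare
contig.read_locations.append(ReadLocation(read_id=0, start=0, forward=True))
bare_after = contig.is_bare
```

Here the contig starts out bare and stops being bare as soon as one read location is attached.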


The term contig comes from contiguous, and is occasionally shortened even further, to tig. A contig's length is often denoted by the abbreviation CGLEN.

