An assembly is a collection of DNA sequences that approximately represents a complete genome. Producing a high-quality assembly from shotgun reads is the goal of Whole Genome Shotgun Assembly and the task of the Arachne program. The act of creating an assembly is called the assembly process, although the process itself is sometimes also referred to as "the assembly".

Assemblies in Arachne

An Arachne assembly consists of independent supercontigs, each of which contains a series of contigs separated by gaps. The contigs are generated as consensus sequences of the input reads, and the supercontigs are built using read pairing information. An ideal assembly has high read usage, good connectivity, good insert happiness (read pairs placed at the expected separation and orientation), and few base disagreements between reads. The input for such an assembly would consist of a large number of reads (for good coverage) with high quality scores.
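The supercontig/contig/gap layout described above can be pictured as a small data structure. The following is a minimal sketch, not Arachne's actual representation: the class names `Contig` and `Supercontig` are hypothetical, and it assumes each gap is stored as a single non-negative estimated size in bases (in practice gap estimates derived from read pairs are approximate).

```python
from dataclasses import dataclass, field


@dataclass
class Contig:
    name: str
    bases: str  # consensus sequence of the reads in this contig


@dataclass
class Supercontig:
    # Hypothetical layout: contigs in order, with one estimated gap size
    # (in bases) between each pair of consecutive contigs.
    contigs: list[Contig]
    gaps: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.gaps) != max(len(self.contigs) - 1, 0):
            raise ValueError("need exactly one gap between consecutive contigs")
        if any(g < 0 for g in self.gaps):
            raise ValueError("this sketch only supports non-negative gap sizes")

    def contig_length(self) -> int:
        """Total number of consensus bases, excluding gaps."""
        return sum(len(c.bases) for c in self.contigs)

    def span(self) -> int:
        """Estimated length of the supercontig, including gaps."""
        return self.contig_length() + sum(self.gaps)

    def to_sequence(self, gap_char: str = "N") -> str:
        """Concatenate contigs, padding each gap with gap_char."""
        if not self.contigs:
            return ""
        parts = [self.contigs[0].bases]
        for gap, contig in zip(self.gaps, self.contigs[1:]):
            parts.append(gap_char * gap)
            parts.append(contig.bases)
        return "".join(parts)
```

For example, `Supercontig([Contig("c1", "ACGT"), Contig("c2", "GG")], gaps=[3])` has 6 contig bases, a span of 9, and renders as `ACGTNNNGG`. Padding gaps with `N` is a common convention for writing scaffolds as flat sequence; it is used here only for illustration.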

The Arachne modules that perform the assembly process are called assembly modules. The central assembly module is Assemblez, a script module that contains a pipeline of other modules.
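The idea of a script module that "contains a pipeline of other modules" can be sketched as a composite that runs its children in order, each consuming the previous one's output. This is a hypothetical illustration only: it assumes a module is a function from assembly state (a plain dict) to new state, and the stage names `build_contigs` and `link_supercontigs` are invented to mirror the steps described earlier; they are not Arachne's actual modules.

```python
from typing import Callable

# Assumption: a module maps an assembly state (a dict) to a new state.
State = dict
Module = Callable[[State], State]


class ScriptModule:
    """Runs child modules in order, feeding each the previous one's output."""

    def __init__(self, name: str, modules: list[Module]) -> None:
        self.name = name
        self.modules = modules

    def __call__(self, state: State) -> State:
        for module in self.modules:
            state = module(state)
        return state


# Hypothetical stages; each only records that it ran.
def build_contigs(state: State) -> State:
    # A real stage would compute consensus contigs from the reads.
    return {**state, "log": state["log"] + ["contigs"]}


def link_supercontigs(state: State) -> State:
    # A real stage would order contigs into supercontigs using read pairs.
    return {**state, "log": state["log"] + ["supercontigs"]}


assemblez_sketch = ScriptModule("Assemblez (sketch)", [build_contigs, link_supercontigs])
```

Because a `ScriptModule` is itself callable with the same signature as a module, script modules can be nested inside other script modules, which is one simple way to build a pipeline out of smaller pipelines.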

Draft assembly

The output of Assemblez is a draft assembly, so called because it is understood to be an imperfect representation of a genome due to the inherent limitations of the assembly process. The following assembly files constitute the necessary information in a draft assembly for output purposes. (For a complete list, see Output.)

Further use of a draft assembly

Once an assembly has been completed, it can be used as a reference in another assembly process; this is called assisted assembly.

