A read is a short sequence of DNA that has been taken directly from an organism's genome. Whole Genome Shotgun Assembly involves collecting large libraries of reads and piecing them together to re-create a complete genome. Reads used in Whole Genome Shotgun Assembly are called shotgun reads.

Reads are created by laboratory sequencing methods such as Sanger sequencing; the method used affects the read's length in bases, as well as its quality scores and linking. For more information, see Sequencing.


Sanger sequencing can read an insert from both ends, creating a pair of connected reads known as a clone end-pair. Reads of this type are called paired-production reads or paired end-reads. The pair is called a read pair, the two reads are partners or mates of each other, and the connection between them is called a link. The length of the link depends on the length of the insert used to create the read pair. Because insert size is not known precisely, each link is reported as a mean and a standard deviation. A link may also be wildly incorrect due to chimerism.
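As a rough illustration of how a link's mean and standard deviation might be used, the sketch below checks whether an observed separation between two mates is plausible. The `Link` class, the `is_consistent` function, and the three-standard-deviation threshold are all hypothetical names and choices made for this example; they are not taken from Arachne.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Expected separation between two paired reads (hypothetical structure)."""
    mean: float   # mean expected separation, in bases
    stdev: float  # standard deviation of that separation, in bases


def is_consistent(link: Link, observed: float, max_devs: float = 3.0) -> bool:
    """Return True if the observed separation lies within max_devs standard
    deviations of the link's mean. The default of 3.0 is an assumption."""
    return abs(observed - link.mean) <= max_devs * link.stdev


link = Link(mean=4000.0, stdev=400.0)
print(is_consistent(link, 4500.0))   # 500 bases from the mean, within 3 * 400
print(is_consistent(link, 40000.0))  # far outside the range, as a chimeric pair might be
```

A placement that fails a check of this kind could indicate either a misassembly or an incorrect link, so a threshold like this is a heuristic rather than a proof.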

Paired-production reads are very useful to Arachne. The pairing information allows Arachne to stitch together relatively distant parts of the genome; without it, Arachne would be unable to create supercontigs out of contigs or to circumnavigate and fill repeats.

In Arachne

Reads are given to Arachne as input in the form of fasta and qual files, which report reads' base sequence and quality scores, respectively. These files are in the directories DATA/fasta and DATA/qual. The XML ancillary files give ancillary information about reads, such as pairings. Some Arachne output files contain lists of reads; these files are typically given the name reads.fastb. Within modules, Arachne represents reads with basevector and qualvector objects.
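To make the relationship between the two input files concrete, here is a minimal sketch that parses fasta-style and qual-style text into per-read bases and quality scores. It assumes the conventional layout, with a `>` header line per read followed by bases in the fasta file and by space-separated integer scores in the qual file. The function names are hypothetical, and this is not Arachne's own parser or its basevector and qualvector classes.

```python
from typing import Dict, Iterable, List


def parse_records(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the non-empty body lines of a FASTA-style file under each '>' header name."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the read name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return records


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each read name to its base sequence, joining wrapped lines."""
    return {n: "".join(body) for n, body in parse_records(lines).items()}


def parse_qual(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each read name to its list of integer quality scores."""
    return {n: [int(q) for row in body for q in row.split()]
            for n, body in parse_records(lines).items()}


# Made-up example data: two reads, the first wrapped over two lines.
fasta_text = ">read1\nACGT\nAC\n>read2\nGGT\n"
qual_text = ">read1\n30 30 25 20\n18 12\n>read2\n40 38 9\n"

bases = parse_fasta(fasta_text.splitlines())
quals = parse_qual(qual_text.splitlines())

# Every base should have exactly one quality score.
for read_name, sequence in bases.items():
    assert len(sequence) == len(quals[read_name])
```

The final loop checks the one invariant that ties the two files together: each read has the same number of quality scores as bases.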

Read properties

In addition to bases and quality scores, reads have many other properties that Arachne uses. Each property is either provided as input or determined by an Arachne module (generally in pre-processing). The reads' values for each property are stored in a file in RUN/work.

Property       Type              Filename                        Found by
Id             Integer           reads.ids                       Ordering in input
Name           String            reads.ids                       XML ancillary files
Pairing        Multiple fields   reads.pairto / reads.unpaired   XML ancillary files
Lengths        Integer           reads.lengths                   TrimBadEnds2
Trimming       Two integers      reads.trim_lr                   FetchAndTrim
Repetitivity   Bool              reads.is_repetitive             TagRepeatReads

This list is not exhaustive.
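The properties in the table can be pictured as one record per read. The sketch below is only an illustration of that idea: the `ReadProperties` class, its field names, and the `usable_for_linking` filter are hypothetical, the Pairing property is simplified to a single partner id, and the two trimming integers are stored without interpreting them. It does not reflect how Arachne lays out the files in RUN/work.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReadProperties:
    read_id: int               # Id: position of the read in the input ordering
    name: str                  # Name: taken from the XML ancillary files
    partner_id: Optional[int]  # Pairing, simplified to the partner's id (None if unpaired)
    length: int                # Lengths
    trim: Tuple[int, int]      # Trimming: two integers, left uninterpreted here
    is_repetitive: bool        # Repetitivity


def usable_for_linking(reads: List[ReadProperties]) -> List[ReadProperties]:
    """Select reads that are paired and not tagged repetitive (an illustrative filter)."""
    return [r for r in reads if r.partner_id is not None and not r.is_repetitive]


# Made-up example values.
reads = [
    ReadProperties(0, "read_a", 1, 650, (10, 640), False),
    ReadProperties(1, "read_b", 0, 700, (5, 690), True),
    ReadProperties(2, "read_c", None, 580, (0, 580), False),
]
print([r.name for r in usable_for_linking(reads)])
```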

