For the laboratory procedure, see Sequencing.

In genomics, a sequence is a string of nucleobases on a strand of DNA or RNA. DNA sequences contain four different bases - adenine, cytosine, guanine, and thymine - commonly represented as A, C, G, and T, respectively. Hence any known DNA sequence can be represented as a string of these four letters: TGGACTTGAA...
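
As a minimal sketch of this representation, the Python snippet below treats a DNA sequence as a plain string and checks that it uses only the four letters. The names `DNA_BASES` and `is_dna_sequence` are made up for illustration and do not come from any particular package.

```python
# The four DNA bases: adenine, cytosine, guanine, thymine.
DNA_BASES = frozenset("ACGT")

def is_dna_sequence(seq: str) -> bool:
    """Return True if seq is non-empty and contains only A, C, G, and T."""
    return len(seq) > 0 and set(seq) <= DNA_BASES

seq = "TGGACTTGAA"
print(is_dna_sequence(seq))  # True
print(len(seq))              # 10 (the length of the sequence in bases)
```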

An important property of any sequence is its length, measured in base pairs. Sanger reads are around 700 base pairs long, while Solexa reads are typically about 35 base pairs long. The human genome - which actually contains a separate sequence for each chromosome - is about 3 billion base pairs long.

Sequences are the building blocks of all larger assembly objects. Kmers, reads, contigs, and even entire chromosomes are essentially sequences with auxiliary information. One of the central problems of computational biology is sequence alignment, which produces alignments.

In Arachne

Inside the Arachne code package, sequences are represented as basevector objects. Arachne file formats for sequences include fasta and fastb.
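
To give a feel for the fasta format, here is a rough Python sketch of a reader. It assumes the common FASTA text convention, in which each record starts with a header line beginning with ">" and is followed by one or more lines of bases. The function `parse_fasta` is hypothetical and is not part of Arachne; it does not handle fastb, and it says nothing about how basevector objects are stored internally.

```python
from typing import Dict, Iterable

def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            # Sequence lines of one record are concatenated in order.
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records

# Made-up input for illustration: two short records.
example = [">read1", "TGGACT", "TGAA", ">read2", "ACGT"]
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

To read from disk, an open file handle can be passed in place of the list, since the function accepts any iterable of lines.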

