- For the input directory, see e_coli.
Escherichia coli (E. coli) is a bacterial species found in the human gut and an important model organism in biology and genomics. Its genome is known to a very high degree of certainty, in part because it is short (about 4.6 Mbp) and well-behaved (no serious repeat content or polymorphism). Trial runs of new sequencing methods and assembly pipelines often use E. coli as a test case.
E. coli also plays a crucial role in clone-based Sanger sequencing. Each insert (a fragment of the DNA being sequenced) is ligated into a cloning vector such as a plasmid, and the vector is taken up by an E. coli cell. As the cell reproduces, the vector is copied along with it, yielding enough copies of the insert to be sequenced into reads. This process runs into trouble when an insert is highly AT-rich or GC-rich: such inserts clone poorly, so those regions are under-represented in the reads, a problem known as cloning bias. Inserts are also occasionally mis-cut, so that pieces of the E. coli genome end up in inserts and therefore in reads. To address this, the e_coli and e_coli_transposons subdirectories of PRE provide the E. coli genome so that E. coli sequence can be identified and removed from the input reads.
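One common way to use a contaminant reference like this is to screen every read against it and discard reads that match. The sketch below assumes a simple exact k-mer approach; it is not necessarily how PRE itself performs the removal. It indexes k-mers from both strands of the contaminant sequences and flags a read when at least a chosen fraction of its k-mers hit the index. The function names, `k`, and the `threshold` default are hypothetical choices for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical helper tables, not part of PRE.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
_VALID_BASES = frozenset("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring made only of A/C/G/T (case-insensitive)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _VALID_BASES:
            yield kmer


def build_contaminant_index(reference: Iterable[str], k: int) -> Set[str]:
    """Collect k-mers from both strands, so reads from either strand match."""
    index: Set[str] = set()
    for seq in reference:
        index.update(kmers(seq, k))
        index.update(kmers(reverse_complement(seq), k))
    return index


def contamination_fraction(read: str, index: Set[str], k: int) -> float:
    """Fraction of the read's valid k-mers found in the contaminant index."""
    read_kmers = list(kmers(read, k))
    if not read_kmers:
        return 0.0
    hits = sum(1 for km in read_kmers if km in index)
    return hits / len(read_kmers)


def screen_reads(reads: Dict[str, str], index: Set[str], k: int,
                 threshold: float = 0.5) -> Tuple[Dict[str, str], List[str]]:
    """Split reads into (kept reads, ids of removed reads) by contamination fraction."""
    kept: Dict[str, str] = {}
    removed: List[str] = []
    for name, seq in reads.items():
        if contamination_fraction(seq, index, k) >= threshold:
            removed.append(name)
        else:
            kept[name] = seq
    return kept, removed
```

With a genome of a few megabases, a k in the low twenties or higher keeps chance matches rare, but a plain Python set of string k-mers is memory-hungry at that scale; real screening tools use compact indexes or alignment. A whole-read threshold also only removes reads that are mostly E. coli; reads that are only partly contaminated would need trimming rather than removal.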