Exact trimming looks for the final stretch of sequencing vector on the left-hand side of a read, immediately before the DNA from the organism being sequenced. If that stretch is found, the vector is trimmed off. If it is not found, the read is passed on to BLAST trimming.
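The idea can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function name exact_trim, the search_window parameter and its default of 100 bases, the case-insensitive comparison, and the choice to cut at the first match inside the window are all assumptions made for the example.

```python
from typing import Optional


def exact_trim(read: str, vector_tail: str, search_window: int = 100) -> Optional[str]:
    """Remove leading vector from a read using an exact string match.

    read          -- the raw read sequence
    vector_tail   -- the short stretch of vector next to the insert
    search_window -- how many bases from the left end to search
                     (hypothetical parameter; 100 is an arbitrary default)

    Returns the trimmed read, or None when the vector stretch is not found,
    in which case the caller would fall back to BLAST trimming.
    """
    # Only the left-hand side of the read is searched.
    left_side = read[:search_window].upper()
    pos = left_side.find(vector_tail.upper())
    if pos == -1:
        return None
    # Keep everything after the matched vector stretch.
    return read[pos + len(vector_tail):]
```

A caller would check for None and hand such reads to the next trimming stage.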
insert.sites is a text file with one tab-delimited entry per line. Each entry contains the following information:
- the sequencing center, should match the center field from the metainfo
- insert size: the size of the insert, in bp
- direction of sequencing, either F or R; should match the trace_end field from the metainfo
- insert site: the roughly 10 bases of vector, linker, etc. immediately adjacent to the DNA from the organism you're sequencing, e.g. TGTGGTGGAATTC
- a switch indicating whether dimer-based sequencing is used (1) or not (0). Dimer-based sequencing occasionally leaves half of a dimer blunt-ended on both sides. This blunt-blunt half-dimer attaches itself before the read and causes all kinds of problems.
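
As an illustration of the format described above, here is a minimal Python sketch of a reader for insert.sites. It assumes that the five fields appear in the order listed, that there is no header line, and that blank lines can be skipped; none of this is confirmed by the description. The names InsertSite and parse_insert_sites are hypothetical, as are the center name and insert size used in the usage note.

```python
import csv
from typing import Iterable, List, NamedTuple


class InsertSite(NamedTuple):
    center: str       # sequencing center (matches the metainfo center field)
    insert_size: int  # insert size in bp
    direction: str    # "F" or "R" (matches the metainfo trace_end field)
    site: str         # vector/linker bases adjacent to the insert
    dimer: bool       # True if dimer-based sequencing is used


def parse_insert_sites(lines: Iterable[str]) -> List[InsertSite]:
    """Parse tab-delimited insert.sites lines (assumed field order)."""
    entries = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines.
        if not row or all(not field.strip() for field in row):
            continue
        # Raises ValueError if the line does not have exactly five fields.
        center, size, direction, site, dimer = (field.strip() for field in row)
        if direction not in ("F", "R"):
            raise ValueError(f"direction must be F or R, got {direction!r}")
        if dimer not in ("0", "1"):
            raise ValueError(f"dimer switch must be 0 or 1, got {dimer!r}")
        entries.append(
            InsertSite(center, int(size), direction, site.upper(), dimer == "1")
        )
    return entries
```

For example, passing the single line "EXAMPLE_CENTER\t2000\tF\tTGTGGTGGAATTC\t0" to parse_insert_sites yields one entry with direction "F" and the dimer switch off. An open file object can be passed directly, since it iterates over lines.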