The pre-processing phase is an early phase of the Arachne process, occurring after data verification but before the assembly process proper. During pre-processing, the input data is examined, analyzed, and transformed in simple ways intended to make the assembly process run more quickly and smoothly. Arachne implements pre-processing as a series of pre-processing modules.
Pre-processing algorithms are generally not time- or memory-intensive; algorithms that are expensive in time or memory are usually deferred to the assembly phase. Algorithms used in pre-processing (and their associated modules) may include:
- Collecting the raw data (PartitionInput)
- Compacting the data into Arachne binary format (PartitionInput)
- Read trimming (FetchAndTrim, TrimBadEnds2)
- Vector trimming
- Removing short reads
- Overlap finding via read-read alignments (ReadsToAligns)
- Reordering reads for optimized runtime (ReorderReads)
- Error correction (CorrectErrors, LocateChimeras)
- Repeat finding (TagRepeatReads)
- Finding read lengths (TrimBadEnds2)
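To make two of the simpler steps above concrete (read trimming and removing short reads), here is a minimal Python sketch. It is not the algorithm used by FetchAndTrim or TrimBadEnds2; it only illustrates the general idea of cutting low-quality bases from both ends of a read and then discarding reads that end up too short. The function names, the in-memory read representation, and both thresholds are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds, chosen only for illustration.
MIN_QUALITY = 20   # bases below this quality score are trimmed from the ends
MIN_LENGTH = 50    # reads shorter than this after trimming are removed

Read = Tuple[str, List[int]]  # (bases, per-base quality scores)


def trim_bad_ends(quals: List[int], min_quality: int = MIN_QUALITY) -> Tuple[int, int]:
    """Return the half-open interval [start, end) that remains after
    dropping low-quality bases from both ends of a read."""
    start, end = 0, len(quals)
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return start, end


def preprocess_reads(
    reads: Dict[str, Read],
    min_quality: int = MIN_QUALITY,
    min_length: int = MIN_LENGTH,
) -> Dict[str, Read]:
    """Trim every read, then keep only reads that are still long enough."""
    kept: Dict[str, Read] = {}
    for name, (bases, quals) in reads.items():
        start, end = trim_bad_ends(quals, min_quality)
        if end - start >= min_length:
            kept[name] = (bases[start:end], quals[start:end])
    return kept
```

Both functions run in time linear in the total number of bases, which is consistent with the point that pre-processing steps are generally cheap compared with assembly.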
The input data for pre-processing is found in DATA and its subdirectories fasta, qual, and traceinfo. The output from pre-processing modules is written to RUN. Note that some pre-processing modules edit the files in RUN in place.
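The directory layout described above can be checked before any module is run. The following Python sketch is not part of Arachne; the helper name `missing_input_dirs` is hypothetical, and the only facts taken from the text are the subdirectory names fasta, qual, and traceinfo under DATA.

```python
from pathlib import Path
from typing import List

# Subdirectories of DATA named in the text above.
EXPECTED_SUBDIRS = ("fasta", "qual", "traceinfo")


def missing_input_dirs(data_dir: str) -> List[str]:
    """Return the names of expected input subdirectories that do not
    exist under data_dir (an empty list means the layout looks complete)."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

Because some modules edit the files in RUN in place, it may also be prudent to copy RUN elsewhere before re-running a module whose output needs to be compared against an earlier state.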