The ReadsToAligns modules (ReadsToAligns1, ReadsToAligns2, and ReadsToAligns3) are three pre-processing modules that generate read-read alignments, thereby performing the overlap phase of the assembly process. They divide the work between them as follows:
- ReadsToAligns1: Considers read pairs and attempts to align them. This is time-intensive, so the module uses several heuristics to avoid examining every pair of reads -- most notably dividing the reads into piles and examining only read pairs within a single pile. Because the number of reads can be very large, a pre-selection step first searches for perfect kmer matches between reads and uses these matches as seeds for the real read-to-read alignments. Repetitive kmers (as marked by TagRepeatReads) are not used for seeding, because they would produce too many kmer matches; note that this prevents repetitive regions of the genome from being mapped out. Uses the MakeAligns module and creates the aligns.pile files.
- ReadsToAligns2: Refines read-read alignments by adding in reverse complements, remediating, and removing bad alignments. Reads the aligns.pile files and creates aligns.total2.
- ReadsToAligns3: Cleans up multiple alignments. Modifies aligns.total2 in place.
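The kmer pre-selection step described for ReadsToAligns1 can be illustrated with a minimal Python sketch. This is not the module's actual code: the function names (`kmers`, `candidate_pairs`) and the `repetitive` parameter are hypothetical, the set of repetitive kmers is assumed to be supplied by the caller (standing in for what TagRepeatReads marks), and only forward-strand matches are considered. It shows the general idea only: index each read by its kmers, skip the repetitive ones, and report the read pairs that share at least one remaining kmer as candidates for real alignment.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def candidate_pairs(reads, k, repetitive=frozenset()):
    """Return read-index pairs (i, j), i < j, sharing a non-repetitive kmer.

    `repetitive` is a hypothetical stand-in for kmers flagged as repeats;
    those kmers are never used as seeds.
    """
    index = defaultdict(set)  # kmer -> ids of the reads containing it
    for read_id, seq in enumerate(reads):
        for kmer in kmers(seq, k):
            if kmer in repetitive:
                continue  # repetitive kmers would seed too many pairs
            index[kmer].add(read_id)

    pairs = set()
    for ids in index.values():
        # Every pair of reads sharing this kmer is a candidate for alignment.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


reads = ["ACGTACGA", "TACGATTT", "GGGGCCCC"]
print(candidate_pairs(reads, k=4))
print(candidate_pairs(reads, k=4, repetitive={"TACG", "ACGA"}))
```

In this toy input the first two reads share the kmers TACG and ACGA, so they form the only candidate pair. Once those two kmers are treated as repetitive, no seeds remain and the pair is never examined, which is the trade-off noted above for repetitive regions.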
In module pipelines, the ReadsToAligns modules should always be run together. It is a good idea to follow up with additional alignment-pruning modules, such as EraseImproperAligns, TidyAligns, and CleanAlignments.