- For the pre-processing module, see TagRepeatReads.
TagRepeats is a repeat-finding assembly module. It effectively performs a self-alignment of an assembly: it finds repetitive kmers in the contig consensus sequences and merges them into longer repeat intervals. TagRepeats is run as part of the script module RunMarkup.
Algorithm and Output
First, a sorted table is constructed from the overlapping kmers in the contig consensus sequences. Then, for each kmer in each contig, TagRepeats searches the table for additional occurrences of that kmer, in both forward and reverse-complement orientation. The raw kmer frequency information is stored in the files mergedcontigs.tags and repeats.ano in SUBDIR. If consecutive kmers have consecutive matches in another contig, they are merged into "matches", defined as alignments that are longer than a kmer (and longer than the parameter POST_MATCH_MIN) and have at least 99% identity over the length of the repeat. Matches are stored in the files repeats.db and match.ano. The latter file is annotated with the locations of the other copies in the genome, by contig and scaffold; it is also loaded by DisplaySupercontig by default, so that repeats can be displayed as colored bars under contigs.
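The two steps above (a sorted kmer table, then merging runs of consecutive kmer hits into longer matches) can be illustrated with a minimal Python sketch. This is not the TagRepeats implementation. It assumes contigs are uppercase ACGT strings and uses exact kmer matches only, so it does not reproduce the 99% identity tolerance. The names `build_kmer_table`, `lookup`, `find_matches` and the `post_match_min` argument (standing in for POST_MATCH_MIN) are hypothetical. Hits are grouped by diagonal: for a forward hit the offset `q - p` stays constant along a repeat, and for a reverse-complement hit the sum `q + p` does.

```python
from bisect import bisect_left
from collections import defaultdict

_COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(_COMP)[::-1]

def build_kmer_table(contigs, k):
    """Sorted list of (kmer, contig_id, pos) for every overlapping kmer."""
    table = [(seq[p:p + k], cid, p)
             for cid, seq in enumerate(contigs)
             for p in range(len(seq) - k + 1)]
    table.sort()
    return table

def lookup(table, kmer):
    """All (contig_id, pos) occurrences of kmer, found by binary search."""
    i = bisect_left(table, (kmer,))  # (kmer,) sorts just before (kmer, cid, pos)
    hits = []
    while i < len(table) and table[i][0] == kmer:
        hits.append((table[i][1], table[i][2]))
        i += 1
    return hits

def find_matches(contigs, k, post_match_min):
    """Merge consecutive exact kmer hits into matches longer than a kmer
    and longer than post_match_min (hypothetical stand-in for POST_MATCH_MIN).
    Returns (contig, start, other_contig, other_start, length, strand)."""
    table = build_kmer_table(contigs, k)
    diagonals = defaultdict(list)
    for cid, seq in enumerate(contigs):
        for p in range(len(seq) - k + 1):
            kmer = seq[p:p + k]
            for strand, query in (("+", kmer), ("-", revcomp(kmer))):
                for d, q in lookup(table, query):
                    if (d, q) == (cid, p):
                        continue  # skip the kmer matching itself
                    diag = q - p if strand == "+" else q + p
                    diagonals[(cid, d, strand, diag)].append((p, q))
    matches = []
    for (c, d, strand, _), hits in diagonals.items():
        hits.sort()
        run_start = 0
        for i in range(1, len(hits) + 1):
            # Extend the run while kmer positions stay consecutive.
            if i < len(hits) and hits[i][0] == hits[i - 1][0] + 1:
                continue
            p0, q0 = hits[run_start]
            p1, q1 = hits[i - 1]
            length = p1 - p0 + k
            if p1 > p0 and length > post_match_min:
                # On the reverse strand the other copy starts at the last hit.
                other_start = q0 if strand == "+" else q1
                matches.append((c, p0, d, other_start, length, strand))
            run_start = i
    return sorted(matches)
```

For example, two contigs sharing the 8-base segment `GATTACAT` at position 2 yield one match per copy, and each copy reports the other:

```python
print(find_matches(["CGGATTACATGC", "TCGATTACATAG"], k=4, post_match_min=5))
# [(0, 2, 1, 2, 8, '+'), (1, 2, 0, 2, 8, '+')]
```

Because each repeat is reported once from every copy's point of view, a real tool would likely annotate or deduplicate these pairs, much as match.ano records the other copies of each repeat.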
Modules that require TagRepeats output
TagRepeats can also be used to compare assemblies. When the parameters EXTERNAL_DATA, EXTERNAL_RUN and EXTERNAL_SUBDIR are set, the assembly is aligned to the external assembly at /PRE/EXTERNAL_DATA/EXTERNAL_RUN/EXTERNAL_SUBDIR.