- For the assembly module, see TagRepeats.
TagRepeatReads is a pre-processing module that flags reads that appear to be repetitive. It judges repetitiveness from kmer frequencies, in two steps:
- Compile a list of repetitive kmers. A kmer is deemed to be repetitive if its frequency among the set of reads is at least repeat_thres times the median frequency of all non-unique kmers in that set. The default value of repeat_thres is 2.5.
- Tag all reads containing a repetitive kmer as repetitive. Note that a "repetitive" read is one containing kmer sequences that appear often in other reads, not one that contains the same sequence many times within itself.
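The two steps above can be sketched in Python. This is only an illustration of the logic, not the module's actual implementation. It makes several assumptions: the function names and the parameter `k` are hypothetical, a kmer's frequency is taken to be its total number of occurrences across all reads, "non-unique" is taken to mean a frequency greater than 1, and reverse complements are not merged.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Yield every length-k substring of a read (none if the read is shorter than k)."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def find_repetitive_kmers(reads, k, repeat_thres=2.5):
    """Step 1: compile the set of repetitive kmers.

    Assumption: frequency = total occurrences over all reads;
    "non-unique" = frequency > 1.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    non_unique = [c for c in counts.values() if c > 1]
    if not non_unique:
        return set()  # no kmer occurs twice, so nothing can be repetitive
    cutoff = repeat_thres * median(non_unique)
    return {km for km, c in counts.items() if c >= cutoff}


def tag_repeat_reads(reads, k, repeat_thres=2.5):
    """Step 2: flag each read that contains at least one repetitive kmer."""
    repetitive = find_repetitive_kmers(reads, k, repeat_thres)
    return [any(km in repetitive for km in kmers(read, k)) for read in reads]
```

For example, calling `tag_repeat_reads(reads, k=3)` returns one boolean per read, in input order. A read is flagged because it shares a frequent kmer with other reads, even if no kmer is repeated inside the read itself.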
The most important output of TagRepeatReads is the file reads.is_repetitive, which is used by TagRepeats and many other modules.