QueryLookupTable (QLT) is a fast heuristic aligner that uses a lookup table to align short sequences (queries) within long ones (reference contigs). A similar module, ShortQueryLookup, performs the same function but is optimized for shorter reads, such as Solexa reads.
QueryLookupTable is commonly used in post-processing, to compare a test assembly against a known genome that the assembly should resemble. Evaluating the resulting nobbits provides a metric of the quality of the new assembly, as reported in assembly.ps.
QueryLookupTable runs several algorithmic steps, each controlled by several command-line arguments. It has an enormous number of command-line arguments and, unlike with many modules, it is worth investigating the meaning of every argument rather than simply accepting the default values. The steps are:
- The reference is encoded as a lookup table, which is k-mer based. Typically k=12, so the reference is encoded as the set of 12-mers that it contains, together with their positions in the reference contigs. (This step occurs in MakeLookupTable rather than in QueryLookupTable).
- Each query sequence is similarly converted to k-mers.
- All matches ("hits") between the k-mers of the query and the k-mers of the reference are noted.
- Candidate alignments are found. A candidate alignment is identified by a cluster of hits.
- Candidate alignments are filtered heuristically. Candidates that pass the filters are converted into alignments.
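The steps above can be sketched in a few lines of Python. This is only an illustration of the general k-mer lookup idea, not QueryLookupTable's actual implementation: the function names are hypothetical, the toy example uses k=4 instead of the typical k=12, clustering hits by diagonal (reference position minus query position) is an assumed clustering rule, and the `min_hits` threshold stands in for the real heuristic filters.

```python
from collections import defaultdict


def build_lookup_table(reference, k):
    """Map each k-mer to the (contig_id, position) pairs where it occurs."""
    table = defaultdict(list)
    for contig_id, seq in enumerate(reference):
        for pos in range(len(seq) - k + 1):
            table[seq[pos:pos + k]].append((contig_id, pos))
    return table


def find_hits(query, table, k):
    """Return (query_pos, contig_id, ref_pos) for every shared k-mer."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for contig_id, rpos in table.get(query[qpos:qpos + k], ()):
            hits.append((qpos, contig_id, rpos))
    return hits


def candidate_alignments(hits, min_hits):
    """Cluster hits by (contig, diagonal); keep clusters with enough hits.

    Assumption: a "cluster" is a set of hits sharing one diagonal, and the
    only filter is a minimum hit count.
    """
    clusters = defaultdict(list)
    for qpos, contig_id, rpos in hits:
        clusters[(contig_id, rpos - qpos)].append(qpos)
    return {
        key: sorted(qposs)
        for key, qposs in clusters.items()
        if len(qposs) >= min_hits
    }


# Toy data: two reference contigs and one query.
reference = ["ACGTACGTTTGACCA", "GGGTTTGACCATT"]
query = "TTTGACCA"

table = build_lookup_table(reference, k=4)
hits = find_hits(query, table, k=4)
candidates = candidate_alignments(hits, min_hits=3)

for (contig_id, diagonal), qposs in sorted(candidates.items()):
    print(f"contig {contig_id}, diagonal {diagonal}: {len(qposs)} hits")
```

In this sketch the table is built once and can be reused for many queries, which mirrors the split between MakeLookupTable and QueryLookupTable described above.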
The default output prints the alignments in brief human-readable form, stating the query and target of each alignment and the starting and ending bases on each. The base numbers are 0-based, and the end base is actually one-past-the-end (that is, each range is half-open). The following options change what is printed:
- READABLE_BRIEF=False: Turns off the default human-readable output.
- PARSEABLE=True: Prints out the alignments in full machine-readable detail. This option outputs the alignments in LookAlign format, as described by the LookAlign module. If the output (both STDOUT and STDERR) is redirected to a QLT file (e.g., QueryLookupTable PARSEABLE=True ... >& alignments.qlt), the QLT file can then be used by further analysis modules such as PlotHitsCoverage and EvaluateConsensus.
- VISUAL=True: Prints out the alignments in a visual 3-line format, as defined by the module [[PrintAlignments]]. The format is two aligned sequences, with spaces for gaps, and |/* markers for indels/mismatches above them.
These options do not interfere with each other. You may set any or all of them in a given run.
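When reading the brief output, the 0-based, one-past-the-end base numbers can be interpreted as in the following sketch. The helper names and the example coordinates are hypothetical and are not taken from real QueryLookupTable output; the point is only that a half-open range maps directly onto a Python slice.

```python
def aligned_span(sequence, start, stop):
    """Return the bases covered by the 0-based, half-open range [start, stop)."""
    return sequence[start:stop]


def span_length(start, stop):
    """Number of bases in [start, stop); no +1 is needed for half-open ranges."""
    return stop - start


# Hypothetical alignment: query bases 0-8 placed on target bases 7-15.
query = "TTTGACCA"
target = "ACGTACGTTTGACCA"

covered = aligned_span(target, 7, 15)
last_aligned_base = 15 - 1  # index of the final base actually aligned

print(covered, span_length(7, 15), last_aligned_base)
```

Because the end base is one-past-the-end, the length of an aligned region is simply the end minus the start, and the last aligned base is at index end minus one.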