LookAlign
From ArachneWiki
LookAlign is the name of a non-executable module in the lookup/ subdirectory, and of the data structure defined there. A LookAlign object is one way of representing alignments, an alternative to nobbits. It is well suited to describing alignments against a reference genome that were found with the help of a lookup table (the "look" in "LookAlign" comes from "lookup table").
The look_align format is described in LookAlign.{h,cc} and forms the machine-readable part of the QueryLookupTable output. The format has the following rules:
- One line per alignment, as a tab-separated list of entries
- A line containing an alignment must start with the keyword “QUERY”
- A line starting with something other than “QUERY” will be interpreted as a comment
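As a minimal sketch of these line rules (Python; the function name `alignment_lines` is hypothetical and not part of Arachne), an alignment line can be recognized by its first entry. The sketch assumes the keyword must be the entire first entry, as the legend below states, and splits on any whitespace so that space-separated copies of the examples are accepted as well as tab-separated output.

```python
from collections.abc import Iterable, Iterator


def alignment_lines(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield the entries of every alignment line; skip comment lines.

    Hypothetical helper: a line counts as an alignment only when its first
    entry is exactly the keyword QUERY. Any other line, including a blank
    one, is treated as a comment.
    """
    for line in lines:
        # split() with no argument splits on runs of tabs or spaces and
        # drops the trailing newline.
        fields = line.split()
        if fields and fields[0] == "QUERY":
            yield fields


output = [
    "any line not starting with QUERY is a comment\n",
    "QUERY\t434394\t0\t863\t863\t1\t0\t25777\t26640\t22036055\t1\t0\t863\t4\n",
    "\n",
]
for fields in alignment_lines(output):
    print(fields[1])  # read id; prints 434394
```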
A line containing the information for one alignment (i.e. one look_align object) looks like:
QUERY rid rb re rl or tid tb te tl N (glm)_1 (glm)_2 … (glm)_N
Legend:
- QUERY: this keyword must appear as first entry of an alignment
- rid: read id (0-based index into the corresponding fastb file of reads)
- rb: start of the alignment on the read (0-based)
- re: end of the alignment on the read, exclusive (one past the last aligned base)
- rl: length of read
- or: orientation of the read relative to the reference (0 means the same sense as the reference, 1 means the opposite sense)
- tid: id of reference sequence (0-based index into the reference fastb file)
- tb: start of the alignment on the reference (0-based)
- te: end of the alignment on the reference, exclusive (one past the last aligned base)
- tl: length of reference sequence
- N: number of alignment blocks; exactly N (glm) blocks follow this field
- (glm): a block consisting of three tab-separated integers:
  - g: signed gap size; a positive gap means reference bases absent from the read, and a negative gap means read bases absent from the reference
  - l: length of the matching portion (mismatches are allowed)
  - m: number of mismatches in the matching portion
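The legend maps directly onto a small record type. The following Python sketch parses one alignment line and checks that exactly N (glm) triples follow N. The class and field names (`LookAlignRecord`, `Block`, `rc`, and so on) are hypothetical illustrations, not the member names used in LookAlign.h.

```python
from dataclasses import dataclass


@dataclass
class Block:
    gap: int         # g: >0 reference bases absent from the read, <0 read bases absent from the reference
    length: int      # l: length of the matching portion (mismatches allowed)
    mismatches: int  # m: mismatches within the matching portion


@dataclass
class LookAlignRecord:
    # Hypothetical field names; they mirror the legend, not LookAlign.h.
    read_id: int
    read_begin: int
    read_end: int
    read_len: int
    rc: bool          # True when or = 1 (read aligns in the opposite sense)
    target_id: int
    target_begin: int
    target_end: int
    target_len: int
    blocks: list[Block]


def parse_query_line(line: str) -> LookAlignRecord:
    """Parse one QUERY line (tab- or space-separated) into a record."""
    fields = line.split()
    if not fields or fields[0] != "QUERY":
        raise ValueError("not an alignment line: first entry must be QUERY")
    nums = [int(x) for x in fields[1:]]
    if len(nums) < 10:
        raise ValueError("too few fields before the block list")
    rid, rb, re_, rl, orient, tid, tb, te, tl, n = nums[:10]
    if orient not in (0, 1):
        raise ValueError(f"orientation must be 0 or 1, got {orient}")
    triples = nums[10:]
    if len(triples) != 3 * n:
        raise ValueError(f"expected {n} (g, l, m) blocks, found {len(triples)} integers")
    blocks = [Block(*triples[i:i + 3]) for i in range(0, len(triples), 3)]
    return LookAlignRecord(rid, rb, re_, rl, orient == 1, tid, tb, te, tl, blocks)


rec = parse_query_line("QUERY 426725 0 917 917 1 0 25393 26311 22036055 2 0 68 2 1 849 8")
print(rec.rc, len(rec.blocks), rec.blocks[1])
# prints: True 2 Block(gap=1, length=849, mismatches=8)
```

Storing the blocks as a list rather than three parallel arrays keeps each (g, l, m) triple together, which makes per-block checks such as m <= l straightforward.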
A few examples:
QUERY 426724 0 790 790 1 0 25295 26083 22036055 3 0 112 2 -1 104 2 -1 572 6
QUERY 426725 0 917 917 1 0 25393 26311 22036055 2 0 68 2 1 849 8
QUERY 426726 0 823 823 0 0 25454 26278 22036055 2 0 7 0 1 816 8
QUERY 426727 0 869 869 0 0 25481 26350 22036055 1 0 869 10
QUERY 434393 0 863 863 0 0 25559 26422 22036055 1 0 863 8
QUERY 434394 0 863 863 1 0 25777 26640 22036055 1 0 863 4
QUERY 434396 0 801 801 0 0 25955 26756 22036055 1 0 801 3
QUERY 434395 0 812 812 0 0 25956 26768 22036055 1 0 812 3
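Because a positive g accounts only for reference bases, a negative g accounts only for read bases, and each matching portion of length l covers l bases of both, the blocks determine the aligned spans: re - rb = (sum of l) + (sum of -g over blocks with g < 0), and te - tb = (sum of l) + (sum of g over blocks with g > 0). For the first example, the read span is 112 + 104 + 572 + 2 = 790 and the reference span is 788 = 26083 - 25295. The Python sketch below checks this for every example line; `spans_from_blocks` is a hypothetical name, and reading re and te as exclusive ends is inferred from the examples (each full-length alignment has rb = 0 and re = rl) rather than stated explicitly on this page.

```python
def spans_from_blocks(line: str) -> tuple[int, int]:
    """Return (read bases, reference bases) implied by the (g, l, m) blocks."""
    nums = [int(x) for x in line.split()[1:]]  # drop the QUERY keyword
    n = nums[9]  # N; the blocks start at index 10
    read_span = ref_span = 0
    for i in range(n):
        g, l, _m = nums[10 + 3 * i : 13 + 3 * i]
        if g > 0:
            ref_span += g   # reference bases absent from the read
        else:
            read_span -= g  # read bases absent from the reference (no-op when g == 0)
        read_span += l      # a matching portion covers l bases of the read...
        ref_span += l       # ...and l bases of the reference
    return read_span, ref_span


examples = [
    "QUERY 426724 0 790 790 1 0 25295 26083 22036055 3 0 112 2 -1 104 2 -1 572 6",
    "QUERY 426725 0 917 917 1 0 25393 26311 22036055 2 0 68 2 1 849 8",
    "QUERY 426726 0 823 823 0 0 25454 26278 22036055 2 0 7 0 1 816 8",
    "QUERY 426727 0 869 869 0 0 25481 26350 22036055 1 0 869 10",
    "QUERY 434393 0 863 863 0 0 25559 26422 22036055 1 0 863 8",
    "QUERY 434394 0 863 863 1 0 25777 26640 22036055 1 0 863 4",
    "QUERY 434396 0 801 801 0 0 25955 26756 22036055 1 0 801 3",
    "QUERY 434395 0 812 812 0 0 25956 26768 22036055 1 0 812 3",
]

for line in examples:
    nums = [int(x) for x in line.split()[1:]]
    rb, re_, tb, te = nums[1], nums[2], nums[6], nums[7]
    # With exclusive ends, the span lengths are simply end - begin.
    assert spans_from_blocks(line) == (re_ - rb, te - tb), line
print("all examples consistent")
```

A check like this is a cheap sanity test when writing or converting look_align files, since an off-by-one in the block list shows up immediately as a span mismatch.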
