LookAlign

From ArachneWiki


LookAlign is the name of a non-executable module in the lookup/ subdirectory, and of the data structure defined there. A LookAlign object is one way of representing an alignment, in contrast to nobbits. It is useful for describing alignments against a reference genome that were found with the help of a lookup table (the "look" in "LookAlign" comes from "lookup table").

The look_align format is defined in LookAlign.{h,cc} and forms the machine-readable part of the QueryLookupTable output. The format follows these rules:

  • Each alignment occupies one line, written as a tab-separated list of entries
  • A line containing an alignment must start with the keyword “QUERY”
  • A line starting with anything other than “QUERY” is interpreted as a comment
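The rules above can be sketched as a small filter that separates alignment lines from comment lines. This is a minimal illustration, not code from the Arachne source: the function name split_look_align_lines is hypothetical, and splitting on any whitespace (rather than tabs only) is an assumption made so that lines copied from a web page, where tabs may have become spaces, still work.

```python
def split_look_align_lines(lines):
    """Separate alignment records from comment lines.

    A line counts as an alignment only if it starts with the keyword QUERY
    as its first entry; every other line is kept as a comment.
    Assumption: entries are split on any whitespace, not only on tabs.
    """
    alignments = []
    comments = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split()
        if line.startswith("QUERY") and fields[0] == "QUERY":
            alignments.append(fields)
        else:
            comments.append(line)
    return alignments, comments
```

Applied to the mixed human-readable and machine-readable output of QueryLookupTable, such a filter would keep only the QUERY records for further processing.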

A line containing the information for one alignment (i.e. one look_align object) looks like:

QUERY rid rb re rl or tid tb te tl N (glm)1 (glm)2 … (glm)N

Legend:

  1. QUERY: this keyword must appear as the first entry of an alignment line
  2. rid: read id (0-based index into the corresponding fastb file of reads)
  3. rb: start of alignment on read
  4. re: end of alignment on read
  5. rl: length of read
  6. or: orientation of read on reference (0 means same sense as ref, 1 means opposite sense)
  7. tid: id of reference sequence (0-based index into the reference fastb file)
  8. tb: start of alignment on reference
  9. te: end of alignment on reference
  10. tl: length of reference sequence
  11. N: number of alignment blocks (N blocks of the form (glm) follow this entry)
  12. (glm): a block consisting of three tab-separated integers:
    1. g: signed gap size (a positive gap indicates reference bases absent from the read, a negative gap means read bases absent from the reference)
    2. l: length of matching portion (mismatches are allowed)
    3. m: number of mismatches in the matching portion
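The legend maps naturally onto a small record type. The following is a minimal sketch of a parser for one alignment line; it is not the LookAlign class from LookAlign.{h,cc}. The names LookAlignRecord and parse_look_align are hypothetical, the "or" entry is stored as orientation because or is a Python keyword, and entries are split on any whitespace as a convenience.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LookAlignRecord:
    rid: int          # read id
    rb: int           # start of alignment on read
    re: int           # end of alignment on read
    rl: int           # length of read
    orientation: int  # the "or" entry: 0 = same sense as ref, 1 = opposite
    tid: int          # id of reference sequence
    tb: int           # start of alignment on reference
    te: int           # end of alignment on reference
    tl: int           # length of reference sequence
    blocks: List[Tuple[int, int, int]]  # N triples (g, l, m)


def parse_look_align(line: str) -> LookAlignRecord:
    fields = line.split()
    if not fields or fields[0] != "QUERY":
        raise ValueError("not an alignment line: missing QUERY keyword")
    nums = [int(x) for x in fields[1:]]
    if len(nums) < 10:
        raise ValueError("alignment line has too few entries")
    rid, rb, re_, rl, orientation, tid, tb, te, tl, n = nums[:10]
    rest = nums[10:]
    if len(rest) != 3 * n:
        raise ValueError("expected %d block entries, found %d" % (3 * n, len(rest)))
    # Group the remaining integers into (g, l, m) triples.
    blocks = [(rest[i], rest[i + 1], rest[i + 2]) for i in range(0, 3 * n, 3)]
    return LookAlignRecord(rid, rb, re_, rl, orientation, tid, tb, te, tl, blocks)
```

Checking that exactly 3N integers follow N catches truncated lines early, which seems a reasonable safeguard when reading output that also contains free-form comment lines.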

A few examples:

QUERY   426724  0       790     790     1       0       25295   26083   22036055        3       0       112     2       -1      104     2       -1      572     6
QUERY   426725  0       917     917     1       0       25393   26311   22036055        2       0       68      2       1       849     8
QUERY   426726  0       823     823     0       0       25454   26278   22036055        2       0       7       0       1       816     8
QUERY   426727  0       869     869     0       0       25481   26350   22036055        1       0       869     10
QUERY   434393  0       863     863     0       0       25559   26422   22036055        1       0       863     8
QUERY   434394  0       863     863     1       0       25777   26640   22036055        1       0       863     4
QUERY   434396  0       801     801     0       0       25955   26756   22036055        1       0       801     3
QUERY   434395  0       812     812     0       0       25956   26768   22036055        1       0       812     3
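
The examples above can be used to check how the blocks relate to the start and end coordinates. In each example, the block lengths plus the gaps add up to re - rb on the read and to te - tb on the reference, which suggests that end positions are exclusive. This is an inference from the examples, not a statement taken from the LookAlign source. The sketch below, with the hypothetical helpers block_spans and is_consistent, performs that arithmetic.

```python
def block_spans(line):
    """Return (bases covered on read, bases covered on reference, mismatches)."""
    fields = line.split()
    n = int(fields[10])
    values = [int(x) for x in fields[11:11 + 3 * n]]
    read_span = ref_span = mismatches = 0
    for i in range(0, 3 * n, 3):
        g, l, m = values[i:i + 3]
        if g > 0:
            ref_span += g    # positive gap: reference bases absent from the read
        else:
            read_span += -g  # negative gap: read bases absent from the reference
        read_span += l       # the matching portion advances both sequences
        ref_span += l
        mismatches += m
    return read_span, ref_span, mismatches


def is_consistent(line):
    """Check the inferred rule: spans equal re - rb and te - tb."""
    fields = line.split()
    rb, re_ = int(fields[2]), int(fields[3])
    tb, te = int(fields[7]), int(fields[8])
    read_span, ref_span, _ = block_spans(line)
    return read_span == re_ - rb and ref_span == te - tb


example = "QUERY 426724 0 790 790 1 0 25295 26083 22036055 3 0 112 2 -1 104 2 -1 572 6"
print(block_spans(example))    # (790, 788, 10)
print(is_consistent(example))  # True
```

For the first example, the two negative gaps each consume one read base, so the read span is 112 + 1 + 104 + 1 + 572 = 790 while the reference span is 112 + 104 + 572 = 788, matching 26083 - 25295.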