Assembly.reads

From ArachneWiki

The output file assembly.reads describes the placement of reads within contigs in a draft assembly. It is tab-delimited, with one row per placed read and the fields listed below. Rows are ordered first by the ID of the contig containing the read, and then by the approximate coordinate of the first base of the trimmed read in that contig:

     Type 	Meaning
     String 	Name of read
     String 	Status of read
     Integer 	Untrimmed read length
     Integer 	Coordinate of first base of trimmed read in untrimmed read (zero-indexed)
     Integer 	Length of trimmed read in untrimmed read
     Integer 	ID of contig containing read
     Integer 	Length of contig containing read
     Integer 	Approximate coordinate of first base of trimmed read in contig (zero-indexed)
     Integer 	Approximate coordinate of last base of trimmed read in contig (zero-indexed)
     '+' or '-' 	Strand (orientation of read on contig)
     String 	Name of this read's partner (empty if unpaired)
     String 	Status of this read's partner
     Integer 	ID of the contig containing this read's partner (empty if unpaired or partner unplaced)
     Integer 	Observed insert size (empty if unpaired, partner unplaced, or partner in different supercontig)
     Integer 	Given insert size (empty if unpaired, or if status is S or M)
     Integer 	Given insert size standard deviation (empty if unpaired, or if status is S or M)
     Float 	Observed insert size deviation measure (empty if observed insert size is empty)
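The field list above maps naturally onto a small record type. The following is a minimal Python sketch of reading one row, not part of Arachne itself: the names ReadPlacement and parse_line and all of the field names are hypothetical, chosen here for illustration. It assumes that an empty field should become None (or an empty string for text fields), and that a writer might drop trailing empty fields, so short rows are padded.

```python
from typing import NamedTuple, Optional

FIELD_COUNT = 17  # number of fields listed in the table above


class ReadPlacement(NamedTuple):
    """One row of assembly.reads (field names are illustrative)."""
    read_name: str
    status: str
    untrimmed_length: Optional[int]
    trim_start: Optional[int]
    trimmed_length: Optional[int]
    contig_id: Optional[int]
    contig_length: Optional[int]
    contig_start: Optional[int]
    contig_end: Optional[int]
    strand: str
    partner_name: str
    partner_status: str
    partner_contig_id: Optional[int]
    observed_insert: Optional[int]
    given_insert: Optional[int]
    given_insert_sd: Optional[int]
    deviation: Optional[float]


# Converter for each field, in file order.
_CONVERTERS = [str, str, int, int, int, int, int, int, int,
               str, str, str, int, int, int, int, float]


def parse_line(line: str) -> ReadPlacement:
    """Split one tab-delimited row into a ReadPlacement."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) > FIELD_COUNT:
        raise ValueError(f"expected at most {FIELD_COUNT} fields, got {len(cols)}")
    # Assumption: trailing empty fields may be missing, so pad the row.
    cols += [""] * (FIELD_COUNT - len(cols))
    values = []
    for text, convert in zip(cols, _CONVERTERS):
        text = text.strip()
        if convert is str:
            values.append(text)  # empty text fields stay as ""
        else:
            values.append(convert(text) if text else None)
    return ReadPlacement(*values)


# Made-up example row, for illustration only.
example = "readA\t\t800\t20\t700\t3\t15000\t100\t799\t+\treadB\t\t3\t4200\t4000\t400\t0.5\n"
record = parse_line(example)
print(record.read_name, record.contig_id, record.strand, record.deviation)
```

Keeping empty numeric fields as None rather than 0 preserves the distinction the format draws between a missing value (for example, an unplaced partner) and a real coordinate or size of zero.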

The status of the read is a set of characters used to flag conditions of note. Currently that field will either be empty or contain one or more of the following one-letter codes:

  • M: The read is multiply placed. A multiply placed read is not given a contig ID or contig length. For pairings in which one of the reads is multiply placed, no insert size or deviation measure is given.
  • S: The read and its partner are on the same supercontig and have the same orientation. This often implies chimerism.
  • T: The read is a transposon. In this case, the observed insert size is the observed separation of the transposon read and its partner.
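Because the status field is a set of one-letter codes, it can be decoded character by character. The sketch below is illustrative only: STATUS_CODES and decode_status are hypothetical names, and reporting an unrecognized letter rather than raising an error is an assumption made here, since the list of codes is described as the current one.

```python
from typing import List

# One-letter status codes as described in the list above.
STATUS_CODES = {
    "M": "multiply placed",
    "S": "same supercontig and same orientation as partner",
    "T": "transposon read",
}


def decode_status(status: str) -> List[str]:
    """Return a description for each code in a status field, in order."""
    descriptions = []
    for code in status.strip():
        # Assumption: unknown letters are reported, not treated as errors.
        descriptions.append(STATUS_CODES.get(code, f"unknown code {code!r}"))
    return descriptions


print(decode_status(""))    # an empty status means no flagged conditions
print(decode_status("ST"))  # a field may carry more than one code
```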

The insert size and standard deviation fields

The observed insert size deviation measure field contains the result of the calculation:

(observed insert size - given insert size) / given insert size standard deviation

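This gives a signed measure, in units of the given standard deviation, of how far the observed insert size departs from the given insert size. A positive value means the observed insert is longer than expected, and a negative value means it is shorter.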
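The calculation can be sketched in Python as follows. The function name insert_deviation is hypothetical, the numbers are made up for illustration, and returning None when an input is missing or the standard deviation is zero is an assumption, not documented Arachne behaviour.

```python
from typing import Optional


def insert_deviation(observed: Optional[int],
                     given: Optional[int],
                     given_sd: Optional[int]) -> Optional[float]:
    """(observed - given) / given_sd, or None if it cannot be computed."""
    if observed is None or given is None or given_sd is None:
        return None
    if given_sd == 0:
        # Assumption: a zero standard deviation leaves the measure undefined.
        return None
    return (observed - given) / given_sd


# An insert 200 bases longer than a 4000 +/- 400 library is +0.5 deviations.
print(insert_deviation(4200, 4000, 400))
# An insert 600 bases shorter than expected is -1.5 deviations.
print(insert_deviation(3400, 4000, 400))
```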

Note that the observed insert size may include estimated gap sizes between contigs unless the read and its partner are located in the same contig. Also note that for pairings with the code M or S, no observed insert size or deviation measure is given.
