Consensus, or contig consensus, is the algorithm by which the information from several reads is merged into a single sequence, called a contig. The word consensus can also refer to the newly created contig itself, or to the particular bases that make it up. Consensus is the primary method by which contigs are created.
Each base in the consensus represents the majority opinion of the reads at that location, hence the term "consensus". Consensus is the last of the three standard steps in the assembly process; the first two are overlap and layout. It is possible to view a consensus directly by converting it into a tiling.
The consensus algorithm
- A dynamic programming phase finds the best (minimal) path through the reads, from the beginning to the end of the contig. These reads are aligned and transformed into a draft consensus.
- All reads are aligned to the draft consensus to assign the correct location to each read.
- The consensus value of each base is determined. For every base, each read covering the base gets a vote (weighted by the quality score) to either confirm or change the draft consensus. The majority vote then determines the contig consensus sequence.
- Contig consensus quality scores are derived from the agreeing and disagreeing reads. Quality scores generally improve with higher coverage. Note that if reads strongly disagree over a base (e.g., in the case of an SNP), the consensus quality is set to 0.
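The voting and quality steps above can be illustrated with a minimal Python sketch. It is not Arachne's implementation: the function name `call_consensus`, the gap-free `(offset, bases, quals)` read representation, the `disagreement_ratio` cutoff, and the "agreeing weight minus disagreeing weight" quality rule are all assumptions made for illustration only.

```python
from collections import defaultdict


def call_consensus(draft, reads, disagreement_ratio=0.5):
    """Quality-weighted majority vote over reads aligned to a draft.

    draft: draft consensus string.
    reads: list of (offset, bases, quals) tuples; each read is assumed to be
           aligned to the draft without gaps, starting at `offset`.
    disagreement_ratio: hypothetical cutoff; if the disagreeing weight reaches
           this fraction of the agreeing weight, the quality is set to 0.
    Returns (consensus_string, list_of_quality_scores).
    """
    consensus = []
    qualities = []
    for pos, draft_base in enumerate(draft):
        # Each read covering this position votes with its base quality.
        votes = defaultdict(int)
        for offset, bases, quals in reads:
            i = pos - offset
            if 0 <= i < len(bases):
                votes[bases[i]] += quals[i]

        if not votes:
            # No coverage: keep the draft base with quality 0.
            consensus.append(draft_base)
            qualities.append(0)
            continue

        # Highest weight wins; ties are resolved in favour of the draft base.
        ranked = sorted(
            votes.items(),
            key=lambda kv: (-kv[1], kv[0] != draft_base, kv[0]),
        )
        winner, agree = ranked[0]
        disagree = sum(weight for _, weight in ranked[1:])

        if disagree >= disagreement_ratio * agree:
            quality = 0  # strong disagreement, e.g. a possible SNP
        else:
            quality = agree - disagree  # assumed scoring rule

        consensus.append(winner)
        qualities.append(quality)
    return "".join(consensus), qualities


if __name__ == "__main__":
    example_reads = [
        (0, "ACGT", [30, 30, 30, 30]),
        (0, "ACTT", [30, 30, 10, 30]),
        (1, "CGT", [20, 20, 20]),
    ]
    print(call_consensus("ACGT", example_reads))
```

Under this assumed scoring rule, quality grows as more agreeing reads cover a base, which mirrors the statement that scores are better at higher coverage.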
To completely recompute the consensus in Arachne, run DraftConsensus (or ParallelDraftConsensus) followed by FixConsensus. Note that this algorithm is very strict and will remove reads whose alignments are worse than a certain threshold. In addition, contigs will be broken in places with insufficient read coverage.
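As a rough illustration of what breaking a contig at low coverage means, the sketch below splits a contig into the intervals whose per-base read depth meets a threshold. The function `split_on_low_coverage` and the `min_coverage` parameter are hypothetical; the actual criterion used by the Arachne modules is not specified here.

```python
def split_on_low_coverage(coverage, min_coverage=1):
    """Return half-open [start, end) intervals with sufficient coverage.

    coverage: list of per-base read depths along a contig.
    min_coverage: hypothetical threshold below which the contig is broken.
    """
    segments = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_coverage:
            if start is None:
                start = pos  # open a new segment
        elif start is not None:
            segments.append((start, pos))  # close the segment at a gap
            start = None
    if start is not None:
        segments.append((start, len(coverage)))
    return segments


if __name__ == "__main__":
    print(split_on_low_coverage([3, 4, 0, 0, 2, 5, 1], min_coverage=2))
```

Each returned interval would correspond to one contig after the break.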
To recompute the consensus without harming the contig's integrity (i.e., without removing reads or breaking contigs), run either GenerateTilings (or ParallelGenerateTilings) followed by FixConsensus, or alternatively KmerKonsensus.