Each genome is different and requires individual attention when being assembled. Rather than providing a "one size fits all" solution, Arachne lets you address specific issues by running specialized modules, of which there are many. Chances are you will find one that comes very close to doing exactly what you want.
NOTE: After Assemblez has been run, the value of RUN must include the work subdirectory. Given RUN=run, Assemblez puts the assembly files in run/work, so any module you call afterwards must specify RUN=run/work.
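The RUN convention in the note above can be sketched in a few lines of Python. This is only an illustration: the helper names run_after_assemblez and command_line are hypothetical and not part of Arachne, and any other arguments your installation requires are left out.

```python
import posixpath


def run_after_assemblez(run: str) -> str:
    """Return the RUN value to use for modules called after Assemblez.

    Hypothetical helper: it only appends the 'work' subdirectory
    described in the note above (RUN=run becomes RUN=run/work).
    """
    return posixpath.join(run.rstrip("/"), "work")


def command_line(module: str, **options) -> str:
    """Format a module name and KEY=value options as one command line."""
    return " ".join([module] + [f"{key}={value}" for key, value in options.items()])


if __name__ == "__main__":
    run = run_after_assemblez("run")
    # Options below are the ones shown in this section; nothing is executed.
    print(command_line("RecycleGarbage", RUN=run, DUMP_BAD=False,
                       DUMP_STUPID=False, SUPERS_TO_KEEP=6))
```

The script only prints the command line; it does not call any Arachne executable.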
This section lists only a few examples of sequences of operations that can be performed. For more details, please check the Algorithms section.
Repair false joins
Scaffolds do not extend
Rebuilder use_sw_gap=True max_overlap_score=200000 max_errors_in_align=10000 end_stretch_in_align=24 max_gap_in_swgap=6000 max_indel_in_swgap=6000 (optional: keep_existing_supers=True)
Even more aggressive (especially with the option MIN_LINKS lowered) and only recommended for highly polymorphic genomes is
Specifically isolate contigs or supers
RecycleGarbage lets you manipulate assemblies in many ways, including removing contigs, reads or scaffolds, e.g.
RecycleGarbage DUMP_BAD=False DUMP_STUPID=False SUPERS_TO_KEEP=6
will remove all but scaffold 6, whereas
RecycleGarbage DUMP_BAD=False DUMP_STUPID=False SUPERS_TO_DUMP=6
removes scaffold 6 and keeps the rest.
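The difference between the two options can be modelled with a toy Python function. It is a sketch of the selection logic only, not of RecycleGarbage itself: the function name surviving_supers is hypothetical, and it assumes that exactly one of the two options is given, which this section does not state.

```python
def surviving_supers(all_supers, keep=None, dump=None):
    """Return the scaffold ids that remain after the selection.

    keep models SUPERS_TO_KEEP, dump models SUPERS_TO_DUMP.
    Assumption: exactly one of the two is specified.
    """
    if (keep is None) == (dump is None):
        raise ValueError("specify exactly one of keep or dump")
    if keep is not None:
        wanted = set(keep)
        return [s for s in all_supers if s in wanted]
    unwanted = set(dump)
    return [s for s in all_supers if s not in unwanted]


if __name__ == "__main__":
    supers = list(range(8))  # a made-up assembly with scaffolds 0..7
    print(surviving_supers(supers, keep=[6]))  # only scaffold 6 remains
    print(surviving_supers(supers, dump=[6]))  # every scaffold except 6 remains
```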
Reads are still misplaced
Reads can still be in the wrong places and need to be re-arranged; this can be due to repeats or polymorphism. To first aggressively remove reads, run
which will remove both end reads if one of them has a high-quality difference with at least one other read.
will subsequently add reads if they can be placed consistently as pairs. The option NO_SNPS controls whether high-quality disagreements between reads are allowed (False) or not (True; for a haploid genome, you might want to avoid them).
will also place single reads (i.e. reads whose partner cannot be placed within the same scaffold) either at the ends of scaffolds or near contig gaps.
Consensus needs repair
Some modules introduce disagreements between the contig consensus and the reads, e.g. by moving reads between copies of a repeat. To update the consensus to reflect the correct read placements, run
Important note: make sure the input assembly contains the file mergedcontigs.superb. You can generate it by running FindGapDeviations.
Alternatively, you can run
which is usually faster but still somewhat experimental.
The Arachne code package includes several alignment tools, with varying goals and levels of refinement. Executable tools (available at the command line) include CmpSeq, QueryLookupTable, and PerfectLookup. Programmatic tools (available to C++ programs) include LocalAlign.
Generating Ace files
Ace files are the main input files for Consed, a tool for viewing assemblies by graphically showing the aligned reads on a contig-by-contig basis. The module CreateAceFile generates ace files from a finished assembly; you can also create them automatically by setting ACE=True and setting ACEDIR appropriately. See CreateAceFile.