Fasta is a standard text file format for storing a set of sequences, such as reads and contigs. The corresponding format for quality scores is qual. The fasta format is supported by Arachne and by most base-calling software, such as PHRED. Files in this format include contigs.fasta and assembly_supers.fasta.
Here is an example of two fasta-format reads:
>L1000ABCDEFG.readname
ACGCATCGACTGACGTACTCGATCGA
TGCTGGTCATGATGCTGACTGACTAG
ACGTTGGGACATCACCCGCTAGGTAA
TGTCTGATGCCCATG
...
>gnl|ti|3 G10P69425RH3.T0
GACGACTGACTGACTGACTACGACGC
AAGTGACTACGAGATAGATGACATCG
CTGACTAGCATCGTTGACGTACGCCG
ACC
...
Each read has a name, which is important for input and output files. Read names are determined from fasta (and qual) files as follows: take the rightmost whitespace-free string on each line beginning with ">". In the above example, the read names are L1000ABCDEFG.readname and G10P69425RH3.T0.
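The read-name rule can be illustrated with a short Python sketch. This is not Arachne code; the function name read_names is hypothetical, and the sketch assumes header lines start with ">" in the first column.

```python
def read_names(lines):
    """Collect read names from the header lines of a fasta (or qual) file.

    A header line is one that begins with ">". The read name is taken to be
    the rightmost whitespace-free string on that line.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()  # split on any run of whitespace
            if tokens:                 # skip a bare ">" with no name
                names.append(tokens[-1])
    return names


# Header and sequence lines modelled on the example reads above.
example = [
    ">L1000ABCDEFG.readname",
    "ACGCATCGACTGACGTACTCGATCGA",
    ">gnl|ti|3 G10P69425RH3.T0",
    "GACGACTGACTGACTGACTACGACGC",
]

print(read_names(example))
```

For the second header, everything before the last whitespace (gnl|ti|3) is ignored, so only G10P69425RH3.T0 is kept as the name.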
Special Command-line arguments
|Argument name||Argument type||Default value||Meaning|
|SUBSET||Index list||""||If SUBSET is not empty, it is parsed as an IntSet (see ParseSet.h) and only those entries are printed.|
|HEAD||String||"reads"||HEAD.fastb will be converted to HEAD.fasta.|
|CLEAN||Bool||False||If True, generate a blastable file (no blank lines, no empty reads).|
|NAMES||Bool||False||If True, use original read names. HEAD.ids must exist.|
|MAXREADS||UnsignedInt||0||If MAXREADS > 0, print no more than MAXREADS reads.|
|GZIPPED||Bool||False||If True, saves output file in GZIP format.|
|FOLD_SIZE||UnsignedInt||0||If specified, fold fasta sequences into chunks of size FOLD_SIZE. This is needed for some external programs.|
|MAX_BASES||UnsignedInt||0||If specified, output only sequences whose length is <= MAX_BASES.|
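To make the effect of FOLD_SIZE concrete, here is a minimal Python sketch of folding a sequence into fixed-size chunks. It is an illustration only, not the Arachne implementation: the helper names fold and format_record are hypothetical, and the sketch assumes that a FOLD_SIZE of 0 (the default) means the sequence is written on a single line.

```python
def fold(sequence, fold_size):
    """Split a sequence into chunks of at most fold_size bases.

    Assumption: fold_size == 0 means "do not fold", so the whole sequence
    is returned as a single chunk.
    """
    if fold_size <= 0:
        return [sequence]
    return [sequence[i:i + fold_size]
            for i in range(0, len(sequence), fold_size)]


def format_record(name, sequence, fold_size=0):
    """Format one fasta record: a ">" header line, then the folded sequence."""
    return "\n".join([">" + name] + fold(sequence, fold_size))


print(format_record("G10P69425RH3.T0", "GACGACTGACTGACTGACTACGACGCAAGTGAC", 26))
```

With a fold size of 26, the 33-base sequence is written as one line of 26 bases followed by one line of 7 bases, which matches the line width used in the example reads earlier in this section.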