The module ForDistribution converts Arachne's binary-format output files into a human-readable form that can be used in submissions to NCBI. These files are generated in the KEY directory, whose name defaults to ForDistribution. Note that the ForDistribution directory is a subdirectory of DATA and is tied to one specific SUBDIR, which is specified in the source file.
Of particular interest are the "markup files" listed below. These files are produced by RunMarkup, which is called by ForDistribution. RunMarkup tags regions in the assembly that are potentially enriched for misassemblies. The markup files are documented here: ftp://ftp.broad.mit.edu/pub/wga/misc/docs/AssemblyMarkup2.0.pdf
- All files generated by Assemblez and placed in RUN (listed in Output), except assemblez.log
- Information about this ForDistribution run
- Basic assembly statistics
- BasicAssemblyStats.out: output from the BasicAssemblyStats module: some core statistics of the assembly (coverage, contig N50, etc.)
- BasicAssemblyOneLiner.out: output from the BasicAssemblyOneLiner module, in CSV format.
- ReadUsage.Table: a simple table with detailed statistics on the assembled reads.
- LibStatsOverview.out: output from the LibStatsOverview module: some library statistics (percent of reads assembled in valid pairs, percent of reads whose mate was not assembled, etc.)
- PhysicalCoverageByLib.out: a CSV file with details about the physical coverage, broken down by library.
- Raw assembly output
- assembly.agp: agp file of the supercontigs (for each supercontig, a list of contig - gap - contig - gap, etc.)
- assembly_supers.fasta.gz: gzipped fasta file for output supercontigs (gaps between contigs are filled with N's)
- assembly_supers.quals.gz: gzipped qual file for output supercontigs (gaps between contigs are filled with N's).
- assembly.bases.gz: gzipped fasta file of output contigs.
- assembly.quals.gz: gzipped quals file of output contigs.
- unplaced.fasta.gz: gzipped fasta of all unplaced reads.
- unplaced.qual.gz: gzipped qual of all unplaced reads.
- Markup files
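
The contig - gap - contig layout described for assembly.agp can be read with a few lines of Python. The sketch below is not part of Arachne; it assumes the tab-separated column order of the NCBI AGP format (object, start, end, part number, component type, then either component fields or gap fields), and the names AgpRow and parse_agp, as well as the sample supercontig and contig names, are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class AgpRow(NamedTuple):
    object_id: str               # supercontig (or chromosome) name
    object_beg: int              # 1-based start on the object
    object_end: int              # 1-based inclusive end on the object
    part_number: int
    component_type: str          # "N" or "U" marks a gap (assumed AGP convention)
    is_gap: bool
    component_id: Optional[str]  # contig name, None for gap rows
    length: int                  # span of this row on the object


def parse_agp(lines: Iterable[str]) -> Iterator[AgpRow]:
    """Yield one AgpRow per data line, skipping blank and '#' comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        beg, end = int(cols[1]), int(cols[2])
        ctype = cols[4]
        is_gap = ctype in ("N", "U")
        yield AgpRow(
            object_id=cols[0],
            object_beg=beg,
            object_end=end,
            part_number=int(cols[3]),
            component_type=ctype,
            is_gap=is_gap,
            component_id=None if is_gap else cols[5],
            length=end - beg + 1,
        )


# Made-up sample: two contigs separated by a 100-base gap.
SAMPLE_AGP = (
    "super_0\t1\t1000\t1\tW\tcontig_0\t1\t1000\t+\n"
    "super_0\t1001\t1100\t2\tN\t100\tfragment\tyes\n"
    "super_0\t1101\t1600\t3\tW\tcontig_1\t1\t500\t+\n"
)

rows = list(parse_agp(SAMPLE_AGP.splitlines()))
gap_total = sum(r.length for r in rows if r.is_gap)
contig_total = sum(r.length for r in rows if not r.is_gap)
```

On a real file, the same function can be fed an open text handle, for example `parse_agp(open("assembly.agp"))`, assuming the file follows the column layout above.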
If the assembly is mapped onto chromosomes
In some cases, maps can be used to anchor the assembly to chromosomes. The ForDistribution directory then contains four more files (actually symbolic links):
- mapped.agp.chromosome.agp: agp file of the chromosomes (for each chromosome, a list of contig - gap - contig - gap, etc., in the agp format).
- mapped.agp.chromosome.qual.gz: the qual of the chromosomes.
- mapped.agp.chromosome.fasta.gz: the fasta of the chromosomes.
- MapSnps.out: the output of the MapSnps module, which tags SNPs.
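
As a quick sanity check on the gzipped fasta files listed above, the length of each record can be tallied with the standard library alone. This is a minimal sketch, not an Arachne tool: the function names fasta_lengths and fasta_lengths_from_gz are hypothetical, and the code assumes ordinary fasta layout (a ">" header line followed by sequence lines).

```python
import gzip
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each record name (first word after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # N's used to fill gaps count toward the length as well.
            lengths[name] += len(line)
    return lengths


def fasta_lengths_from_gz(path: str) -> Dict[str, int]:
    """Same as fasta_lengths, reading a gzipped fasta file in text mode."""
    with gzip.open(path, "rt") as handle:
        return fasta_lengths(handle)
```

For example, `fasta_lengths_from_gz("mapped.agp.chromosome.fasta.gz")` would return one entry per chromosome; the lengths can then be compared against the end coordinate of the last line for that chromosome in mapped.agp.chromosome.agp.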