The DATA directory is one of the input directories required to run Arachne. It is specified as a subdirectory of PRE, and it in turn contains the subdirectories that hold Arachne output, most importantly RUN. Each DATA directory corresponds to a particular genome-sequencing project.
The DATA directory is where the raw input data is stored. It contains a
fasta/ subdirectory with fasta reads, a
qual/ subdirectory with qual files, and a
traceinfo/ subdirectory with XML ancillary files. It also contains the files genome.size, insert.sites, and others (see Input for a complete list). The files in DATA are used during both pre-processing and assembly.
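The layout described above can be checked before starting a run. The following is a minimal sketch, not part of Arachne: the helper name missing_inputs is hypothetical, and it tests only the entries named in this section, so the complete list on the Input page still applies.

```python
import os
import tempfile

# Entries named in the text above; this is NOT the complete list of inputs.
EXPECTED_DIRS = ("fasta", "qual", "traceinfo")
EXPECTED_FILES = ("genome.size", "insert.sites")


def missing_inputs(data_dir):
    """Return the expected entries that are absent from data_dir.

    Hypothetical helper for illustration; Arachne does not provide it.
    Subdirectories are reported with a trailing slash.
    """
    missing = [d + "/" for d in EXPECTED_DIRS
               if not os.path.isdir(os.path.join(data_dir, d))]
    missing += [f for f in EXPECTED_FILES
                if not os.path.isfile(os.path.join(data_dir, f))]
    return missing


# Demonstration on a throwaway directory that holds only fasta/ and genome.size.
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "fasta"))
open(os.path.join(demo_dir, "genome.size"), "w").close()
print(missing_inputs(demo_dir))
```

An empty list means every entry named here is present. It does not guarantee that the files are well formed or that the directory is complete.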
For example, the script Assemblez requires the following command-line form:
Assemblez PRE=/pre DATA=data RUN=run
This will fail unless the directory
/pre/data already exists. If it succeeds, it will create
/pre/data/run and put output there. Note that DATA must be a relative path, not an absolute one (i.e., it cannot begin with a slash). However, DATA may contain internal slashes, so that
/PRE/DATA lies several levels below
/PRE in the directory tree. DATA may also contain symbolic links.
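The path rules above can be summarized in a short sketch. It assumes a POSIX system and is not Arachne's own logic: the function name resolve_run_dir is hypothetical, and it only mirrors the behavior described in this section (DATA relative to PRE, PRE/DATA already present, RUN placed beneath it).

```python
import os
import tempfile


def resolve_run_dir(pre, data, run):
    """Return the PRE/DATA/RUN path, enforcing the rules described above.

    Hypothetical helper for illustration; Arachne does not provide it.
    """
    # DATA is interpreted relative to PRE, so an absolute path is rejected.
    if os.path.isabs(data):
        raise ValueError("DATA must be relative to PRE: %r" % data)
    data_dir = os.path.join(pre, data)
    # os.path.isdir follows symbolic links, so a linked DATA is accepted.
    if not os.path.isdir(data_dir):
        raise FileNotFoundError("PRE/DATA does not exist: %s" % data_dir)
    # RUN need not exist yet; it is where output would be written.
    return os.path.join(data_dir, run)


# Demonstration with a DATA that sits two levels below PRE.
pre = tempfile.mkdtemp()
os.makedirs(os.path.join(pre, "projects", "data"))
print(resolve_run_dir(pre, "projects/data", "run"))
```

Calling resolve_run_dir(pre, "/data", "run") raises ValueError, which matches the requirement that DATA not be an absolute location.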