Run a basic analysis command on example data, parallelized with Queue.
One very cool feature of Queue is that you can test your script by doing a "dry run". That means Queue will prepare the analysis and build the scatter commands, but not actually run them. This makes it easier to check the sanity of your script and command.
Here we're going to set up a dry run of a CountReads analysis. You should be familiar with the CountReads walker and the example files from the bundles, as used in the basic "GATK for the first time" tutorial. In addition, we're going to use the example QScript called ExampleCountReads.scala provided in the Queue package download.
Type the following command:
java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
where -S ExampleCountReads.scala specifies which QScript we want to run, -R exampleFASTA.fasta specifies the reference sequence, and -I exampleBAM.bam specifies the file of aligned reads we want to analyze.
After a few seconds you should see output that looks nearly identical to this:
INFO 00:30:45,527 QScriptManager - Compiling 1 QScript
INFO 00:30:52,869 QScriptManager - Compilation complete
INFO 00:30:53,284 HelpFormatter - ----------------------------------------------------------------------
INFO 00:30:53,284 HelpFormatter - Queue v2.0-36-gf5c1c1a, Compiled 2012/08/08 20:18:21
INFO 00:30:53,284 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 00:30:53,284 HelpFormatter - Fro support and documentation go to http://www.broadinstitute.org/gatk
INFO 00:30:53,285 HelpFormatter - Program Args: -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
INFO 00:30:53,285 HelpFormatter - Date/Time: 2012/08/09 00:30:53
INFO 00:30:53,285 HelpFormatter - ----------------------------------------------------------------------
INFO 00:30:53,285 HelpFormatter - ----------------------------------------------------------------------
INFO 00:30:53,290 QCommandLine - Scripting ExampleCountReads
INFO 00:30:53,364 QCommandLine - Added 1 functions
INFO 00:30:53,364 QGraph - Generating graph.
INFO 00:30:53,388 QGraph - -------
INFO 00:30:53,402 QGraph - Pending: 'java' '-Xmx1024m' '-Djava.io.tmpdir=/Users/vdauwera/sandbox/Q2/resources/tmp' '-cp' '/Users/vdauwera/sandbox/Q2/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/vdauwera/sandbox/Q2/resources/exampleBAM.bam' '-R' '/Users/vdauwera/sandbox/Q2/resources/exampleFASTA.fasta'
INFO 00:30:53,403 QGraph - Log: /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads-1.out
INFO 00:30:53,403 QGraph - Dry run completed successfully!
INFO 00:30:53,404 QGraph - Re-run with "-run" to execute the functions.
INFO 00:30:53,409 QCommandLine - Script completed successfully with 1 total jobs
INFO 00:30:53,410 QCommandLine - Writing JobLogging GATKReport to file /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads.jobreport.txt
If you don't see this, check your spelling (GATK commands are case-sensitive), check that the files are in your working directory, and if necessary, re-check that the GATK and Queue are properly installed.
If you do see this output, congratulations! You just successfully ran you first Queue dry run!
Once you have verified that the Queue functions have been generated successfully, you can execute the pipeline by appending -run to the command line.
Instead of this command, which we used earlier:
java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
this time you type this:
java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run
See the difference?
You should see output that looks nearly identical to this:
INFO 00:56:33,688 QScriptManager - Compiling 1 QScript
INFO 00:56:39,327 QScriptManager - Compilation complete
INFO 00:56:39,487 HelpFormatter - ----------------------------------------------------------------------
INFO 00:56:39,487 HelpFormatter - Queue v2.0-36-gf5c1c1a, Compiled 2012/08/08 20:18:21
INFO 00:56:39,488 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 00:56:39,488 HelpFormatter - Fro support and documentation go to http://www.broadinstitute.org/gatk
INFO 00:56:39,489 HelpFormatter - Program Args: -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run
INFO 00:56:39,490 HelpFormatter - Date/Time: 2012/08/09 00:56:39
INFO 00:56:39,490 HelpFormatter - ----------------------------------------------------------------------
INFO 00:56:39,491 HelpFormatter - ----------------------------------------------------------------------
INFO 00:56:39,498 QCommandLine - Scripting ExampleCountReads
INFO 00:56:39,569 QCommandLine - Added 1 functions
INFO 00:56:39,569 QGraph - Generating graph.
INFO 00:56:39,589 QGraph - Running jobs.
INFO 00:56:39,623 FunctionEdge - Starting: 'java' '-Xmx1024m' '-Djava.io.tmpdir=/Users/vdauwera/sandbox/Q2/resources/tmp' '-cp' '/Users/vdauwera/sandbox/Q2/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/vdauwera/sandbox/Q2/resources/exampleBAM.bam' '-R' '/Users/vdauwera/sandbox/Q2/resources/exampleFASTA.fasta'
INFO 00:56:39,623 FunctionEdge - Output written to /Users/GG/codespace/GATK/Q2/resources/ExampleCountReads-1.out
INFO 00:56:50,301 QGraph - 0 Pend, 1 Run, 0 Fail, 0 Done
INFO 00:57:09,827 FunctionEdge - Done: 'java' '-Xmx1024m' '-Djava.io.tmpdir=/Users/vdauwera/sandbox/Q2/resources/tmp' '-cp' '/Users/vdauwera/sandbox/Q2/resources/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/vdauwera/sandbox/Q2/resources/exampleBAM.bam' '-R' '/Users/vdauwera/sandbox/Q2/resources/exampleFASTA.fasta'
INFO 00:57:09,828 QGraph - 0 Pend, 0 Run, 0 Fail, 1 Done
INFO 00:57:09,835 QCommandLine - Script completed successfully with 1 total jobs
INFO 00:57:09,835 QCommandLine - Writing JobLogging GATKReport to file /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads.jobreport.txt
INFO 00:57:10,107 QCommandLine - Plotting JobLogging GATKReport to file /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads.jobreport.pdf
WARN 00:57:18,597 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info.
Great! It works!
The results of the traversal will be written to a file in the current directory. The name of the file will be printed in the output, ExampleCountReads.out in this example.
If for some reason the run was interrupted, in most cases you can resume by just launching the command. Queue will pick up where it left off without redoing the parts that ran successfully.
Run with -bsub to run on LSF, or for early Grid Engine support see Queue with Grid Engine.
See also QFunction and Command Line Options for more info on Queue options.
Run a basic analysis command on example data.
A very simple analysis that you can do with the GATK is getting a count of the reads in a BAM file. The GATK is capable of much more powerful analyses, but this is a good starting example because there are very few things that can go wrong.
So we are going to count the reads in the file exampleBAM.bam, which you can find in the GATK resource bundle along with its associated index (same file name with .bai extension), as well as the example reference exampleFASTA.fasta and its associated index (same file name with .fai extension) and dictionary (same file name with .dict extension). Copy them to your working directory so that your directory contents look like this:
[bm4dd-56b:~/codespace/gatk/sandbox] vdauwera% ls -la
drwxr-xr-x 9 vdauwera CHARLES\Domain Users 306 Jul 25 16:29 .
drwxr-xr-x@ 6 vdauwera CHARLES\Domain Users 204 Jul 25 15:31 ..
-rw-r--r--@ 1 vdauwera CHARLES\Domain Users 3635 Apr 10 07:39 exampleBAM.bam
-rw-r--r--@ 1 vdauwera CHARLES\Domain Users 232 Apr 10 07:39 exampleBAM.bam.bai
-rw-r--r--@ 1 vdauwera CHARLES\Domain Users 148 Apr 10 07:39 exampleFASTA.dict
-rw-r--r--@ 1 vdauwera CHARLES\Domain Users 101673 Apr 10 07:39 exampleFASTA.fasta
-rw-r--r--@ 1 vdauwera CHARLES\Domain Users 20 Apr 10 07:39 exampleFASTA.fasta.fai
Type the following command:
java -jar <path to GenomeAnalysisTK.jar> -T CountReads -R exampleFASTA.fasta -I exampleBAM.bam
where -T CountReads specifies which analysis tool we want to use, -R exampleFASTA.fasta specifies the reference sequence, and -I exampleBAM.bam specifies the file of aligned reads we want to analyze.
For any analysis that you want to run on a set of aligned reads, you will always need to use at least these three arguments:
-T for the tool name, which specifices the corresponding analysis-R for the reference sequence file-I for the input BAM file of aligned readsThey don't have to be in that order in your command, but this way you can remember that you need them if you TRI...
After a few seconds you should see output that looks like to this:
INFO 16:17:45,945 HelpFormatter - ---------------------------------------------------------------------------------
INFO 16:17:45,946 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.0-22-g40f97eb, Compiled 2012/07/25 15:29:41
INFO 16:17:45,947 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 16:17:45,947 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 16:17:45,947 HelpFormatter - Program Args: -T CountReads -R exampleFASTA.fasta -I exampleBAM.bam
INFO 16:17:45,947 HelpFormatter - Date/Time: 2012/07/25 16:17:45
INFO 16:17:45,947 HelpFormatter - ---------------------------------------------------------------------------------
INFO 16:17:45,948 HelpFormatter - ---------------------------------------------------------------------------------
INFO 16:17:45,950 GenomeAnalysisEngine - Strictness is SILENT
INFO 16:17:45,982 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 16:17:45,993 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO 16:17:46,060 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO 16:17:46,060 TraversalEngine - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 16:17:46,061 Walker - [REDUCE RESULT] Traversal result is: 33
INFO 16:17:46,061 TraversalEngine - Total runtime 0.00 secs, 0.00 min, 0.00 hours
INFO 16:17:46,100 TraversalEngine - 0 reads were filtered out during traversal out of 33 total (0.00%)
INFO 16:17:46,729 GATKRunReport - Uploaded run statistics report to AWS S3
Depending on the GATK release, you may see slightly different information output, but you know everything is running correctly if you see the line:
INFO 21:53:04,556 Walker - [REDUCE RESULT] Traversal result is: 33
somewhere in your output.
If you don't see this, check your spelling (GATK commands are case-sensitive), check that the files are in your working directory, and if necessary, re-check that the GATK is properly installed.
If you do see this output, congratulations! You just successfully ran you first GATK analysis!
Basically the output you see means that the CountReadsWalker (which you invoked with the command line option -T CountReads) counted 33 reads in the exampleBAM.bam file, which is exactly what we expect to see.
Wait, what is this walker thing?
In the GATK jargon, we call the tools walkers because the way they work is that they walk through the dataset --either along the reference sequence (LocusWalkers), or down the list of reads in the BAM file (ReadWalkers)-- collecting the requested information along the way.
Now that you're rocking the read counts, you can start to expand your use of the GATK command line.
Let's say you don't care about counting reads anymore; now you want to know the number of loci (positions on the genome) that are covered by one or more reads. The name of the tool, or walker, that does this is CountLoci. Since the structure of the GATK command is basically always the same, you can simply switch the tool name, right?
Instead of this command, which we used earlier:
java -jar <path to GenomeAnalysisTK.jar> -T CountReads -R exampleFASTA.fasta -I exampleBAM.bam
this time you type this:
java -jar <path to GenomeAnalysisTK.jar> -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam
See the difference?
You should see something like this output:
INFO 16:18:26,183 HelpFormatter - ---------------------------------------------------------------------------------
INFO 16:18:26,185 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.0-22-g40f97eb, Compiled 2012/07/25 15:29:41
INFO 16:18:26,185 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 16:18:26,185 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 16:18:26,186 HelpFormatter - Program Args: -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam
INFO 16:18:26,186 HelpFormatter - Date/Time: 2012/07/25 16:18:26
INFO 16:18:26,186 HelpFormatter - ---------------------------------------------------------------------------------
INFO 16:18:26,186 HelpFormatter - ---------------------------------------------------------------------------------
INFO 16:18:26,189 GenomeAnalysisEngine - Strictness is SILENT
INFO 16:18:26,222 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 16:18:26,233 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO 16:18:26,351 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO 16:18:26,351 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
2052
INFO 16:18:26,411 TraversalEngine - Total runtime 0.08 secs, 0.00 min, 0.00 hours
INFO 16:18:26,450 TraversalEngine - 0 reads were filtered out during traversal out of 33 total (0.00%)
INFO 16:18:27,124 GATKRunReport - Uploaded run statistics report to AWS S3
Great! But wait -- where's the result? Last time the result was given on this line:
INFO 21:53:04,556 Walker - [REDUCE RESULT] Traversal result is: 33
But this time there is no line that says [REDUCE RESULT]! Is something wrong?
Not really. The program ran just fine -- but we forgot to give it an output file name. You see, the CountLoci walker is set up to output the result of its calculations to a text file, unlike CountReads, which is perfectly happy to output its result to the terminal screen.
So we repeat the command, but this time we specify an output file, like this:
java -jar <path to GenomeAnalysisTK.jar> -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt
where -o (lowercase o, not zero) is used to specify the output.
You should get essentially the same output on the terminal screen as previously (but notice the difference in the line that contains Program Args -- the new argument is included):
INFO 16:29:15,451 HelpFormatter - ---------------------------------------------------------------------------------
INFO 16:29:15,453 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.0-22-g40f97eb, Compiled 2012/07/25 15:29:41
INFO 16:29:15,453 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 16:29:15,453 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 16:29:15,453 HelpFormatter - Program Args: -T CountLoci -R exampleFASTA.fasta -I exampleBAM.bam -o output.txt
INFO 16:29:15,454 HelpFormatter - Date/Time: 2012/07/25 16:29:15
INFO 16:29:15,454 HelpFormatter - ---------------------------------------------------------------------------------
INFO 16:29:15,454 HelpFormatter - ---------------------------------------------------------------------------------
INFO 16:29:15,457 GenomeAnalysisEngine - Strictness is SILENT
INFO 16:29:15,488 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 16:29:15,499 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO 16:29:15,618 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO 16:29:15,618 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 16:29:15,679 TraversalEngine - Total runtime 0.08 secs, 0.00 min, 0.00 hours
INFO 16:29:15,718 TraversalEngine - 0 reads were filtered out during traversal out of 33 total (0.00%)
INFO 16:29:16,712 GATKRunReport - Uploaded run statistics report to AWS S3
This time however, if we look inside the working directory, there is a newly created file there called output.txt.
[bm4dd-56b:~/codespace/gatk/sandbox] vdauwera% ls -la
drwxr-xr-x 9 vdauwera CHARLES\Domain Users 306 Jul 25 16:29 .
drwxr-xr-x@ 6 vdauwera CHARLES\Domain Users 204 Jul 25 15:31 ..
-rw-r--r--@ 1 vdauwera CHARLES\Domain Users 3635 Apr 10 07:39 exampleBAM.bam
-rw-r--r--@ 1 vdauwera CHARLES\Domain Users 232 Apr 10 07:39 exampleBAM.bam.bai
-rw-r--r--@ 1 vdauwera CHARLES\Domain Users 148 Apr 10 07:39 exampleFASTA.dict
-rw-r--r--@ 1 vdauwera CHARLES\Domain Users 101673 Apr 10 07:39 exampleFASTA.fasta
-rw-r--r--@ 1 vdauwera CHARLES\Domain Users 20 Apr 10 07:39 exampleFASTA.fasta.fai
-rw-r--r-- 1 vdauwera CHARLES\Domain Users 5 Jul 25 16:29 output.txt
This file contains the result of the analysis:
[bm4dd-56b:~/codespace/gatk/sandbox] vdauwera% cat output.txt
2052
This means that there are 2052 loci in the reference sequence that are covered by at least one or more reads in the BAM file.
Okay then, but why not show the full, correct command in the first place? Because this was a good opportunity for you to learn a few of the caveats of the GATK command system, which may save you a lot of frustration later on.
Beyond the common basic arguments that almost all GATK walkers require, most of them also have specific requirements or options that are important to how they work. You should always check what are the specific arguments that are required, recommended and/or optional for the walker you want to use before starting an analysis.
Fortunately the GATK is set up to complain (i.e. terminate with an error message) if you try to run it without specifying a required argument. For example, if you try to run this:
java -jar <path to GenomeAnalysisTK.jar> -T CountLoci -R exampleFASTA.fasta
the GATK will spit out a wall of text, including the basic usage guide that you can invoke with the --help option, and more importantly, the following error message:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.0-22-g40f97eb):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Walker requires reads but none were provided.
##### ERROR ------------------------------------------------------------------------------------------
You see the line that says ERROR MESSAGE: Walker requires reads but none were provided? This tells you exactly what was wrong with your command.
So the GATK will not run if a walker does not have all the required inputs. That's a good thing! But in the case of our first attempt at running CountLoci, the -o argument is not required by the GATK to run -- it's just highly desirable if you actually want the result of the analysis!
There will be many other cases of walkers with arguments that are not strictly required, but highly desirable if you want the results to be meaningful.
So, at the risk of getting repetitive, always read the documentation of each walker that you want to use!
Test that the GATK is correctly installed, and that the supporting tools like Java are in your path.
The command we're going to run is a very simple command that asks the GATK to print out a list of available command-line arguments and options. It is so simple that it will ALWAYS work if your GATK package is installed correctly.
Note that this command is also helpful when you're trying to remember something like the right spelling or short name for an argument and for whatever reason you don't have access to the web-based documentation.
Type the following command:
java -jar <path to GenomeAnalysisTK.jar> --help
replacing the <path to GenomeAnalysisTK.jar> bit with the path you have set up in your command-line environment.
You should see usage output similar to the following:
usage: java -jar GenomeAnalysisTK.jar -T <analysis_type> [-I <input_file>] [-L
<intervals>] [-R <reference_sequence>] [-B <rodBind>] [-D <DBSNP>] [-H
<hapmap>] [-hc <hapmap_chip>] [-o <out>] [-e <err>] [-oe <outerr>] [-A] [-M
<maximum_reads>] [-sort <sort_on_the_fly>] [-compress <bam_compression>] [-fmq0] [-dfrac
<downsample_to_fraction>] [-dcov <downsample_to_coverage>] [-S
<validation_strictness>] [-U] [-P] [-dt] [-tblw] [-nt <numthreads>] [-l
<logging_level>] [-log <log_to_file>] [-quiet] [-debug] [-h]
-T,--analysis_type <analysis_type> Type of analysis to run
-I,--input_file <input_file> SAM or BAM file(s)
-L,--intervals <intervals> A list of genomic intervals over which
to operate. Can be explicitly specified
on the command line or in a file.
-R,--reference_sequence <reference_sequence> Reference sequence file
-B,--rodBind <rodBind> Bindings for reference-ordered data, in
the form <name>,<type>,<file>
-D,--DBSNP <DBSNP> DBSNP file
-H,--hapmap <hapmap> Hapmap file
-hc,--hapmap_chip <hapmap_chip> Hapmap chip file
-o,--out <out> An output file presented to the walker.
Will overwrite contents if file exists.
-e,--err <err> An error output file presented to the
walker. Will overwrite contents if file
exists.
-oe,--outerr <outerr> A joint file for 'normal' and error
output presented to the walker. Will
overwrite contents if file exists.
...
If you see this message, your GATK installation is ok. You're good to go! If you don't see this message, and instead get an error message, proceed to the next section on troubleshooting.
Let's try to figure out what's not working.
First, make sure that your Java version is at least 1.6, by typing the following command:
java -version
You should see something similar to the following text:
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)
If the version is less then 1.6, install the newest version of Java onto the system. If you instead see something like
java: Command not found
make sure that java is installed on your machine, and that your PATH variable contains the path to the java executables.
On a Mac running OS X 10.5+, you may need to run /Applications/Utilities/Java Preferences.app and drag Java SE 6 to the top to make your machine run version 1.6, even if it has been installed.
Test that Queue is correctly installed, and that the supporting tools like Java are in your path.
The command we're going to run is a very simple command that asks Queue to print out a list of available command-line arguments and options. It is so simple that it will ALWAYS work if your Queue package is installed correctly.
Note that this command is also helpful when you're trying to remember something like the right spelling or short name for an argument and for whatever reason you don't have access to the web-based documentation.
Type the following command:
java -jar <path to Queue.jar> --help
replacing the <path to Queue.jar> bit with the path you have set up in your command-line environment.
You should see usage output similar to the following:
usage: java -jar Queue.jar -S <script> [-jobPrefix <job_name_prefix>] [-jobQueue <job_queue>] [-jobProject <job_project>]
[-jobSGDir <job_scatter_gather_directory>] [-memLimit <default_memory_limit>] [-runDir <run_directory>] [-tempDir
<temp_directory>] [-emailHost <emailSmtpHost>] [-emailPort <emailSmtpPort>] [-emailTLS] [-emailSSL] [-emailUser
<emailUsername>] [-emailPass <emailPassword>] [-emailPassFile <emailPasswordFile>] [-bsub] [-run] [-dot <dot_graph>]
[-expandedDot <expanded_dot_graph>] [-startFromScratch] [-status] [-statusFrom <status_email_from>] [-statusTo
<status_email_to>] [-keepIntermediates] [-retry <retry_failed>] [-l <logging_level>] [-log <log_to_file>] [-quiet]
[-debug] [-h]
-S,--script <script> QScript scala file
-jobPrefix,--job_name_prefix <job_name_prefix> Default name prefix for compute farm jobs.
-jobQueue,--job_queue <job_queue> Default queue for compute farm jobs.
-jobProject,--job_project <job_project> Default project for compute farm jobs.
-jobSGDir,--job_scatter_gather_directory <job_scatter_gather_directory> Default directory to place scatter gather
output for compute farm jobs.
-memLimit,--default_memory_limit <default_memory_limit> Default memory limit for jobs, in gigabytes.
-runDir,--run_directory <run_directory> Root directory to run functions from.
-tempDir,--temp_directory <temp_directory> Temp directory to pass to functions.
-emailHost,--emailSmtpHost <emailSmtpHost> Email SMTP host. Defaults to localhost.
-emailPort,--emailSmtpPort <emailSmtpPort> Email SMTP port. Defaults to 465 for ssl,
otherwise 25.
-emailTLS,--emailUseTLS Email should use TLS. Defaults to false.
-emailSSL,--emailUseSSL Email should use SSL. Defaults to false.
-emailUser,--emailUsername <emailUsername> Email SMTP username. Defaults to none.
-emailPass,--emailPassword <emailPassword> Email SMTP password. Defaults to none. Not
secure! See emailPassFile.
-emailPassFile,--emailPasswordFile <emailPasswordFile> Email SMTP password file. Defaults to none.
-bsub,--bsub_all_jobs Use bsub to submit jobs
-run,--run_scripts Run QScripts. Without this flag set only
performs a dry run.
-dot,--dot_graph <dot_graph> Outputs the queue graph to a .dot file. See:
http://en.wikipedia.org/wiki/DOT_language
-expandedDot,--expanded_dot_graph <expanded_dot_graph> Outputs the queue graph of scatter gather to
a .dot file. Otherwise overwrites the
dot_graph
-startFromScratch,--start_from_scratch Runs all command line functions even if the
outputs were previously output successfully.
-status,--status Get status of jobs for the qscript
-statusFrom,--status_email_from <status_email_from> Email address to send emails from upon
completion or on error.
-statusTo,--status_email_to <status_email_to> Email address to send emails to upon
completion or on error.
-keepIntermediates,--keep_intermediate_outputs After a successful run keep the outputs of
any Function marked as intermediate.
-retry,--retry_failed <retry_failed> Retry the specified number of times after a
command fails. Defaults to no retries.
-l,--logging_level <logging_level> Set the minimum level of logging, i.e.
setting INFO get's you INFO up to FATAL,
setting ERROR gets you ERROR and FATAL level
logging.
-log,--log_to_file <log_to_file> Set the logging location
-quiet,--quiet_output_mode Set the logging to quiet mode, no output to
stdout
-debug,--debug_mode Set the logging file string to include a lot
of debugging information (SLOW!)
-h,--help Generate this help message
If you see this message, your Queue installation is ok. You're good to go! If you don't see this message, and instead get an error message, proceed to the next section on troubleshooting.
Let's try to figure out what's not working.
First, make sure that your Java version is at least 1.6, by typing the following command:
java -version
You should see something similar to the following text:
java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)
If the version is less then 1.6, install the newest version of Java onto the system. If you instead see something like
java: Command not found
make sure that java is installed on your machine, and that your PATH variable contains the path to the java executables.
On a Mac running OS X 10.5+, you may need to run /Applications/Utilities/Java Preferences.app and drag Java SE 6 to the top to make your machine run version 1.6, even if it has been installed.