Running Queue for the first time
Once you've built Queue from the source distribution you can run and build various pipelines.
Test Your Installation
The first step is to test that Queue is correctly built, and that the supporting tools like Java are in your path. Type the following command:
java -jar <path to Queue.jar> --help
replacing <path to Queue.jar> with the path to your copy of Queue.jar.
You should see usage output similar to the following:
---------------------------------------------------------
Program Name: org.broadinstitute.sting.queue.QCommandLine
---------------------------------------------------------
---------------------------------------------------------
usage: java -jar Queue.jar -S <script> [-jobPrefix <job_name_prefix>] [-jobQueue <job_queue>] [-jobProject <job_project>]
[-jobSGDir <job_scatter_gather_directory>] [-memLimit <default_memory_limit>] [-runDir <run_directory>] [-tempDir
<temp_directory>] [-emailHost <emailSmtpHost>] [-emailPort <emailSmtpPort>] [-emailTLS] [-emailSSL] [-emailUser
<emailUsername>] [-emailPass <emailPassword>] [-emailPassFile <emailPasswordFile>] [-bsub] [-run] [-dot <dot_graph>]
[-expandedDot <expanded_dot_graph>] [-startFromScratch] [-status] [-statusFrom <status_email_from>] [-statusTo
<status_email_to>] [-keepIntermediates] [-retry <retry_failed>] [-l <logging_level>] [-log <log_to_file>] [-quiet]
[-debug] [-h]
-S,--script <script> QScript scala file
-jobPrefix,--job_name_prefix <job_name_prefix> Default name prefix for compute farm jobs.
-jobQueue,--job_queue <job_queue> Default queue for compute farm jobs.
-jobProject,--job_project <job_project> Default project for compute farm jobs.
-jobSGDir,--job_scatter_gather_directory <job_scatter_gather_directory> Default directory to place scatter gather output for compute farm jobs.
-memLimit,--default_memory_limit <default_memory_limit> Default memory limit for jobs, in gigabytes.
-runDir,--run_directory <run_directory> Root directory to run functions from.
-tempDir,--temp_directory <temp_directory> Temp directory to pass to functions.
-emailHost,--emailSmtpHost <emailSmtpHost> Email SMTP host. Defaults to localhost.
-emailPort,--emailSmtpPort <emailSmtpPort> Email SMTP port. Defaults to 465 for ssl, otherwise 25.
-emailTLS,--emailUseTLS Email should use TLS. Defaults to false.
-emailSSL,--emailUseSSL Email should use SSL. Defaults to false.
-emailUser,--emailUsername <emailUsername> Email SMTP username. Defaults to none.
-emailPass,--emailPassword <emailPassword> Email SMTP password. Defaults to none. Not secure! See emailPassFile.
-emailPassFile,--emailPasswordFile <emailPasswordFile> Email SMTP password file. Defaults to none.
-bsub,--bsub_all_jobs Use bsub to submit jobs
-run,--run_scripts Run QScripts. Without this flag set only performs a dry run.
-dot,--dot_graph <dot_graph> Outputs the queue graph to a .dot file. See: http://en.wikipedia.org/wiki/DOT_language
-expandedDot,--expanded_dot_graph <expanded_dot_graph> Outputs the queue graph of scatter gather to a .dot file. Otherwise overwrites the dot_graph
-startFromScratch,--start_from_scratch Runs all command line functions even if the outputs were previously output successfully.
-status,--status Get status of jobs for the qscript
-statusFrom,--status_email_from <status_email_from> Email address to send emails from upon completion or on error.
-statusTo,--status_email_to <status_email_to> Email address to send emails to upon completion or on error.
-keepIntermediates,--keep_intermediate_outputs After a successful run keep the outputs of any Function marked as intermediate.
-retry,--retry_failed <retry_failed> Retry the specified number of times after a command fails. Defaults to no retries.
-l,--logging_level <logging_level> Set the minimum level of logging, i.e. setting INFO get's you INFO up to FATAL, setting ERROR gets you ERROR and FATAL level logging.
-log,--log_to_file <log_to_file> Set the logging location
-quiet,--quiet_output_mode Set the logging to quiet mode, no output to stdout
-debug,--debug_mode Set the logging file string to include a lot of debugging information (SLOW!)
-h,--help Generate this help message
Troubleshooting install
If you get an error message instead of this usage output, there are a couple of things to check. First, make sure that your Java version is at least 1.6 by typing the following command:
java -version
You should see something similar to the following text:
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
If the version is less than 1.6, install the newest version of Java on the system. If you instead see something like java: Command not found, make sure that Java is installed on your machine and that your PATH variable contains the path to the java executable. On a Mac running OS X 10.5+, you may need to run /Applications/Utilities/Java Preferences.app and drag Java SE 6 to the top before your machine will default to running version 1.6, even if it has been installed. You may also need to set the JAVA_HOME variable manually in Terminal, as in:
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
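The checks above can be scripted. The following is a minimal sketch, not part of Queue: the function names java_major_minor, java_version_ok and check_java are made up for illustration, and the parsing assumes a version banner of the form shown above (java version "1.6.0_16"), which other Java distributions may format differently.

```shell
#!/bin/sh
# Extract "major.minor" from the first line of `java -version`,
# e.g. 'java version "1.6.0_16"' gives 1.6.
# Assumes the banner contains: version "<digits>.<digits>...
java_major_minor() {
    printf '%s\n' "$1" |
        sed -n 's/.*version "\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1.\2/p'
}

# Succeed when the given major.minor value is at least 1.6.
java_version_ok() {
    case $1 in
        *.*) ;;
        *) return 1 ;;   # empty or unparsable input
    esac
    major=${1%%.*}
    minor=${1#*.}
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; }
}

# Hypothetical helper combining the PATH check and the version check.
check_java() {
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH"
        return 1
    fi
    # `java -version` writes its banner to standard error.
    banner=$(java -version 2>&1 | head -n 1)
    version=$(java_major_minor "$banner")
    if java_version_ok "$version"; then
        echo "Java $version is new enough"
    else
        echo "Java 1.6 or later is required (found: $banner)"
        return 1
    fi
}
```

After loading these functions into your shell, calling check_java reports whether java is on your PATH and whether its version is recent enough.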
Run Queue
Dry Run Queue
Now that we have set up Queue correctly, let's run it on the Genome Analysis Toolkit with some example data. A common, simple use of the GATK is counting the reads in a BAM file (the GATK is capable of much more powerful analyses, but this will serve as our example).
First verify that you can run the GATK CountReads example.
From the GATK CountReads example files, use exampleBAM.bam and its associated file (a .bai file), and exampleFASTA.fasta and its associated files (a .dict file and a .fasta.fai file).
Together with the example QScript from your checkout, public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala, this is everything you need to run a basic analysis. Specifying the paths to your copies of the files, run the following command:
java -Djava.io.tmpdir=tmp -jar dist/Queue.jar -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala -R public/testdata/exampleFASTA.fasta -I public/testdata/exampleBAM.bam
After a few seconds you should see output that looks nearly identical to this:
INFO 12:01:35,191 QScriptManager - Compiling 1 QScript
INFO 12:01:38,149 QScriptManager - Compilation complete
INFO 12:01:38,225 HelpFormatter - ---------------------------------------------------------------------------
INFO 12:01:38,226 HelpFormatter - Queue v1.3-266-g8e4b3f6, Compiled 2011/12/05 11:56:11
INFO 12:01:38,226 HelpFormatter - Copyright (c) 2011 The Broad Institute
INFO 12:01:38,226 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 12:01:38,226 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 12:01:38,227 HelpFormatter - Program Args: -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala -R public/testdata/exampleFASTA.fasta -I public/testdata/exampleBAM.bam
INFO 12:01:38,227 HelpFormatter - Date/Time: 2011/12/05 12:01:38
INFO 12:01:38,227 HelpFormatter - ---------------------------------------------------------------------------
INFO 12:01:38,227 HelpFormatter - ---------------------------------------------------------------------------
INFO 12:01:38,229 QCommandLine - Scripting ExampleCountReads
INFO 12:01:38,386 QCommandLine - Added 1 functions
INFO 12:01:38,386 QGraph - Generating graph.
INFO 12:01:38,404 QGraph - -------
INFO 12:01:38,410 QGraph - Pending: 'java' '-Djava.io.tmpdir=/Users/droazen/Desktop/src/unstable/tmp' '-cp' '/Users/droazen/Desktop/src/unstable/dist/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleBAM.bam' '-R' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleFASTA.fasta'
INFO 12:01:38,410 QGraph - Log: /Users/droazen/Desktop/src/unstable/Q-16340@bm00e-13f-1.out
INFO 12:01:38,411 QGraph - Dry run completed successfully!
INFO 12:01:38,411 QGraph - Re-run with "-run" to execute the functions.
INFO 12:01:38,412 QScript - Script completed successfully with 1 total jobs
INFO 12:01:38,413 QCommandLine - Writing JobLogging GATKReport to file Q-16340@bm00e-13f.jobreport.txt
Troubleshooting Dry Run
If you don't see this message, and instead get an error message, first verify the paths to your files.
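One way to verify the paths is a small shell helper like the sketch below. It is illustrative only: missing_files and check_example_inputs are hypothetical names, and the index file names (exampleBAM.bam.bai, exampleFASTA.dict, exampleFASTA.fasta.fai) are assumptions based on the file types listed earlier, so adjust them to match your copies.

```shell
#!/bin/sh
# Print a line for every argument that is not a readable regular file.
# Returns non-zero when at least one file is missing.
missing_files() {
    rc=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            printf 'missing: %s\n' "$f"
            rc=1
        fi
    done
    return "$rc"
}

# Hypothetical helper: checks the inputs of the example command,
# relative to the top of the checkout. Index file names are assumed.
check_example_inputs() {
    missing_files \
        dist/Queue.jar \
        public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala \
        public/testdata/exampleFASTA.fasta \
        public/testdata/exampleFASTA.fasta.fai \
        public/testdata/exampleFASTA.dict \
        public/testdata/exampleBAM.bam \
        public/testdata/exampleBAM.bam.bai
}
```

Running check_example_inputs from the top of your checkout prints nothing when every file is in place, and one "missing:" line per file that cannot be read.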
Running locally
Once you have verified that the Queue functions have been generated successfully, you can execute the pipeline by appending -run to the command line.
java -Djava.io.tmpdir=tmp -jar dist/Queue.jar -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala -R public/testdata/exampleFASTA.fasta -I public/testdata/exampleBAM.bam -run
After about half a minute you should see output that looks nearly identical to this:
INFO 12:03:49,791 QScriptManager - Compiling 1 QScript
INFO 12:03:52,541 QScriptManager - Compilation complete
INFO 12:03:52,610 HelpFormatter - ---------------------------------------------------------------------------
INFO 12:03:52,610 HelpFormatter - Queue v1.3-266-g8e4b3f6, Compiled 2011/12/05 11:56:11
INFO 12:03:52,610 HelpFormatter - Copyright (c) 2011 The Broad Institute
INFO 12:03:52,610 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 12:03:52,610 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 12:03:52,611 HelpFormatter - Program Args: -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala -R public/testdata/exampleFASTA.fasta -I public/testdata/exampleBAM.bam -run
INFO 12:03:52,611 HelpFormatter - Date/Time: 2011/12/05 12:03:52
INFO 12:03:52,611 HelpFormatter - ---------------------------------------------------------------------------
INFO 12:03:52,611 HelpFormatter - ---------------------------------------------------------------------------
INFO 12:03:52,614 QCommandLine - Scripting ExampleCountReads
INFO 12:03:52,652 QCommandLine - Added 1 functions
INFO 12:03:52,653 QGraph - Generating graph.
INFO 12:03:52,785 QGraph - Running jobs.
INFO 12:03:52,807 FunctionEdge - Starting: 'java' '-Djava.io.tmpdir=/Users/droazen/Desktop/src/unstable/tmp' '-cp' '/Users/droazen/Desktop/src/unstable/dist/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleBAM.bam' '-R' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleFASTA.fasta'
INFO 12:03:52,807 FunctionEdge - Output written to /Users/droazen/Desktop/src/unstable/Q-16868@bm00e-13f-1.out
INFO 12:03:58,324 QGraph - 0 Pend, 1 Run, 0 Fail, 0 Done
INFO 12:04:22,802 FunctionEdge - Done: 'java' '-Djava.io.tmpdir=/Users/droazen/Desktop/src/unstable/tmp' '-cp' '/Users/droazen/Desktop/src/unstable/dist/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleBAM.bam' '-R' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleFASTA.fasta'
INFO 12:04:22,803 QGraph - 0 Pend, 0 Run, 0 Fail, 1 Done
INFO 12:04:22,805 QScript - Script completed successfully with 1 total jobs
INFO 12:04:22,807 QCommandLine - Writing JobLogging GATKReport to file Q-16868@bm00e-13f.jobreport.txt
INFO 12:04:22,827 QCommandLine - Plotting JobLogging GATKReport to file Q-16868@bm00e-13f.jobreport.txt.pdf
WARN 12:04:28,423 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info.
The results of the traversal will be written to a file in the current directory. The name of the file will be printed in the output, Q-16868@bm00e-13f-1.out in this example.
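If you save Queue's console output to a file, the output file names can be pulled out of it rather than read by eye. The sketch below is one possible approach; job_outputs is a hypothetical helper, and it assumes one log entry per line, paths without spaces, and the "FunctionEdge - Output written to" wording seen in the example output, which may differ in other Queue versions.

```shell
#!/bin/sh
# Read Queue log text on standard input and print the path from each
# "FunctionEdge - Output written to <path>" entry, one per line.
# Assumes the path contains no spaces.
job_outputs() {
    sed -n 's/.*FunctionEdge - Output written to \([^ ]*\).*/\1/p'
}
```

For example, if the console output of a -run invocation was saved to a file named queue.log (a name chosen here for illustration), then job_outputs < queue.log lists the .out file of each job.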
Running on a computing farm
- Run with -bsub to run on LSF, or for early Grid Engine support see Queue with Grid Engine.
- See QFunction and Command Line Options for more info on Queue options.