Running Queue for the first time


Once you've built Queue from the source distribution, you can build and run various pipelines.

Test Your Installation

The first step is to check that Queue was built correctly and that supporting tools such as Java are on your PATH. Type the following command:

java -jar <path to Queue.jar> --help

replacing <path to Queue.jar> with the path to your own copy of Queue.jar.

You should see usage output similar to the following:

---------------------------------------------------------
Program Name: org.broadinstitute.sting.queue.QCommandLine
---------------------------------------------------------
---------------------------------------------------------
usage: java -jar Queue.jar -S <script> [-jobPrefix <job_name_prefix>] [-jobQueue <job_queue>] [-jobProject <job_project>]
       [-jobSGDir <job_scatter_gather_directory>] [-memLimit <default_memory_limit>] [-runDir <run_directory>] [-tempDir
       <temp_directory>] [-emailHost <emailSmtpHost>] [-emailPort <emailSmtpPort>] [-emailTLS] [-emailSSL] [-emailUser
       <emailUsername>] [-emailPass <emailPassword>] [-emailPassFile <emailPasswordFile>] [-bsub] [-run] [-dot <dot_graph>]
       [-expandedDot <expanded_dot_graph>] [-startFromScratch] [-status] [-statusFrom <status_email_from>] [-statusTo
       <status_email_to>] [-keepIntermediates] [-retry <retry_failed>] [-l <logging_level>] [-log <log_to_file>] [-quiet]
       [-debug] [-h]

 -S,--script <script>                                                      QScript scala file
 -jobPrefix,--job_name_prefix <job_name_prefix>                            Default name prefix for compute farm jobs.
 -jobQueue,--job_queue <job_queue>                                         Default queue for compute farm jobs.
 -jobProject,--job_project <job_project>                                   Default project for compute farm jobs.
 -jobSGDir,--job_scatter_gather_directory <job_scatter_gather_directory>   Default directory to place scatter gather
                                                                           output for compute farm jobs.
 -memLimit,--default_memory_limit <default_memory_limit>                   Default memory limit for jobs, in gigabytes.
 -runDir,--run_directory <run_directory>                                   Root directory to run functions from.
 -tempDir,--temp_directory <temp_directory>                                Temp directory to pass to functions.
 -emailHost,--emailSmtpHost <emailSmtpHost>                                Email SMTP host. Defaults to localhost.
 -emailPort,--emailSmtpPort <emailSmtpPort>                                Email SMTP port. Defaults to 465 for ssl,
                                                                           otherwise 25.
 -emailTLS,--emailUseTLS                                                   Email should use TLS. Defaults to false.
 -emailSSL,--emailUseSSL                                                   Email should use SSL. Defaults to false.
 -emailUser,--emailUsername <emailUsername>                                Email SMTP username. Defaults to none.
 -emailPass,--emailPassword <emailPassword>                                Email SMTP password. Defaults to none. Not
                                                                           secure! See emailPassFile.
 -emailPassFile,--emailPasswordFile <emailPasswordFile>                    Email SMTP password file. Defaults to none.
 -bsub,--bsub_all_jobs                                                     Use bsub to submit jobs
 -run,--run_scripts                                                        Run QScripts.  Without this flag set only
                                                                           performs a dry run.
 -dot,--dot_graph <dot_graph>                                              Outputs the queue graph to a .dot file.  See:
                                                                           http://en.wikipedia.org/wiki/DOT_language
 -expandedDot,--expanded_dot_graph <expanded_dot_graph>                    Outputs the queue graph of scatter gather to
                                                                           a .dot file.  Otherwise overwrites the
                                                                           dot_graph
 -startFromScratch,--start_from_scratch                                    Runs all command line functions even if the
                                                                           outputs were previously output successfully.
 -status,--status                                                          Get status of jobs for the qscript
 -statusFrom,--status_email_from <status_email_from>                       Email address to send emails from upon
                                                                           completion or on error.
 -statusTo,--status_email_to <status_email_to>                             Email address to send emails to upon
                                                                           completion or on error.
 -keepIntermediates,--keep_intermediate_outputs                            After a successful run keep the outputs of
                                                                           any Function marked as intermediate.
 -retry,--retry_failed <retry_failed>                                      Retry the specified number of times after a
                                                                           command fails.  Defaults to no retries.
 -l,--logging_level <logging_level>                                        Set the minimum level of logging, i.e.
                                                                           setting INFO get's you INFO up to FATAL,
                                                                           setting ERROR gets you ERROR and FATAL level
                                                                           logging.
 -log,--log_to_file <log_to_file>                                          Set the logging location
 -quiet,--quiet_output_mode                                                Set the logging to quiet mode, no output to
                                                                           stdout
 -debug,--debug_mode                                                       Set the logging file string to include a lot
                                                                           of debugging information (SLOW!)
 -h,--help                                                                 Generate this help message

Troubleshooting install

If you get an error message instead of this usage output, there are a couple of things to check. First, make sure that your Java version is at least 1.6 by typing the following command:

java -version

You should see something similar to the following text:

java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

If the version is less than 1.6, install the newest version of Java on the system. If you instead see something like "java: Command not found", make sure that Java is installed on your machine and that your PATH variable contains the path to the java executables. On a Mac running OS X 10.5+, you may need to run /Applications/Utilities/Java Preferences.app and drag Java SE 6 to the top of the list before your machine will default to running version 1.6, even if it has been installed. You may also need to set the JAVA_HOME variable manually in Terminal, as in:

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
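
The version check described above can also be scripted. The following is a minimal sketch, not part of Queue: java_version_ok is a hypothetical helper name, and it assumes a plain dotted version string such as the "1.6.0_16" shown in the java -version output.

```shell
# Sketch: decide whether a Java version string meets the 1.6 minimum.
# java_version_ok is a hypothetical helper name, not part of Queue or Java.
java_version_ok() {
    # Expects a dotted version such as "1.6.0_16" or "11.0.2".
    version="$1"
    major=$(printf '%s' "$version" | cut -d. -f1)
    minor=$(printf '%s' "$version" | cut -d. -f2)
    if [ "$major" -eq 1 ]; then
        # Legacy numbering: 1.6 and later pass, 1.5 and earlier fail.
        [ "$minor" -ge 6 ]
    else
        # Newer numbering, where the first field is the major version.
        [ "$major" -gt 1 ]
    fi
}
```

To use it, take the quoted version from the first line of java -version (which prints to stderr), for example current=$(java -version 2>&1 | head -n 1 | cut -d'"' -f2), then run java_version_ok "$current" && echo "Java is new enough".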

Run Queue

Dry Run Queue

Now that Queue is set up correctly, let's run it on the Genome Analysis Toolkit (GATK) with some example data. A common, simple analysis that people use the GATK for is counting the reads in a BAM file. The GATK is capable of much more powerful analyses, but this will serve as our example.

First verify that you can run the GATK CountReads example.

From the GATK CountReads example files, use exampleBAM.bam and its associated index (a .bai file), together with exampleFASTA.fasta and its associated files (a .dict file and a .fasta.fai file).

Along with the example QScript from your checkout, public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala, this is everything you need to run a basic analysis. Specifying the paths to your own copies of the files, run the following command:

java -Djava.io.tmpdir=tmp -jar dist/Queue.jar -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala -R public/testdata/exampleFASTA.fasta -I public/testdata/exampleBAM.bam

After a few seconds you should see output that looks nearly identical to this:

INFO  12:01:35,191 QScriptManager - Compiling 1 QScript 
INFO  12:01:38,149 QScriptManager - Compilation complete 
INFO  12:01:38,225 HelpFormatter - --------------------------------------------------------------------------- 
INFO  12:01:38,226 HelpFormatter - Queue v1.3-266-g8e4b3f6, Compiled 2011/12/05 11:56:11 
INFO  12:01:38,226 HelpFormatter - Copyright (c) 2011 The Broad Institute 
INFO  12:01:38,226 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki 
INFO  12:01:38,226 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa 
INFO  12:01:38,227 HelpFormatter - Program Args: -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala -R public/testdata/exampleFASTA.fasta -I public/testdata/exampleBAM.bam 
INFO  12:01:38,227 HelpFormatter - Date/Time: 2011/12/05 12:01:38 
INFO  12:01:38,227 HelpFormatter - --------------------------------------------------------------------------- 
INFO  12:01:38,227 HelpFormatter - --------------------------------------------------------------------------- 
INFO  12:01:38,229 QCommandLine - Scripting ExampleCountReads 
INFO  12:01:38,386 QCommandLine - Added 1 functions 
INFO  12:01:38,386 QGraph - Generating graph. 
INFO  12:01:38,404 QGraph - ------- 
INFO  12:01:38,410 QGraph - Pending:  'java'    '-Djava.io.tmpdir=/Users/droazen/Desktop/src/unstable/tmp'  '-cp' '/Users/droazen/Desktop/src/unstable/dist/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleBAM.bam'  '-R' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleFASTA.fasta'  
INFO  12:01:38,410 QGraph - Log: /Users/droazen/Desktop/src/unstable/Q-16340@bm00e-13f-1.out 
INFO  12:01:38,411 QGraph - Dry run completed successfully! 
INFO  12:01:38,411 QGraph - Re-run with "-run" to execute the functions. 
INFO  12:01:38,412 QScript - Script completed successfully with 1 total jobs 
INFO  12:01:38,413 QCommandLine - Writing JobLogging GATKReport to file Q-16340@bm00e-13f.jobreport.txt  
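
If you capture the dry-run output in a file, the success line shown above can be checked from a script before you go on to a real run. This is a minimal sketch: dry_run_ok is a hypothetical helper name, and dryrun.log in the usage note is an assumed file name. The marker text is copied from the log above.

```shell
# Sketch: succeed only if a captured Queue log contains the dry-run success line.
# dry_run_ok is a hypothetical helper name, not part of Queue.
dry_run_ok() {
    log_file="$1"
    # -q: print nothing, report the result through the exit status only.
    grep -q 'Dry run completed successfully!' "$log_file"
}
```

For example, append > dryrun.log 2>&1 to the dry-run command above, then run dry_run_ok dryrun.log && echo "dry run passed".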

Troubleshooting Dry Run

If you get an error message instead of this output, first verify the paths to your files.
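
One way to verify the paths is a small shell function that reports every file that is missing or unreadable. This is a sketch under assumptions: report_missing is a hypothetical helper name, and you should pass it the paths you actually use, including the index files that sit next to the BAM and FASTA files.

```shell
# Sketch: print each argument that is not a readable regular file.
# report_missing is a hypothetical helper name, not part of Queue or the GATK.
report_missing() {
    status=0
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    # Exit status is 0 only when every path was found.
    return $status
}
```

For the command above, a call could look like report_missing dist/Queue.jar public/testdata/exampleFASTA.fasta public/testdata/exampleBAM.bam, with the QScript and the index files added to the list.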

Running locally

Once you have verified that the Queue functions have been generated successfully, you can execute the pipeline by appending -run to the command line.

java -Djava.io.tmpdir=tmp -jar dist/Queue.jar -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala -R public/testdata/exampleFASTA.fasta -I public/testdata/exampleBAM.bam -run

After a short while (about half a minute in this example) you should see output that looks nearly identical to this:

INFO  12:03:49,791 QScriptManager - Compiling 1 QScript 
INFO  12:03:52,541 QScriptManager - Compilation complete 
INFO  12:03:52,610 HelpFormatter - --------------------------------------------------------------------------- 
INFO  12:03:52,610 HelpFormatter - Queue v1.3-266-g8e4b3f6, Compiled 2011/12/05 11:56:11 
INFO  12:03:52,610 HelpFormatter - Copyright (c) 2011 The Broad Institute 
INFO  12:03:52,610 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki 
INFO  12:03:52,610 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa 
INFO  12:03:52,611 HelpFormatter - Program Args: -S public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/ExampleCountReads.scala -R public/testdata/exampleFASTA.fasta -I public/testdata/exampleBAM.bam -run 
INFO  12:03:52,611 HelpFormatter - Date/Time: 2011/12/05 12:03:52 
INFO  12:03:52,611 HelpFormatter - --------------------------------------------------------------------------- 
INFO  12:03:52,611 HelpFormatter - --------------------------------------------------------------------------- 
INFO  12:03:52,614 QCommandLine - Scripting ExampleCountReads 
INFO  12:03:52,652 QCommandLine - Added 1 functions 
INFO  12:03:52,653 QGraph - Generating graph. 
INFO  12:03:52,785 QGraph - Running jobs. 
INFO  12:03:52,807 FunctionEdge - Starting:  'java'    '-Djava.io.tmpdir=/Users/droazen/Desktop/src/unstable/tmp'  '-cp' '/Users/droazen/Desktop/src/unstable/dist/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleBAM.bam'  '-R' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleFASTA.fasta'  
INFO  12:03:52,807 FunctionEdge - Output written to /Users/droazen/Desktop/src/unstable/Q-16868@bm00e-13f-1.out 
INFO  12:03:58,324 QGraph - 0 Pend, 1 Run, 0 Fail, 0 Done 
INFO  12:04:22,802 FunctionEdge - Done:  'java'    '-Djava.io.tmpdir=/Users/droazen/Desktop/src/unstable/tmp'  '-cp' '/Users/droazen/Desktop/src/unstable/dist/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'CountReads'  '-I' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleBAM.bam'  '-R' '/Users/droazen/Desktop/src/unstable/public/testdata/exampleFASTA.fasta'  
INFO  12:04:22,803 QGraph - 0 Pend, 0 Run, 0 Fail, 1 Done 
INFO  12:04:22,805 QScript - Script completed successfully with 1 total jobs 
INFO  12:04:22,807 QCommandLine - Writing JobLogging GATKReport to file Q-16868@bm00e-13f.jobreport.txt 
INFO  12:04:22,827 QCommandLine - Plotting JobLogging GATKReport to file Q-16868@bm00e-13f.jobreport.txt.pdf 
WARN  12:04:28,423 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info.

The results of the traversal are written to a file in the current directory. The name of the file is printed in the output on the "Output written to" line; in this example it is Q-16868@bm00e-13f-1.out.
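
If the Queue output was saved to a file, the result file names can be pulled out of it rather than read by eye. This is a minimal sketch: output_files is a hypothetical helper name, and it assumes the log lines have the "FunctionEdge - Output written to" form shown above.

```shell
# Sketch: list the paths named on "Output written to" lines of a saved Queue log.
# output_files is a hypothetical helper name, not part of Queue.
output_files() {
    log_file="$1"
    # Keep only the path after the marker text, dropping any trailing spaces.
    sed -n 's/.*FunctionEdge - Output written to \(.*[^ ]\) *$/\1/p' "$log_file"
}
```

Running output_files on a saved copy of the log above would print the path ending in Q-16868@bm00e-13f-1.out, which you can then open to see the CountReads result.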

Running on a computing farm
