Tagged with #QFunction



Created 2012-08-15 17:07:32 | Updated 2014-04-02 16:12:09 | Tags: official jobs qfunction jobrunner advanced

Implementing a Queue JobRunner

The following Scala methods need to be implemented for a new JobRunner. See the GridEngine and LSF implementations for complete, concrete examples.

1. class JobRunner.start()

start() should copy the settings from the CommandLineFunction into your job scheduler and invoke the command via sh <jobScript>. As an example of what needs to be implemented, here are the current contents of the start() method in MyCustomJobRunner, which contains the pseudocode.

  def start() {
    // TODO: Copy settings from function to your job scheduler syntax.

    val mySchedulerJob = new ...

    // Set the display name to the first 4000 characters of the description (or whatever your max is)
    mySchedulerJob.displayName = function.description.take(4000)

    // Set the output file for stdout
    mySchedulerJob.outputFile = function.jobOutputFile.getPath

    // Set the current working directory
    mySchedulerJob.workingDirectory = function.commandDirectory.getPath

    // If the error file is set specify the separate output for stderr
    if (function.jobErrorFile != null) {
      mySchedulerJob.errFile = function.jobErrorFile.getPath
    }

    // If a project name is set specify the project name
    if (function.jobProject != null) {
      mySchedulerJob.projectName = function.jobProject
    }

    // If the job queue is set specify the job queue
    if (function.jobQueue != null) {
      mySchedulerJob.queue = function.jobQueue
    }

    // If the resident set size is requested pass on the memory request
    if (residentRequestMB.isDefined) {
      mySchedulerJob.jobMemoryRequest = "%dM".format(residentRequestMB.get.ceil.toInt)
    }

    // If the resident set size limit is defined specify the memory limit
    if (residentLimitMB.isDefined) {
      mySchedulerJob.jobMemoryLimit = "%dM".format(residentLimitMB.get.ceil.toInt)
    }

    // If the priority is set (user specified Int) specify the priority
    if (function.jobPriority.isDefined) {
      mySchedulerJob.jobPriority = function.jobPriority.get
    }

    // Instead of running the function.commandLine, run "sh <jobScript>"
    mySchedulerJob.command = "sh " + jobScript

    // Store the status so it can be returned in the status method.
    myStatus = RunnerStatus.RUNNING

    // Start the job and store the id so it can be killed in tryStop
    myJobId = mySchedulerJob.start()
  }

2. class JobRunner.status

The status method should return one of the enum values from org.broadinstitute.sting.queue.engine.RunnerStatus:

  • RunnerStatus.RUNNING
  • RunnerStatus.DONE
  • RunnerStatus.FAILED
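
As a rough illustration of the mapping a status method has to perform, the sketch below translates scheduler states into the three values listed above. The SchedulerState types and the toRunnerStatus helper are hypothetical, and the RunnerStatus object is a local stand-in so the snippet compiles on its own; a real runner would import org.broadinstitute.sting.queue.engine.RunnerStatus and use the states its scheduler actually reports.

```scala
// Stand-in for org.broadinstitute.sting.queue.engine.RunnerStatus so the
// sketch is self-contained; a real runner would import Queue's enum instead.
object RunnerStatus extends Enumeration {
  val RUNNING, DONE, FAILED = Value
}

// Hypothetical scheduler states; substitute whatever your scheduler reports.
sealed trait SchedulerState
case object Pending extends SchedulerState
case object Executing extends SchedulerState
case class Exited(exitCode: Int) extends SchedulerState

object StatusMapping {
  // Assumption: jobs that are queued or executing are both reported as
  // RUNNING, and only a zero exit code counts as DONE.
  def toRunnerStatus(state: SchedulerState): RunnerStatus.Value = state match {
    case Pending | Executing => RunnerStatus.RUNNING
    case Exited(0)           => RunnerStatus.DONE
    case Exited(_)           => RunnerStatus.FAILED
  }
}
```

Keeping the translation in one small function makes it easy to check every scheduler state against the status Queue expects.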

3. object JobRunner.init()

Add any initialization code to the companion object static initializer. See the LSF or GridEngine implementations for how this is done.
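
For readers less familiar with Scala, the sketch below shows the general mechanism: statements in an object body run once, the first time the object is accessed. The SchedulerSession name and its counter are hypothetical and exist only to make the one-time behaviour visible; the LSF and GridEngine implementations remain the reference for what Queue actually does.

```scala
object SchedulerSession {
  // Hypothetical counter, present only to make the one-time behaviour observable.
  private var initCount = 0

  // Statements in an object body run once, on first access to the object.
  init()

  private def init(): Unit = {
    // A real runner might open a scheduler session or load native libraries here.
    initCount += 1
  }

  def timesInitialized: Int = initCount
}
```

Note that the counter is declared before the call to init(), because fields in an object body are initialized in source order.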

4. object JobRunner.tryStop()

Jobs that are still in RunnerStatus.RUNNING are passed into this function. tryStop() should send these jobs the equivalent of a Ctrl-C or SIGTERM (15) or, in the worst case, a SIGKILL (9) if SIGTERM is not available.
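
One possible shape for this logic is sketched below. SchedulerJobHandle, terminate() and StopSketch are hypothetical names, not part of Queue, and the signature is simplified; take the real tryStop signature from the GridEngine or LSF implementations. The point of the sketch is that a failure to stop one job should not keep the remaining jobs from being stopped.

```scala
import scala.util.control.NonFatal

// Hypothetical handle for a submitted job; not part of Queue.
trait SchedulerJobHandle {
  def id: String
  def terminate(): Unit // the scheduler's equivalent of SIGTERM
}

object StopSketch {
  // Attempts to stop every job and returns the ids that could not be stopped.
  def tryStop(running: Seq[SchedulerJobHandle]): Seq[String] =
    running.flatMap { job =>
      try {
        job.terminate()
        None
      } catch {
        // Record the failure and carry on with the remaining jobs.
        case NonFatal(_) => Some(job.id)
      }
    }
}
```

Returning the ids that failed gives the caller something to log or to retry with a harder kill.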

Running Queue with a new JobRunner

Once there is a basic implementation, you can try out the Hello World example with -jobRunner MyJobRunner.

java -Djava.io.tmpdir=tmp -jar dist/Queue.jar -S scala/qscript/examples/HelloWorld.scala -jobRunner MyJobRunner -run

If all goes well, Queue should dispatch the job to your job scheduler and wait until the status returns RunnerStatus.DONE. The text "hello world" should then be echoed into the output file, possibly along with other log messages.

See QFunction and Command Line Options for more info on Queue options.


Created 2012-08-11 00:07:18 | Updated 2015-03-04 19:04:02 | Tags: queue developer qfunction

These are the most popular Queue command line options. For a complete and up-to-date list, run with --help or -h. QScripts may also add their own command line options.

1. Queue Command Line Options

Command Line Argument | Description | Default
-run | If passed the scripts are run. If not passed a dry run is executed. | dry run
-jobRunner <jobrunner> | The job runner to dispatch jobs. Setting to Lsf706, GridEngine, or Drmaa will dispatch jobs to LSF or Grid Engine using the job settings (see below). Defaults to Shell which runs jobs on a local shell one at a time. | Shell
-bsub | Alias for -jobRunner Lsf706 | not set
-qsub | Alias for -jobRunner GridEngine | not set
-status | Prints out a summary progress. If a QScript is currently running via -run, you can run the same command line with -status instead to print a summary of progress. | not set
-retry <count> | Retries a QFunction that returns a non-zero exit code up to count times. The QFunction must not have set jobRestartable to false. | 0 = no retries
-startFromScratch | Restarts the graph from the beginning. If not specified, Queue will not re-run a job when a .done file is found for all of the outputs specified on its QFunction. For example, the .done file for the output /path/to/output.file is /path/to/.output.file.done. | use .done files to determine if jobs are complete
-keepIntermediates | By default Queue deletes the output files of QFunctions that set .isIntermediate to true. | delete intermediate files
-statusTo <email> | Email address to send status to whenever a) A job fails, or b) Queue has run all the functions it can run and is exiting. | not set
-statusFrom <email> | Email address to send status emails from. | user@local.domain
-dot <file> | If set renders the job graph to a dot file. | not rendered
-l <logging_level> | The minimum level of logging, DEBUG, INFO, WARN, or FATAL. | INFO
-log <file> | Sets the location to save log output in addition to standard out. | not set
-debug | Set the logging to include a lot of debugging information (SLOW!) | not set
-jobReport | Path to write the job report text file. If R is installed and available on the $PATH then a pdf will be generated visualizing the job report. | jobPrefix.jobreport.txt
-disableJobReport | Disables writing the job report. | not set
-help | Lists all of the command line arguments with their descriptions. | not set
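
The .done file naming described for -startFromScratch can be sketched as follows. DoneFiles, doneFileFor and canSkip are hypothetical helpers written for illustration, not Queue's own code; only the naming pattern (/path/to/output.file maps to /path/to/.output.file.done) comes from the table above.

```scala
import java.io.File

object DoneFiles {
  // Naming from the table: /path/to/output.file -> /path/to/.output.file.done
  def doneFileFor(output: File): File =
    new File(output.getParentFile, "." + output.getName + ".done")

  // A job can be skipped only when every one of its outputs has a .done file.
  // Assumption: a job with no outputs is never treated as already complete.
  def canSkip(outputs: Seq[File]): Boolean =
    outputs.nonEmpty && outputs.forall(o => doneFileFor(o).exists)
}
```

Because the check requires a .done file for every output, a job whose outputs were only partially written would be run again.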

2. QFunction Options

The following options can be specified on the command line or overridden per QFunction.

Command Line Argument | QFunction Property | Description | Default
-jobPrefix | .jobName | The unique name of the job. Used to prefix directories and log files. Use -jobNamePrefix on the Queue command line to replace the default prefix Q-<processid>@<host>. | <jobNamePrefix>-<jobNumber>
N/A | .jobOutputFile | Captures stdout and if jobErrorFile is null it captures stderr as well. | <jobName>.out
N/A | .jobErrorFile | If not null captures stderr. | null
N/A | .commandDirectory | The directory to execute the command line from. | current directory
-jobProject | .jobProject | The project name for the job. | default job runner project
-jobQueue | .jobQueue | The queue to dispatch the job. | default job runner queue
-jobPriority | .jobPriority | The dispatch priority for the job. Lowest priority = 0. Highest priority = 100. | default job runner priority
-jobNative | .jobNativeArgs | Native args to pass to the job runner. Currently only supported in GridEngine and Drmaa. The string is concatenated to the native arguments passed over DRMAA. Example: -w n. | none
-jobResReq | .jobResourceRequests | Resource requests to pass to the job runner. On GridEngine this is multiple -l <req>. On LSF a single -R <req> is generated. | memory reservations and limits on LSF and GridEngine
-jobEnv | .jobEnvironmentNames | Predefined environment names to pass to the job runner. On GridEngine this is -pe <env>. On LSF this is -a <env>. | none
-memLimit | .memoryLimit | The memory limit for the job in gigabytes. Used to populate the variables residentLimit and residentRequest which can also be set separately. | default job runner memory limit
-resMemLimit | .residentLimit | Limit for the resident memory in gigabytes. On GridEngine this is -l mem_free=<mem>. On LSF this is -R rusage[mem=<mem>]. | memoryLimit * 1.2
-resMemReq | .residentRequest | Requested amount of resident memory in gigabytes. On GridEngine this is -l h_rss=<mem>. On LSF this is -R rusage[select=<mem>]. | memoryLimit
3. Email Status Options

Command Line Argument | Description | Default
-emailHost <hostname> | SMTP host name | localhost
-emailPort <port> | SMTP port | 25
-emailTLS | If set uses TLS. | not set
-emailSSL | If set uses SSL. | not set
-emailUser <username> | If set along with emailPass or emailPassFile authenticates the email with this username. | not set
-emailPassFile <file> | If emailUser is also set authenticates the email with contents of the file. | not set
-emailPass <password> | If emailUser is also set authenticates the email with this password. NOT SECURE: Use emailPassFile instead! | not set