Tagged with #jobs
1 documentation article | 3 announcements | 1 forum discussion

Created 2012-08-15 17:07:32 | Updated 2014-04-02 16:12:09 | Tags: official jobs qfunction jobrunner advanced

Implementing a Queue JobRunner

The following Scala methods need to be implemented for a new JobRunner. See the GridEngine and LSF implementations for complete, concrete examples.

1. class JobRunner.start()

start() should copy the settings from the CommandLineFunction into your job scheduler and invoke the command via sh <jobScript>. As an example of what needs to be implemented, here are the current contents of the start() method in MyCustomJobRunner, which contains pseudocode.

  def start(): Unit = {
    // TODO: Copy settings from function to your job scheduler syntax.

    val mySchedulerJob = new ...

    // Set the display name to at most 4000 characters of the description (or whatever your max is)
    mySchedulerJob.displayName = function.description.take(4000)

    // Set the output file for stdout
    mySchedulerJob.outputFile = function.jobOutputFile.getPath

    // Set the current working directory
    mySchedulerJob.workingDirectory = function.commandDirectory.getPath

    // If the error file is set, specify the separate output for stderr
    if (function.jobErrorFile != null) {
      mySchedulerJob.errFile = function.jobErrorFile.getPath
    }

    // If a project name is set, specify the project name
    if (function.jobProject != null) {
      mySchedulerJob.projectName = function.jobProject
    }

    // If the job queue is set, specify the job queue
    if (function.jobQueue != null) {
      mySchedulerJob.queue = function.jobQueue
    }

    // If the resident set size is requested, pass on the memory request (rounded up to whole MB)
    if (residentRequestMB.isDefined) {
      mySchedulerJob.jobMemoryRequest = "%dM".format(residentRequestMB.get.ceil.toInt)
    }

    // If the resident set size limit is defined, specify the memory limit (rounded up to whole MB)
    if (residentLimitMB.isDefined) {
      mySchedulerJob.jobMemoryLimit = "%dM".format(residentLimitMB.get.ceil.toInt)
    }

    // If the priority is set (user-specified Int), specify the priority
    if (function.jobPriority.isDefined) {
      mySchedulerJob.jobPriority = function.jobPriority.get
    }

    // Instead of running function.commandLine, run "sh <jobScript>"
    mySchedulerJob.command = "sh " + jobScript

    // Store the status so it can be returned in the status method
    myStatus = RunnerStatus.RUNNING

    // Start the job and store the id so it can be killed in tryStop
    myJobId = mySchedulerJob.start()
  }
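To see the same settings mapping end to end without a real scheduler, here is a minimal sketch that launches sh <jobScript> as a local process with java.lang.ProcessBuilder. The JobSettings case class and LocalStartSketch object are hypothetical stand-ins for the CommandLineFunction fields used above, not part of Queue; a real JobRunner would hand these values to its scheduler instead. Merging stderr into stdout when no error file is set is an assumption of this sketch.

```scala
import java.io.File

// Hypothetical holder for the CommandLineFunction fields that start() reads.
final case class JobSettings(
    jobScript: File,
    commandDirectory: File,
    jobOutputFile: File,
    jobErrorFile: Option[File])

object LocalStartSketch {
  // Runs "sh <jobScript>" locally, applying the same settings that start()
  // would copy into a scheduler job: working directory, stdout, optional stderr.
  def start(s: JobSettings): Process = {
    val pb = new ProcessBuilder("sh", s.jobScript.getPath)
    pb.directory(s.commandDirectory)
    pb.redirectOutput(s.jobOutputFile)
    s.jobErrorFile match {
      case Some(err) => pb.redirectError(err)      // separate stderr file
      case None      => pb.redirectErrorStream(true) // assumption: merge stderr into stdout
    }
    pb.start()
  }
}
```

The returned Process plays the role of myJobId here: it is what a status check or tryStop would later act on.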

2. class JobRunner.status

The status method should return one of the enum values from org.broadinstitute.sting.queue.engine.RunnerStatus:

  • RunnerStatus.RUNNING
  • RunnerStatus.DONE
  • RunnerStatus.FAILED
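A status implementation usually polls the scheduler and translates its native job state into one of these three values. The sketch below defines a local stand-in for RunnerStatus so it compiles on its own, and uses LSF-style state names purely for illustration; treating queued or suspended jobs as RUNNING and unrecognized states as FAILED are assumptions, chosen so that Queue never waits forever on a job it cannot classify.

```scala
// Local stand-in for org.broadinstitute.sting.queue.engine.RunnerStatus,
// defined here only so the sketch is self-contained.
object RunnerStatus extends Enumeration {
  val RUNNING, DONE, FAILED = Value
}

object StatusSketch {
  // Maps a hypothetical scheduler state code (LSF-style names) to RunnerStatus.
  def toRunnerStatus(schedulerState: String): RunnerStatus.Value =
    schedulerState match {
      case "DONE" => RunnerStatus.DONE
      case "EXIT" => RunnerStatus.FAILED
      // Pending and suspended jobs have not finished, so report them as RUNNING.
      case "PEND" | "RUN" | "PSUSP" | "USUSP" | "SSUSP" => RunnerStatus.RUNNING
      // Assumption: an unrecognized state is treated as a failure.
      case _ => RunnerStatus.FAILED
    }
}
```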

3. object JobRunner.init()

Add any initialization code to the companion object static initializer. See the LSF or GridEngine implementations for how this is done.
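In Scala, the body of an object acts as its static initializer: it runs once, the first time the object is referenced, under the JVM's thread-safe class-initialization rules. The sketch below shows that pattern with a hypothetical SchedulerSession class standing in for whatever connection or library setup your scheduler needs; InitTracker exists only to make the run-once behavior observable and is not part of any real runner.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical handle to a scheduler connection; a real runner would open a
// session with its scheduler's client library here.
final class SchedulerSession(val name: String)

// Counts how many times the initializer below has run (illustration only).
object InitTracker {
  val initCount = new AtomicInteger(0)
}

// In a real runner this would be the companion object of the JobRunner class.
object MyCustomJobRunner {
  // Everything in this body runs exactly once, on first access.
  InitTracker.initCount.incrementAndGet()
  val session: SchedulerSession = new SchedulerSession("my-scheduler")

  // Optional: release scheduler resources when the JVM exits.
  sys.addShutdownHook {
    // A real runner would close its scheduler session here.
  }
}
```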

4. object JobRunner.tryStop()

The jobs that are still in RunnerStatus.RUNNING are passed into this function. tryStop() should send these jobs the equivalent of a Ctrl-C or SIGTERM (15), or, in the worst case, a SIGKILL (9) if SIGTERM is not available.
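The terminate-then-kill escalation can be sketched with local processes. With a real scheduler the kill would go through its own command (for example bkill for LSF or qdel for Grid Engine); here java.lang.Process stands in for a scheduler job. On typical Unix JDKs, destroy() sends SIGTERM and destroyForcibly() sends SIGKILL, though the JDK documents the exact behavior as implementation dependent. The grace period and the TryStopSketch name are assumptions of this sketch.

```scala
import java.util.concurrent.TimeUnit

object TryStopSketch {
  // Sends every still-running process a polite termination request, waits up
  // to graceSeconds in total, then force-kills whatever is still alive.
  // Returns how many processes needed the forced kill.
  def tryStop(running: Seq[Process], graceSeconds: Long): Int = {
    running.foreach(_.destroy()) // SIGTERM on typical Unix JDKs
    val deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(graceSeconds)
    // Keep only the processes that did not exit before the shared deadline.
    val stubborn = running.filterNot { p =>
      val remaining = math.max(0L, deadline - System.nanoTime())
      p.waitFor(remaining, TimeUnit.NANOSECONDS)
    }
    stubborn.foreach(_.destroyForcibly()) // SIGKILL on typical Unix JDKs
    stubborn.size
  }
}
```

Using a single deadline for the whole batch keeps the total wait bounded by graceSeconds, no matter how many jobs are being stopped.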

Running Queue with a new JobRunner

Once there is a basic implementation, you can try out the Hello World example with -jobRunner MyJobRunner.

java -Djava.io.tmpdir=tmp -jar dist/Queue.jar -S scala/qscript/examples/HelloWorld.scala -jobRunner MyJobRunner -run

If all goes well, Queue should dispatch the job to your job scheduler and wait until the status returns RunnerStatus.DONE, and "hello world" should be echoed into the output file, possibly along with other log messages.

See QFunction and Command Line Options for more info on Queue options.

Created 2015-08-17 20:52:37 | Updated | Tags: jobs hiring

Much like the universe and my belt size, the GATK methods development team aims to grow ever larger.

The job titles on the docket this week are Data Scientist and Computational Biologist; the job description is below or on LinkedIn.

Apply your computational skills to solving the hardest problems in big-data genomics and have a wide impact on science and clinical practice, including cancer and other diseases. Join a lively team of data scientists and software engineers dedicated to creating the GATK (http://www.broadinstitute.org/gatk/), a widely used and successful software toolkit for applying next-generation DNA sequencing to medical genetics. The GATK team is an integral part of the Data Science and Data Engineering group at the Broad Institute, a research institution that is transforming medicine and human health by building software to organize, process, and visualize scientific data on an unprecedented scale.

As part of this job, you will conceive and develop algorithms and analysis approaches to solve the key challenges for emerging DNA/RNA sequencing technologies, instantiating these ideas in reliable and scalable software tools that will be applied to scientific projects and used to inform clinical decisions, with revolutionary implications in medical and cancer genetics. You will apply computational techniques to design and implement analysis tools to solve complex computational and mathematical problems in genomics.

You will work collaboratively with other data scientists on computational-biology research in a fast-paced environment. Your work will be expected to enable the research of other program scientists through excellent communication, teamwork, and a focus on creating usable and accessible research software tools. You must be capable of working in an interactive team environment while conducting self-directed research within broader goals set by the group. NO EXPERIENCE WITH BIOLOGY IS REQUIRED.

Key Responsibilities

  • Devise new algorithms and approaches to genomic data analysis.
  • Rapidly prototype these ideas and validate their value on novel data sets.
  • Be involved in the effort to implement and optimize successful algorithms in the pipeline and for use by the broader community.
  • Gather information from, and present results to, a broad range of non-computational staff.
  • Prepare written reports and presentations for internal use and publication.

Qualifications

  • PhD in Computer Science, Computational Biology, Bioinformatics, Mathematics, Physics or a related field is required.
  • Experience in computational biology or genomics is a plus.
  • Experience working on a team is a plus.
  • Excellent oral and written English communication skills
  • Ability to solve complex problems individually and as part of a team
  • Expertise within one of the following fields: Genetics, Genomics, Statistics, or Algorithm Development
  • Experience building substantial software projects in one or more modern programming languages

Broad Institute is an Equal Opportunity Employer

Created 2014-11-24 20:55:06 | Updated | Tags: jobs job-offer cancer

We're advertising this job on behalf of our colleagues in the cancer analysis team. See the overview below. If you're interested, please message me (@Geraldine_VdAuwera) or apply on the Broad's careers page (search for job requisition number 1591).

Please note that this job requires on-site presence (no remote work) and Broad cannot offer visa sponsorship for this opportunity.


Part-time opportunity to make a contribution to The Cancer Genome Atlas, a project with global impact in the search for a cure for cancer. Work with world-class researchers at the Broad Institute to publish large datasets generated by computational algorithms. Assist with development and maintenance of automated pipelines.

Part-time twice a week, on-site at the Broad Institute.

Characteristic Duties:

  • Assist cancer genome analysts in preparing data submissions to make available to the TCGA research community.
  • Generate automated data submissions for the TCGA research community.
  • Identify and address technical issues with automated pipelines.
  • Extend the functionality of automated pipelines.
  • Assist with documentation of automated pipelines.
  • Work with project managers to ensure that requirements and timelines are being met.
  • Work with engineers on process or design issues.

Qualifications:

  • Bachelor's degree, or currently enrolled in a bachelor's degree program; Computer Science, Computational Biology, or Bioinformatics preferred
  • Proficiency in Linux or another Unix flavor
  • Python experience a big plus
  • Knowledge of web services a plus
  • Prior involvement or interest in genomics a plus
  • Excellent communication skills

Created 2013-09-25 22:35:26 | Updated 2013-09-27 14:14:43 | Tags: official jobs job-offer compbio

Yep, we're hiring again; we need even more compbios!

Do you dream of working on challenging problems, in a stimulating environment with plentiful resources? Or do you simply love the GATK and wish you could spend every waking moment playing with it? Good news! We have a position open for a computational biologist to join our team. You can find out more about us on this page. The job itself? In a nutshell, help develop and apply methods in NGS to medical genetics projects. You can find more details in the Broad Career Center -- look for Requisition Number #1107.

If you're interested (and who wouldn't be?) please message @rpoplin directly. That's Ryan Poplin, who has led the Computational Methods Development group since Eric Banks became big chief of GSA last week.

We are of course happy to answer questions here in this thread about what it's like to work in the GSA group (awesome!) and how interesting it is to work on the bleeding edge of genome research (amazing!). Feel free to private-message us about it if you don't want your supervisor to know that you're thinking of jumping ship ;-)

Created 2013-12-20 20:40:37 | Updated | Tags: queue jobs


I've been using Queue for pipelining for a little while now, and have run into an issue with the job report. The issue is that it's blank: after the pipeline has finished, the entire contents of the file are


I'm getting output from Queue saying:

 INFO  21:37:35,646 QJobsReporter - Writing JobLogging GATKReport to file /home/daniel.klevebring/bin/autoseqer/AutoSeqPipeline.jobreport.txt 

Any ideas why? The report would be a great tool, so I'm really interested in getting it to work properly.

Happy holidays,