Tagged with #scala


Comments (0)

1. Install scala somewhere

At the Broad, we typically put it somewhere like this:

/home/radon01/depristo/work/local/scala-2.7.5.final

Next, create a symlink from this directory to trunk/scala/installation:

ln -s /home/radon01/depristo/work/local/scala-2.7.5.final trunk/scala/installation

2. Setting up your path

Right now the only way to get scala walkers into the GATK is by explicitly setting your CLASSPATH in your .my.cshrc file:

setenv CLASSPATH /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/FourBaseRecaller.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GenomeAnalysisTK.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/Playground.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/StingUtils.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/bcel-5.2.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/colt-1.2.0.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/google-collections-0.9.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/javassist-3.7.ga.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/junit-4.4.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/log4j-1.2.15.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/picard-1.02.63.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/picard-private-875.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/reflections-0.9.2.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/sam-1.01.63.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/simple-xml-2.0.4.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GATKScala.jar:/humgen/gsa-scr1/depristo/local/scala-2.7.5.final/lib/scala-library.jar

This classpath needs to be updated manually whenever any of the libraries are updated. If you see this error:

Caused by: java.lang.RuntimeException: java.util.zip.ZipException: error in opening zip file
        at org.reflections.util.VirtualFile.iterable(VirtualFile.java:79)
        at org.reflections.util.VirtualFile$5.transform(VirtualFile.java:169)
        at org.reflections.util.VirtualFile$5.transform(VirtualFile.java:167)
        at org.reflections.util.FluentIterable$3.transform(FluentIterable.java:43)
        at org.reflections.util.FluentIterable$3.transform(FluentIterable.java:41)
        at org.reflections.util.FluentIterable$ForkIterator.computeNext(FluentIterable.java:81)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:132)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:127)
        at org.reflections.util.FluentIterable$FilterIterator.computeNext(FluentIterable.java:102)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:132)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:127)
        at org.reflections.util.FluentIterable$TransformIterator.computeNext(FluentIterable.java:124)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:132)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:127)
        at org.reflections.Reflections.scan(Reflections.java:69)
        at org.reflections.Reflections.<init>(Reflections.java:47)
        at org.broadinstitute.sting.utils.PackageUtils.<clinit>(PackageUtils.java:23)

It's because the classpath still points at outdated library jars. Basically just do an ls of your trunk/dist directory after the GATK has been built, make this your classpath as above, and tack on:

/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GATKScala.jar:/humgen/gsa-scr1/depristo/local/scala-2.7.5.final/lib/scala-library.jar

A command that almost works (but you'll need to replace the spaces with colons) is:

#setenv CLASSPATH $CLASSPATH `ls /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/*.jar` /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GATKScala.jar:/humgen/gsa-scr1/depristo/local/scala-2.7.5.final/lib/scala-library.jar
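As an alternative to fixing up the ls output by hand, the colon-joined string can be generated. The following is only a sketch, not part of the GATK build: the ClasspathBuilder object and its fromDir method are hypothetical names, and it assumes every jar you need sits directly in one directory.

```scala
import java.io.File

// Hypothetical helper, not part of the GATK: builds a classpath string from a directory of jars.
object ClasspathBuilder {
  /** Joins the paths of all *.jar files directly inside `dir`, sorted by name,
    * followed by any `extra` entries, using the platform path separator (":" on Unix). */
  def fromDir(dir: File, extra: Seq[String] = Nil): String = {
    // listFiles returns null when `dir` is missing or is not a directory
    val jars = Option(dir.listFiles).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .map(_.getAbsolutePath)
      .sorted
    (jars ++ extra).mkString(File.pathSeparator)
  }
}
```

For example, ClasspathBuilder.fromDir(new File("trunk/dist"), Seq("/humgen/gsa-scr1/depristo/local/scala-2.7.5.final/lib/scala-library.jar")) returns a string that can be pasted after setenv CLASSPATH.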

3. Building scala code

All of the Scala source code lives in scala/src, which you build using ant scala.

There are already some example Scala walkers in scala/src, so doing a standard checkout, installing scala, and setting up your environment should allow you to run something like:

gsa2 ~/dev/GenomeAnalysisTK/trunk > ant scala
Buildfile: build.xml

init.scala:

scala:
     [echo] Sting: Compiling scala!
   [scalac] Compiling 2 source files to /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/scala/classes
   [scalac] warning: there were deprecation warnings; re-run with -deprecation for details
   [scalac] one warning found
   [scalac] Compile suceeded with 1 warning; see the compiler output for details.
   [delete] Deleting: /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GATKScala.jar
      [jar] Building jar: /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GATKScala.jar

4. Invoking a scala walker

Until we can include Scala walkers along with the main GATK jar (avoiding the classpath issue too) you have to invoke your scala walkers using this syntax:

java -Xmx2048m org.broadinstitute.sting.gatk.CommandLineGATK -T BaseTransitionTableCalculator -R /broad/1KG/reference/human_b36_both.fasta -I /broad/1KG/DCC_merged/freeze5/NA12878.pilot2.SLX.bam -l INFO -L 1:1-100

Here, the BaseTransitionTableCalculator walker is written in Scala and being loaded into the system by the GATK walker manager. Otherwise everything looks like a normal GATK module.

Comments (2)

In addition to testing walkers individually, you may want to also run integration tests for your QScript pipelines.

1. Brief comparison to the Walker integration tests

  • Pipeline tests should use the standard location for testing data.
  • Pipeline tests use the same test dependencies.
  • Pipeline tests which generate MD5 results will have the results stored in the MD5 database.
  • Pipeline tests, like QScripts, are written in Scala.
  • Pipeline tests dry-run under the ant target pipelinetest and run under pipelinetestrun.
  • Pipeline tests class names must end in PipelineTest to run under the ant target.
  • Pipeline tests should instantiate a PipelineTestSpec and then run it via PipelineTest.executeTest().

2. PipelineTestSpec

When building up a pipeline test spec, specify the following variables for your test.

Variable          | Type               | Description
------------------|--------------------|------------
args              | String             | The arguments to pass to the Queue test, e.g. -S scala/qscript/examples/HelloWorld.scala
jobQueue          | String             | Job queue to run the test. Default is null, which means the hour queue is used.
fileMD5s          | Map[Path, MD5]     | Expected MD5 results for each file path.
expectedException | classOf[Exception] | Expected exception from the test.

3. Example PipelineTest

The following example runs the ExampleCountLoci QScript on a small bam and verifies that the MD5 result is as expected.

It is checked into the Sting repository under scala/test/org/broadinstitute/sting/queue/pipeline/examples/ExampleCountLociPipelineTest.scala

package org.broadinstitute.sting.queue.pipeline.examples

import org.testng.annotations.Test
import org.broadinstitute.sting.queue.pipeline.{PipelineTest, PipelineTestSpec}
import org.broadinstitute.sting.BaseTest

class ExampleCountLociPipelineTest {
  @Test
  def testCountLoci {
    val testOut = "count.out"
    val spec = new PipelineTestSpec
    spec.name = "countloci"
    spec.args = Array(
      " -S scala/qscript/examples/ExampleCountLoci.scala",
      " -R " + BaseTest.hg18Reference,
      " -I " + BaseTest.validationDataLocation + "small_bam_for_countloci.bam",
      " -o " + testOut).mkString
    spec.fileMD5s += testOut -> "67823e4722495eb10a5e4c42c267b3a6"
    PipelineTest.executeTest(spec)
  }
}

4. Running Pipeline Tests

Dry Run

To test whether the script at least compiles with your arguments, run ant pipelinetest, passing the name of your class via -Dsingle:

ant pipelinetest -Dsingle=ExampleCountLociPipelineTest

Sample output:

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour
   [testng]   => countloci PASSED DRY RUN
   [testng] PASSED: testCountLoci

Run

As of July 2011 the pipeline tests run against LSF 7.0.6 and Grid Engine 6.2u5. To include these two packages in your environment use the hidden dotkit .combined_LSF_SGE.

reuse .combined_LSF_SGE

Once you are satisfied that the dry run has completed without error, actually run the pipeline test with ant pipelinetestrun.

ant pipelinetestrun -Dsingle=ExampleCountLociPipelineTest

Sample output:

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour -run
   [testng] ##### MD5 file is up to date: integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] Checking MD5 for pipelinetests/countloci/run/count.out [calculated=67823e4722495eb10a5e4c42c267b3a6, expected=67823e4722495eb10a5e4c42c267b3a6]
   [testng]   => countloci PASSED
   [testng] PASSED: testCountLoci

Generating initial MD5s

If you don't know the MD5s yet, you can either run the command yourself on the command line and then MD5 the outputs yourself, or set the MD5s in your test to "" and run the pipeline.
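If you take the first route and want to compute the digest from Scala rather than with a command line tool, a minimal sketch using only the JDK could look like the following. The Md5Util object and its method names are hypothetical and are not part of the Queue test framework.

```scala
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

// Hypothetical helper, not part of Queue: computes MD5 digests with the JDK only.
object Md5Util {
  /** Returns the MD5 digest of `bytes` as 32 lowercase hex characters. */
  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5")
      .digest(bytes)
      .map(b => "%02x".format(b & 0xff)) // mask to avoid sign extension of negative bytes
      .mkString

  /** Reads the whole file into memory, which is fine for small test outputs. */
  def md5OfFile(path: String): String =
    md5Hex(Files.readAllBytes(Paths.get(path)))
}
```

Calling Md5Util.md5OfFile("pipelinetests/countloci/run/count.out") after a run would give the string to paste into spec.fileMD5s.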

When the MD5s are blank as in:

spec.fileMD5s += testOut -> ""

You run:

ant pipelinetest -Dsingle=ExampleCountLociPipelineTest -Dpipeline.run=run

And the output will look like:

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour -run
   [testng] ##### MD5 file is up to date: integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] PARAMETERIZATION[countloci]: file pipelinetests/countloci/run/count.out has md5 = 67823e4722495eb10a5e4c42c267b3a6, stated expectation is , equal? = false
   [testng]   => countloci PASSED
   [testng] PASSED: testCountLoci

Checking MD5s

When a pipeline test fails due to an MD5 mismatch you can use the MD5 database to diff the results.

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour -run
   [testng] ##### Updating MD5 file: integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] Checking MD5 for pipelinetests/countloci/run/count.out [calculated=67823e4722495eb10a5e4c42c267b3a6, expected=67823e4722495eb10a5e0000deadbeef]
   [testng] ##### Test countloci is going fail #####
   [testng] ##### Path to expected   file (MD5=67823e4722495eb10a5e0000deadbeef): integrationtests/67823e4722495eb10a5e0000deadbeef.integrationtest
   [testng] ##### Path to calculated file (MD5=67823e4722495eb10a5e4c42c267b3a6): integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] ##### Diff command: diff integrationtests/67823e4722495eb10a5e0000deadbeef.integrationtest integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] FAILED: testCountLoci
   [testng] java.lang.AssertionError: 1 of 1 MD5s did not match.

If you need to examine a number of MD5s which may have changed you can briefly shut off MD5 mismatch failures by setting parameterize = true.

spec.parameterize = true
spec.fileMD5s += testOut -> "67823e4722495eb10a5e4c42c267b3a6"

For this run:

ant pipelinetest -Dsingle=ExampleCountLociPipelineTest -Dpipeline.run=run

If there's a match the output will resemble:

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour -run
   [testng] ##### MD5 file is up to date: integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] PARAMETERIZATION[countloci]: file pipelinetests/countloci/run/count.out has md5 = 67823e4722495eb10a5e4c42c267b3a6, stated expectation is 67823e4722495eb10a5e4c42c267b3a6, equal? = true
   [testng]   => countloci PASSED
   [testng] PASSED: testCountLoci

While for a mismatch it will look like this:

   [testng] --------------------------------------------------------------------------------
   [testng] Executing test countloci with Queue arguments: -S scala/qscript/examples/ExampleCountLoci.scala -R /seq/references/Homo_sapiens_assembly18/v0/Homo_sapiens_assembly18.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/small_bam_for_countloci.bam -o count.out -bsub -l WARN -tempDir pipelinetests/countloci/temp/ -runDir pipelinetests/countloci/run/ -jobQueue hour -run
   [testng] ##### MD5 file is up to date: integrationtests/67823e4722495eb10a5e4c42c267b3a6.integrationtest
   [testng] PARAMETERIZATION[countloci]: file pipelinetests/countloci/run/count.out has md5 = 67823e4722495eb10a5e4c42c267b3a6, stated expectation is 67823e4722495eb10a5e0000deadbeef, equal? = false
   [testng]   => countloci PASSED
   [testng] PASSED: testCountLoci

Comments (1)

1. What is Scala?

Scala is a combination of an object oriented framework and a functional programming language. For a good introduction see the free online book Programming Scala.

The following are extremely brief answers to frequently asked questions about Scala which often pop up when first viewing or editing QScripts. For more information on Scala there are a multitude of resources available around the web, including the Scala home page and the online Scala Doc.

2. Where do I learn more about Scala?

  • http://www.scala-lang.org
  • http://programming-scala.labs.oreilly.com
  • http://www.scala-lang.org/docu/files/ScalaByExample.pdf
  • http://devcheatsheet.com/tag/scala/
  • http://davetron5000.github.com/scala-style/index.html

3. What is the difference between var and val?

var is a value you can later modify, while val is similar to final in Java.
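A small sketch of the difference, using made-up names (ValVsVar, counter, names) purely for illustration:

```scala
object ValVsVar {
  def demo(): (Int, List[String]) = {
    var counter = 0          // var: may be reassigned
    counter += 1
    counter += 1

    val names = List("a")    // val: the reference is fixed, like a Java final
    // names = List("b")     // would not compile: reassignment to val
    val more = names :+ "b"  // build a new list instead of reassigning
    (counter, more)
  }
}
```

Note that val only fixes the reference. If the object it points to is itself mutable, its contents can still change.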

4. What is the difference between Scala collections and Java collections? / Why do I get the error: type mismatch?

Because the GATK and Queue are a mix of Scala and Java, sometimes you'll run into problems when you need a Scala collection and a Java collection is returned instead.

   MyQScript.scala:39: error: type mismatch;
     found   : java.util.List[java.lang.String]
     required: scala.List[String]
        val wrapped: List[String] = TextFormattingUtils.wordWrap(text, width)

Use the implicit definitions in JavaConversions to automatically convert the basic Java collections to and from Scala collections.

import collection.JavaConversions._
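If you prefer the conversion to be visible at the call site, the explicit asScala and asJava methods from JavaConverters are an alternative to the implicit approach above. This is a sketch under assumptions: javaWords is a hypothetical stand-in for a Java API such as TextFormattingUtils.wordWrap, and on Scala 2.13 or later the same methods are provided by scala.jdk.CollectionConverters, with JavaConverters kept only as a deprecated alias.

```scala
import scala.collection.JavaConverters._

object CollectionBridge {
  /** Hypothetical stand-in for a Java method that returns java.util.List. */
  def javaWords(): java.util.List[String] = java.util.Arrays.asList("a", "b", "c")

  /** Java to Scala: wrap with asScala, then copy into an immutable List. */
  def scalaWords(): List[String] = javaWords().asScala.toList

  /** Scala to Java: asJava returns a java.util.List view of the Scala list. */
  def backToJava(xs: List[String]): java.util.List[String] = xs.asJava
}
```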

Scala has a very rich collections framework which you should take the time to enjoy. One of the first things you'll notice is that the default Scala collections are immutable, which means you should treat them as you would a String. When you want to 'modify' an immutable collection you need to capture the result of the operation, often assigning the result back to the original variable.

var str = "A"
str + "B"
println(str) // prints: A
str += "C"
println(str) // prints: AC

var set = Set("A")
set + "B"
println(set) // prints: Set(A)
set += "C"
println(set) // prints: Set(A, C)

5. How do I append to a list?

Use the :+ operator for a single value.

  var myList = List.empty[String]
  myList :+= "a"
  myList :+= "b"
  myList :+= "c"

Use ++ for appending a list.

  var myList = List.empty[String]
  myList ++= List("a", "b", "c")
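One caveat: on an immutable List, :+ has to copy the list on every call, so appending many elements one by one costs time proportional to the square of the final length. The sketch below shows two common alternatives, prepending and then reversing, or collecting into a ListBuffer. The object and method names are made up for illustration.

```scala
import scala.collection.mutable.ListBuffer

object ListBuilding {
  /** Prepend each item in constant time, then reverse once at the end. */
  def viaPrepend(items: Seq[String]): List[String] = {
    var acc = List.empty[String]
    for (item <- items) acc = item :: acc
    acc.reverse // restore insertion order
  }

  /** Append to a mutable buffer, then convert to an immutable List. */
  def viaBuffer(items: Seq[String]): List[String] = {
    val buf = ListBuffer.empty[String]
    for (item <- items) buf += item
    buf.toList
  }
}
```

For the handful of elements in a typical QScript, :+ is perfectly fine. The alternatives only matter for long lists built in a loop.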

6. How do I add to a set?

Use the + operator.

  var mySet = Set.empty[String]
  mySet += "a"
  mySet += "b"
  mySet += "c"

7. How do I add to a map?

Use the + and -> operators.

  var myMap = Map.empty[String,Int]
  myMap += "a" -> 1
  myMap += "b" -> 2
  myMap += "c" -> 3

8. What are Option, Some, and None?

Option is a Scala generic type that can either be some generic value or None. Queue often uses it to represent primitives that may be null.

  var myNullableInt1: Option[Int] = Some(1)
  var myNullableInt2: Option[Int] = None
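Reading a value back out of an Option usually goes through pattern matching, getOrElse, or map. A minimal sketch with illustrative names (OptionDemo and its methods are not Queue API):

```scala
object OptionDemo {
  /** Pattern match on the two possible cases. */
  def describe(value: Option[Int]): String = value match {
    case Some(n) => "value is " + n
    case None    => "no value"
  }

  /** Fall back to a default when the value is missing. */
  def orDefault(value: Option[Int], default: Int): Int = value.getOrElse(default)

  /** map transforms the contents of a Some and leaves None untouched. */
  def doubled(value: Option[Int]): Option[Int] = value.map(_ * 2)

  /** Option(x) turns a possibly-null reference into Some(x) or None. */
  def fromNullable(s: String): Option[String] = Option(s)
}
```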

9. What is _ / What is the underscore?

François Armand's slide deck is a good introduction: http://www.slideshare.net/normation/scala-dreaded

To quote from his slides:

Give me a variable name but
- I don't care of what it is
- and/or
- don't want to pollute my namespace with it
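Three of the most common uses, sketched with made-up names. This is not an exhaustive list: the underscore also appears in wildcard imports such as import collection.JavaConversions._ and in the var x: File = _ initializers seen in QScripts.

```scala
object UnderscoreDemo {
  // Placeholder for the single argument of an anonymous function
  def doubled(xs: List[Int]): List[Int] = xs.map(_ * 2)

  // Wildcard in a pattern: matches anything without binding a name
  def isEmptyName(name: String): Boolean = name match {
    case "" => true
    case _  => false
  }

  // Ignore one component when destructuring a tuple
  def firstOf(pair: (String, Int)): String = {
    val (first, _) = pair
    first
  }
}
```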

10. How do I format a String?

Use the .format() method.

This Java snippet:

String formatted = String.format("%s %d", myString, myInt);

In Scala would be:

val formatted = "%s %d".format(myString, myInt)

11. Can I use Scala Enumerations as QScript @Arguments?

No. Currently Scala's Enumeration class does not interact with the Java reflection API in a way that could be used for Queue command line arguments. You can use Java enums if, for example, you are importing a Java-based walker's enum type.

If/when we find a workaround for Queue we'll update this entry. In the meantime try using a String.
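One way the String approach might look: accept the argument as a String and convert it to the Enumeration inside the script. This is only a sketch. The Algorithm enumeration, its values, and AlgorithmArg.parse are hypothetical names, and the @Argument field that would hold the raw String is left out so that the snippet does not depend on Queue.

```scala
// Hypothetical enumeration used only for illustration.
object Algorithm extends Enumeration {
  val Blastn, Blastp = Value
}

object AlgorithmArg {
  /** Converts the raw String argument to an Algorithm value,
    * ignoring case, or returns None if the name is unknown. */
  def parse(raw: String): Option[Algorithm.Value] =
    Algorithm.values.find(_.toString.equalsIgnoreCase(raw))
}
```

Returning an Option lets the script report an unknown name itself, instead of relying on the NoSuchElementException that Enumeration.withName throws.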

Comments (6)

I'm just learning to create QScripts right now.

Currently I'm using CommandLineFunction to create the calls for bcl2fastq, bwa, samtools, and picard.

Right now I'm stuck trying to find a way to pass a system path to my script as an argument. I'm trying to run "cd /path/to/rundirectory/" followed by "bcl2fastq".

How do I pass the path to the run directory to the "cd" command?

Comments (6)

Right now, when I run "java -jar Queue.jar QScript.scala", scalac ("QScriptManager") complains that it can't find classes and objects that are in the same package as my QScript, as well as methods declared in my package object. When I try to import these explicitly, I get "object is not a member of package x".

How can I tell Queue to pass to scalac all of the files in my package?

I'm using Scala 2.11.4.

Thanks!

Comments (4)

Dear GATK-Team,

I am trying to add non-GATK software to my current Queue pipeline and have been following the Advanced Queue Usage. However, I get the following error when running my bash script and I don't see where this error is coming from. Am I failing to import a needed library?

INFO 13:40:59,347 QScriptManager - Compiling 1 QScript
ERROR 13:40:59,551 QScriptManager - map.scala:18: not found: type CommandLineFunction
ERROR 13:40:59,555 QScriptManager - class RunBlast extends CommandLineFunction {
ERROR 13:40:59,557 QScriptManager - ^

Here is my scala script:

import org.broadinstitute.gatk.queue.QScript
import org.broadinstitute.gatk.queue.extensions.gatk._

class RunBlast extends CommandLineFunction {
  @Input(doc="File to use as query") var queryFile: File = _
  @Ouput(doc="File to write output results") var blastHits: File = _
  @Argument(doc="BLAST algorithm to use") var algorithm: String = "blastn"
  @Argument(doc="Database to query against") var database: String = "nt"
  def commandLine = algorithm + " -db " + database + " -query " + queryFile + " -out " + blastHits
}

class ScriptToRunBlast extends QScript {
  def script() {
    val runBlast = new RunBlast
    runBlast.queryFile = new File("mySequence.fasta")
    runBlast.blastHits = new File("blastHits.out")
    add(runBlast)
  }

And my command line invocation:

java -Xmx500M -jar $HOME"/src/Queue-3.2-2/Queue.jar" \
  -S $HOME"/scripts/humanomics/scala/map.scala" \
  -run \
  -startFromScratch

Thanks in advance

Comments (8)

Dear GATK Team,

I am wondering about future plans for the Queue framework. I find it a useful framework to write and run pipelines in computing clusters. However, I found myself often wanting to use Queue in pipelines without any GATK walkers at all. Are there any plans in the future to release Queue as its own, GATK-independent package?

I know that internally there are some shared classes (e.g. the command line parser), and refactoring them so that Queue can be GATK-free may require a little more work. But I'm just interested to know if there are already plans to do this (or perhaps even already ongoing).

Cheers, konuva

Comments (2)

Is there a way in Queue to have a job executed locally instead of submitting it to the cluster with drmaa? I know this must be possible, since the scatter jobs are also not submitted to the cluster, but I could not find out how to do it.

For example, I have a snippet of a job that can easily run locally:

class Ln extends CommandLineFunction {
  @Input(doc="Input file") var in: File = _
  @Output(doc="Link destination") var out: File = _

  def commandLine = {
    "ln -sf " + required(in) + required(out)
  }
}

I hope you can help me out with this.

Comments (2)

Hi! I am happy to report that Queue and all the necessary tests for running GridEngine passed. The issue I am having is using a custom qscript to run a job in parallel. When I run the job on the cluster via qsub it runs in serial. Would someone be willing to look at my qsub syntax and my qscript to see if I am forgetting something?

The Qscript was a modified UnifiedGenotyper script configured to work with HaplotypeCaller:

package org.broadinstitute.sting.queue.qscripts.examples

import org.broadinstitute.sting.queue.QScript
import org.broadinstitute.sting.queue.extensions.gatk._

class Haplotyper extends QScript {
  @Input(doc="The reference file for the bam files.", shortName="R")
  var referenceFile: File = _ // _ is scala shorthand for null

  @Input(doc="Bam file to genotype.", shortName="I")
  var bamFile: File = _

  @Input(doc="Output file.", shortName="o")
  var outputFile: File = _

  trait UnifiedGenotyperArguments extends CommandLineGATK {
    this.reference_sequence = qscript.referenceFile
    this.intervals = if (qscript.intervals == null) Nil else List(qscript.intervals)
    this.memoryLimit = 2
  }

  def script() {
    val genotyper = new HaplotypeCaller with UnifiedGenotyperArguments

    genotyper.scatterCount = 12
    genotyper.input_file :+= qscript.bamFile
    genotyper.out = swapExt(outputFile, qscript.bamFile, "bam", "vcf")

    add(genotyper)
  }
}

and my Queue syntax was: java -Djava.io.tmpdir=tmp -jar /location/of/queue/Queue.jar -S scripts/qscalascripts/haplotyper.scala -R human_g1k_v37 -I /source/input_file -o /destination/output/file -l debug -jobRunner GridEngine -run

When I use the above, the Queue script breaks up my job into 12 discrete pieces, but runs it all on one node on the cluster. Any pointers are most welcome.

Comments (7)

Hi,

Thanks to previous replies I can run Queue and the relevant walker on a distributed computing server. My question is this: if I define my scala script to require an argument for the output file, using the -o parameter like so:

// Required arguments.  All initialized to empty values.
....

@Input(doc="Output file.", shortName="o")
var outputFile: File = _

how do I direct the output to a specified directory? Currently I have the code:

genotyper.out = swapExt(qscript.bamFile, "bam", outputFile, "unfiltered.vcf")

Currently, when I include the string -o /path/to/my/output/files/MyResearch.vcf, the script creates a series of folders within the directory where I execute Queue from. In this case my results were sent to: /Queue-2-8-1-g932cd3a/MyResearch./path/to/my/output/files/MyResearch.unfiltered.vcf

when all I wanted was the output to appear in the path: /path/to/my/output/files/MyResearch.unfiltered.vcf

As always any help is much appreciated.

Comments (3)

Hi all, I am running a scala script, and would like to include "-ploidy 1". Any advice on how I can do this?

Some information that may or may not be relevant (idk!):

I am attempting this using UnifiedGenotyper (v2.7-4). At the top of the script I have:

package org.broadinstitute.sting.queue.qscripts.examples
import org.broadinstitute.sting.queue.QScript
import org.broadinstitute.sting.queue.extensions.gatk._

I got the glm both to work using:

genotyper.glm = org.broadinstitute.sting.gatk.walkers.genotyper.GenotypeLikelihoodsCalculationModel.Model.BOTH

I was hoping for something similar for ploidy.

Thanks, Rhys

Comments (15)

I am trying to build Queue from Sting package downloaded from Github, but the ant building process always fails with different errors. I wonder if there's any alternative way to build Queue. Is there any scala script available that I can study or customize for automating GATK runs?

Comments (2)

Hi, I am trying to assign a queue name to the individual bsub commands generated by Queue.jar. Basically I want it to generate

bsub -q short "the command"
instead of
bsub "the command"

Any advice on how to set, from within the scala file, the queue that jobs are submitted to would be really helpful. Thanks.

Comments (8)

I'm working on a set of related Queue scripts. I would like to have functionality shared between them, ideally in separate scala files which would be imported. Is there a way to specify additional paths for the Queue scala compiler to search or do I have to bake my library into the gatk when I build it?

Comments (1)

I've noticed some strange behavior from Queue where, in some cases, when I scatter/gather the Unified Genotyper in indel mode it introduces cycles in the graph. This causes Queue to die with a StackOverflowError, which seems to be caused by the recursion in the graphDepth function in QGraph becoming unbounded. This caused me some headaches yesterday as I tried to figure out how to make the function tail-recursive, before noticing this message this morning:

ERROR 17:18:21,292 QGraph - Cycles were detected in the graph

This leads me to one request and one question. First the request: it would be nice if Queue would exit if the graph validation fails, as it would make identifying the source of the problem simpler. Is this possible?

Secondly the question: do you have any ideas as to what might cause the cycles?

I have tried looking at the graphviz files and I cannot identify any cycles from those (though when looking at the s/g-plots it's really difficult to make any sense of it).

My code looks like this:

val candidateSnps = new File(outputDir + "/" + projectName + ".candidate.snp.vcf")
val candidateIndels = new File(outputDir + "/" + projectName + ".candidate.indel.vcf")

// SNP and INDEL Calls
add(snpCall(cohortList, candidateSnps))
add(indelCall(cohortList, candidateIndels))

val targets = new File(outputDir + "/" + projectName + ".targets")
add(target(candidateIndels, targets))

// Take regions based on indels called in previous step
val postCleaningBamList =
  for (bam <- cohortList) yield {
    val indelRealignedBam = swapExt(bam, ".bam", ".clean.bam")
    add(clean(Seq(bam), targets, indelRealignedBam))
    indelRealignedBam
  }

val afterCleanupSnps = swapExt(candidateSnps, ".candidate.snp.vcf", ".cleaned.snp.vcf")
val afterCleanupIndels = swapExt(candidateIndels, ".candidate.indel.vcf", ".cleaned.indel.vcf")

// Call snps/indels again
add(snpCall(postCleaningBamList, afterCleanupSnps))
add(indelCall(postCleaningBamList, afterCleanupIndels))

Where the cohortList is a Seq[File].

Right now I've solved this by setting this.scatterCount = 1 in the indelCall case class, however this doesn't feel quite satisfactory to me, so any pointers for a more robust solution would be greatly appreciated.

Comments (16)

I am running the following command to test my scala using GATK-2.3.5

> java -Xmx4g -Djava.io.tmpdir=tmp -jar /Queue-2.3-5-g49ed93c/Queue.jar -S Queue-2.3-5-g49ed93c/ExampleCountReads.scala -R /GATK-2.3-5-g49ed93c/resources/exampleFASTA.fasta -I /GATK-2.3-5-g49ed93c/resources/exampleBAM.bam

and I am getting this error

> INFO  13:42:55,166 QScriptManager - Compiling 1 QScript 
> ERROR 13:42:55,348 QScriptManager - ExampleCountReads.scala:39: in XML literal: '=' expected instead of 'G' 
> ERROR 13:42:55,356 QScriptManager -     // java -jar <path to GenomeAnalysisTK.jar> -T <WalkerName> -help 
> ERROR 13:42:55,356 QScriptManager -                           ^ 
> ERROR 13:42:55,357 QScriptManager - ExampleCountReads.scala:39: in XML literal: ' or " delimited attribute value or '{' scala-expr '}' expected 
> ERROR 13:42:55,358 QScriptManager -     // java -jar <path to GenomeAnalysisTK.jar> -T <WalkerName> -help 
> ERROR 13:42:55,358 QScriptManager -                            ^ 
> ERROR 13:42:55,358 QScriptManager - ExampleCountReads.scala:39: in XML literal: whitespace expected 
> ERROR 13:42:55,359 QScriptManager -     // java -jar <path to GenomeAnalysisTK.jar> -T <WalkerName> -help 
> ERROR 13:42:55,360 QScriptManager -                             ^ 
> ERROR 13:42:55,360 QScriptManager - ExampleCountReads.scala:39: in XML literal: '=' expected instead of '>' 
> ERROR 13:42:55,361 QScriptManager -     // java -jar <path to GenomeAnalysisTK.jar> -T <WalkerName> -help 
> ERROR 13:42:55,362 QScriptManager -                                               ^ 
> ERROR 13:42:55,362 QScriptManager - ExampleCountReads.scala:39: in XML literal: ' or " delimited attribute value or '{' scala-expr '}' expected 
> ERROR 13:42:55,363 QScriptManager -     // java -jar <path to GenomeAnalysisTK.jar> -T <WalkerName> -help 
> ERROR 13:42:55,365 QScriptManager -                                                 ^ 
> ERROR 13:42:55,366 QScriptManager - ExampleCountReads.scala:39: in XML literal: whitespace expected 
> ERROR 13:42:55,367 QScriptManager -     // java -jar <path to GenomeAnalysisTK.jar> -T <WalkerName> -help 
> ERROR 13:42:55,367 QScriptManager -                                                  ^ 
> ERROR 13:42:55,367 QScriptManager - ExampleCountReads.scala:39: in XML literal: '>' expected instead of ' ' 
> ERROR 13:42:55,369 QScriptManager -     // java -jar <path to GenomeAnalysisTK.jar> -T <WalkerName> -help 
> ERROR 13:42:55,369 QScriptManager -                                                   ^ 
> ERROR 13:42:55,369 QScriptManager - ExampleCountReads.scala:67: in XML literal: in XML content, please use '}}' to express '}' 
> ERROR 13:42:55,369 QScriptManager -   } 
> ERROR 13:42:55,369 QScriptManager -   ^ 
> ERROR 13:42:55,370 QScriptManager - ExampleCountReads.scala:39:  I encountered a '}' where I didn't expect one, maybe this tag isn't closed <WalkerName> 
> ERROR 13:42:55,371 QScriptManager -     // java -jar <path to GenomeAnalysisTK.jar> -T <WalkerName> -help 
> ERROR 13:42:55,371 QScriptManager -                                                     ^ 
> ERROR 13:42:55,371 QScriptManager - ExampleCountReads.scala:68: '}' expected but eof found. 
> ERROR 13:42:55,371 QScriptManager - } 
> ERROR 13:42:55,372 QScriptManager -  ^ 
> ERROR 13:42:55,390 QScriptManager - 10 errors found 
> ##### ERROR ------------------------------------------------------------------------------------------
> ##### ERROR stack trace 
> org.broadinstitute.sting.queue.QException: Compile of /medpop/mpg-psrl/Parabase/Tools/Queue-2.3-5-g49ed93c/ExampleCountReads.scala failed with 10 errors
>   at org.broadinstitute.sting.queue.QScriptManager.loadScripts(QScriptManager.scala:46)
>   at org.broadinstitute.sting.queue.QCommandLine.org$broadinstitute$sting$queue$QCommandLine$$qScriptPluginManager(QCommandLine.scala:94)
>   at org.broadinstitute.sting.queue.QCommandLine.getArgumentSources(QCommandLine.scala:225)
>   at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:197)
>   at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
>   at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:61)
>   at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
> ##### ERROR ------------------------------------------------------------------------------------------
> ##### ERROR A GATK RUNTIME ERROR has occurred (version 2.3-5-g49ed93c):
> ##### ERROR
> ##### ERROR Please visit the wiki to see if this is a known problem
> ##### ERROR If not, please post the error, with stack trace, to the GATK forum
> ##### ERROR Visit our website and forum for extensive documentation and answers to 
> ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
> ##### ERROR
> ##### ERROR MESSAGE: Compile of /medpop/mpg-psrl/Parabase/Tools/Queue-2.3-5-g49ed93c/ExampleCountReads.scala failed with 10 errors
> ##### ERROR ------------------------------------------------------------------------------------------
> INFO  13:42:55,487 QCommandLine - Shutting down jobs. Please wait... 
> 13.462u 1.029s 0:10.41 139.0% 0+0k 0+0io 1pf+0w
> 

I am sure Java, GATK and Queue are installed properly and the files exist in the relevant directory. Thank you,

Comments (0)

Hi, folks,

I am trying out my Queue Scala scripts and often get this error message:

Conflicting collector combinations in option list; please refer to the release notes for the combinations allowed

However, from the debug information, the command line looks correct (?):

ERROR 12:07:05,460 FunctionEdge - Error: 'java' '-Xmx8192m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/Users/wxing/TPU/run/.queue/tmp' '-cp' '/Users/wxing/TPU/gatk/dist/Queue.jar' 'net.sf.picard.sam.SortSam' 'INPUT=/Users/wxing/TPU/run/SRR064286_1.fastq.aligned.sam' 'TMP_DIR=/Users/wxing/TPU/run/.queue/tmp' 'OUTPUT=/Users/wxing/TPU/run/SRR064286_1.fastq.aligned.bam' 'VALIDATION_STRINGENCY=SILENT' 'SO=coordinate' 'CREATE_INDEX=true'

Has anyone had similar issues? Any tricks for debugging?

Many thanks, Wei
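
A note on debugging this: HotSpot prints this message when incompatible garbage collector options are combined. The command above carries only -XX:+UseParallelOldGC, so one thing worth ruling out is a second collector flag arriving from outside the command line, for example through the JAVA_TOOL_OPTIONS or _JAVA_OPTIONS environment variables, which HotSpot also reads. That is an assumption about the cause, not something the log confirms. A minimal Scala sketch (GcFlagCheck and collectorsIn are hypothetical names, and the flag list is not exhaustive) that lists the collector flags coming from both places:

```scala
object GcFlagCheck {
  // HotSpot flags that each select a garbage collector (not an exhaustive list).
  val collectorFlags = Set(
    "-XX:+UseSerialGC", "-XX:+UseParallelGC", "-XX:+UseParallelOldGC",
    "-XX:+UseConcMarkSweepGC", "-XX:+UseG1GC")

  // Splits an option string on whitespace and keeps only collector-selecting flags.
  def collectorsIn(options: String): Seq[String] =
    options.split("\\s+").toSeq.filter(collectorFlags.contains)

  def main(args: Array[String]): Unit = {
    // JVM options copied from the failing command line in the log.
    val fromCommand =
      collectorsIn("-Xmx8192m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4")
    // Options the JVM would additionally pick up from the environment, if set.
    val fromEnv = Seq("JAVA_TOOL_OPTIONS", "_JAVA_OPTIONS")
      .flatMap(name => sys.env.get(name))
      .flatMap(collectorsIn)
    val all = (fromCommand ++ fromEnv).distinct
    println(s"collector flags in effect: ${all.mkString(", ")}")
  }
}
```

If more than one flag is listed, the extra one is a candidate for the conflict. If only -XX:+UseParallelOldGC shows up, the cause lies elsewhere, for instance in which collectors the installed JVM itself supports.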

Comments (13)

Hi there, I wanted to reproduce in my variant calling Queue script the same conditional you have in MethodsDevelopmentCallingPipeline, i.e. including InbreedingCoeff depending on the number of samples. However, in that script the number of samples is passed to the Target object as an integer, and I would like to count it from the BAM file list passed as an input to the script.

Therefore I followed the method in DataProcessingPipeline, i.e.

import org.broadinstitute.sting.queue.util.QScriptUtils
[...]
@Input(doc="input BAM file - or list of BAM files", fullName="input", shortName="I", required=true)
var bamFile: File = _
[...]
val bamFilesList = QScriptUtils.createSeqFromFile(bamFile)
val sampleNo = bamFilesList.size

Unfortunately, although DataProcessingPipeline works just fine, when I put these lines in my other script I get the following error:

INFO  12:48:08,616 HelpFormatter - Date/Time: 2012/11/08 12:48:08 
INFO  12:48:08,616 HelpFormatter - ---------------------------------------------------------------------- 
INFO  12:48:08,616 HelpFormatter - ---------------------------------------------------------------------- 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.sting.utils.exceptions.DynamicClassResolutionException: Could not create module HaplotypeCallerStep because      Cannot instantiate class (Invocation failure) caused by exception null
at org.broadinstitute.sting.utils.classloader.PluginManager.createByType(PluginManager.java:306)
at org.broadinstitute.sting.utils.classloader.PluginManager.createAllTypes(PluginManager.java:317)
at org.broadinstitute.sting.queue.QCommandLine.execute(QCommandLine.scala:126)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.2-2-gf44cc4e):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Could not create module HaplotypeCallerStep because Cannot instantiate class (Invocation failure) caused by exception null
##### ERROR ------------------------------------------------------------------------------------------

I tried several alternatives looking at the imports in DataProcessingPipeline but maybe I am missing something. Could you please advise?

Thanks very much, Francesco
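
A possible explanation, offered as an assumption rather than a confirmed diagnosis: a val declared in the body of a Scala class is evaluated while the object is being constructed. If the framework assigns bamFile only after it has created the script object, then bamFile is still null when createSeqFromFile runs, and the constructor fails, which would match "Cannot instantiate class ... caused by exception null". The sketch below reproduces that pattern without Queue. FakeUtils, EagerScript and DeferredScript are hypothetical stand-ins, not Queue classes:

```scala
import java.io.File

// Hypothetical stand-in for QScriptUtils.createSeqFromFile: it dereferences
// its argument, so a null File fails immediately.
object FakeUtils {
  def createSeqFromFile(f: File): Seq[File] = Seq(new File(f.getPath))
}

// Mirrors the failing layout: the val is evaluated in the constructor,
// before anything could have assigned bamFile.
class EagerScript {
  var bamFile: File = _
  val bamFilesList = FakeUtils.createSeqFromFile(bamFile) // NullPointerException here
}

// Deferred layout: nothing touches bamFile until script() is called.
class DeferredScript {
  var bamFile: File = _
  def script(): Int = {
    val bamFilesList = FakeUtils.createSeqFromFile(bamFile)
    bamFilesList.size
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val eagerFailed = scala.util.Try(new EagerScript).isFailure
    println(s"eager construction failed: $eagerFailed")

    val s = new DeferredScript
    s.bamFile = new File("samples.list") // assigned after construction
    println(s"sample count: ${s.script()}")
  }
}
```

If this is indeed the cause, moving the two lines into script(), or declaring bamFilesList as a lazy val, would delay the call until bamFile has a value.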