I am trying to run the GATK variant detection pipeline on 112 stickleback samples. I am using a GridEngine queue to parallelize this across our different machines. I previously ran the same code on a subset of the samples (55) and it worked fine. However, when I tried to run it on the full 112, I ran into some strange errors. In particular, things like:
commlib returns can't find connection
WARN 13:58:57,655 DrmaaJobRunner - Unable to determine status of job id 4970049
org.ggf.drmaa.DrmCommunicationException: failed receiving gdi request response for mid=19906 (can't find connection).
    at org.broadinstitute.sting.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:391)
    at org.broadinstitute.sting.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:381)
    at org.broadinstitute.sting.jna.drmaa.v1_0.JnaSession.getJobProgramStatus(JnaSession.java:155)
    at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobRunner.liftedTree2$1(DrmaaJobRunner.scala:101)
    at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobRunner.updateJobStatus(DrmaaJobRunner.scala:100)
    at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:55)
    at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:55)
    at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:123)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:322)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:322)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:322)
    at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobManager.updateStatus(DrmaaJobManager.scala:55)
    at org.broadinstitute.sting.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1076)
    at org.broadinstitute.sting.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1068)
    at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
    at scala.collection.immutable.List.foreach(List.scala:45)
    at org.broadinstitute.sting.queue.engine.QGraph.updateStatus(QGraph.scala:1068)
    at org.broadinstitute.sting.queue.engine.QGraph.runJobs(QGraph.scala:442)
    at org.broadinstitute.sting.queue.engine.QGraph.run(QGraph.scala:131)
    at org.broadinstitute.sting.queue.QCommandLine.execute(QCommandLine.scala:127)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
    at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
crop up, followed by something like:
error: smallest event number 108 is greater than number 1 i'm waiting for
Does anyone have any idea of what might be going wrong? Either way, do you have any suggestions to help me move forward?
As a note, I have not tried running the 55 again, so it is possible that this would also now fail. In other words, I don't know whether the problem is due to some difference between the 55 and 112 sets, or whether a GATK update in the interim has introduced the problem. I can try running the original set again if that would be helpful.