Tagged with #scattergatter

I've noticed some strange behavior from Queue: in some cases, when I scatter/gather the Unified Genotyper in indel mode, it introduces cycles in the graph. This causes Queue to die with a StackOverflowError, which seems to come from the graphDepth function in QGraph, where the recursion becomes unbounded. This caused me some headaches yesterday as I tried to figure out how to make the function tail-recursive, before I noticed this message this morning: ERROR 17:18:21,292 QGraph - Cycles were detected in the graph

This leads me to one request and one question. First the request: it would be nice if Queue exited as soon as graph validation fails, as that would make identifying the source of the problem simpler. Is this possible?

Secondly the question: do you have any ideas as to what might cause the cycles?

I have tried looking at the Graphviz files and cannot identify any cycles in them (though the scatter/gather plots are really difficult to make sense of).

My code looks like this:

val candidateSnps = new File(outputDir + "/" + projectName + ".candidate.snp.vcf")
val candidateIndels = new File(outputDir + "/" + projectName + ".candidate.indel.vcf")

// SNP and INDEL Calls
add(snpCall(cohortList, candidateSnps))
add(indelCall(cohortList, candidateIndels))

val targets = new File(outputDir + "/" + projectName + ".targets")
add(target(candidateIndels, targets))

// Take regions based on indels called in previous step
val postCleaningBamList =
  for (bam <- cohortList) yield {
    val indelRealignedBam = swapExt(bam, ".bam", ".clean.bam")
    add(clean(Seq(bam), targets, indelRealignedBam))
    indelRealignedBam
  }

val afterCleanupSnps = swapExt(candidateSnps, ".candidate.snp.vcf", ".cleaned.snp.vcf")
val afterCleanupIndels = swapExt(candidateIndels, ".candidate.indel.vcf", ".cleaned.indel.vcf")

// Call snps/indels again
add(snpCall(postCleaningBamList, afterCleanupSnps))
add(indelCall(postCleaningBamList, afterCleanupIndels))

Here cohortList is a Seq[File].

Right now I've worked around this by setting this.scatterCount = 1 in the indelCall case class. This doesn't feel quite satisfactory to me, so any pointers toward a more robust solution would be greatly appreciated.
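
For tracking down the cycle itself, a small standalone check along the following lines might help. This is only a minimal sketch and not Queue's QGraph code: it assumes the job graph has been reduced to a plain Map from a node name to the names of its successors, and CycleFinder and findCycle are hypothetical names. It walks the graph depth-first with an explicit stack instead of recursion, so a deep or cyclic graph cannot trigger a StackOverflowError.

```scala
import scala.collection.mutable

object CycleFinder {
  // Assumption: the graph is a plain adjacency map (node -> successors),
  // not Queue's own graph type.
  type Graph = Map[String, Seq[String]]

  /** Returns one cycle as a node path (first node repeated at the end),
    * or None if the graph contains no cycle. */
  def findCycle(graph: Graph): Option[List[String]] = {
    val done = mutable.Set[String]()          // nodes whose successors are fully explored
    var result: Option[List[String]] = None
    val starts = graph.keys.toList.sorted.iterator
    while (result.isEmpty && starts.hasNext) {
      val start = starts.next()
      if (!done(start)) {
        // Explicit stack of (node, successors still to visit); head is the current node.
        var stack: List[(String, List[String])] = List((start, successors(graph, start)))
        val onPath = mutable.Set(start)       // nodes on the current DFS path
        while (result.isEmpty && stack.nonEmpty) {
          val (node, pending) = stack.head
          pending match {
            case Nil =>
              // All successors explored: leave the path and mark as finished.
              onPath -= node
              done += node
              stack = stack.tail
            case next :: more =>
              stack = (node, more) :: stack.tail
              if (onPath(next)) {
                // Back edge to a node on the current path: that is a cycle.
                val path = stack.map(_._1).reverse
                result = Some(path.dropWhile(_ != next) :+ next)
              } else if (!done(next)) {
                onPath += next
                stack = (next, successors(graph, next)) :: stack
              }
          }
        }
      }
    }
    result
  }

  // Nodes that only appear as successors are treated as having no outgoing edges.
  private def successors(graph: Graph, node: String): List[String] =
    graph.getOrElse(node, Nil).toList
}
```

For example, CycleFinder.findCycle(Map("a" -> Seq("b"), "b" -> Seq("c"), "c" -> Seq("a"))) returns Some(List("a", "b", "c", "a")), while the same graph without the edge from "c" back to "a" returns None. Printing the returned path should make it easier to see which jobs are involved than reading the Graphviz output by eye.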