Tagged with #development
0 documentation articles | 0 announcements | 13 forum discussions

No articles to display.

No articles to display.

Created 2016-01-20 16:19:18 | Updated | Tags: development java

Comments (2)

I'm playing with the GATK jar and I'm trying to use it as an eclipse plugin: that is to say that the GenomeAnalysisTK.jar is itself packaged/bundled in a parent jar. Could this affect the way GATK is finding the walker by name ?

args.toString() is "[-T, CombineVariants, -R, /home/lindenb/src/gatk-ui/testdata/ref.fa, -genotypeMergeOptions, UNIQUIFY, -o, /home/lindenb/jeter.vcf, --variant, /home/lindenb/src/gatk-ui/testdata/S3.vcf.gz, --variant, /home/lindenb/src/gatk-ui/testdata/S4.vcf.gz, --variant, /home/lindenb/src/gatk-ui/testdata/S1.vcf.gz, --variant, /home/lindenb/src/gatk-ui/testdata/S2.vcf.gz]"


org.broadinstitute.gatk.utils.exceptions.UserException$MalformedWalkerArgumentsException: Invalid command line: Malformed walker argument: Could not find walker with name: CombineVariants
    at org.broadinstitute.gatk.engine.WalkerManager.createMalformedArgumentException(WalkerManager.java:165)
    at org.broadinstitute.gatk.utils.classloader.PluginManager.createByName(PluginManager.java:280)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getWalkerByName(GenomeAnalysisEngine.java:331)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.getArgumentSources(CommandLineExecutable.java:210)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:205)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)

if yes, is there a way to tell change classLoader ?

Created 2015-11-13 19:02:51 | Updated | Tags: commandlinegatk developer development customwalker maven

Comments (2)


Because the thread for distribution of custom walkers is retired and I found another thread that does not help me in the issue that I have, I would like to ask this question again.

I'm planning to write custom walkers for the GATK public framework (using a Maven dependency with jitpack.io) and distribute that software including the custom walkers as a jar file with dependencies, where the only available walkers from the command line will be the custom ones. Although in the logging for my command line will be the version of the GATK framework in use, I would like to use a different logging for the beginning of the program and the error (instead of the link to the GATK webpage), so I will need to create and/or extend classes from the GATK engine (for the command line).

Of course, the software will be open source (MIT license), but I wonder if it could be possible to distribute a software in such a way. In addition, I would like to know if it is possible for the developers to made a tutorial on how to start the GATK engine with a custom walker without using the GATK command line.

Thank you very much in advance.

Created 2014-03-03 19:53:33 | Updated | Tags: development customwalker

Comments (2)

I was wondering how other people package and manage any custom walkers they write? I wrote my first walker last week, and am now trying to figure out the best way to make it available to my team and keep it up to date with new versions of GATK. I do my development on a laptop and analysis on a cluster, so I have to sync the code somehow. I can think of two possibilities, it's unclear to me which is best:

Package a Monolithic jar
Pros: Makes distribution very easy, guarantees that there are never version compatibilities
Cons: Well, I haven't been able to do it yet. I can't quite figure out how to supply a package .xml file that will import my classes into GenomeAnalysisTK.jar (or Queue.jar, for that matter). Also, I think the version number will change when I customize it, and customers/collaborators may get confused about a non-standard version number (if they notice)

Create a "Custom Walker" jar
Pros: I know how to do it. It won't change the GATK version, so the log files won't look different
Cons: I'll have to remember to include my jar on the classpath (or set an environment variable, I suppose). It's possible to encounter version incompatibilities, for instance if I want to test 3.0 but use 2.8 for production

Now that I think it through, maybe the cons of the custom jar aren't as bad as I thought. Does anyone else have any experience with Custom Walker maintenance/deployment? How did you do it?

Created 2014-02-07 16:02:40 | Updated | Tags: packaging development rodwalker

Comments (2)

Is there anyway to generate a jar file to execute a single walker that I wrote?

Created 2013-06-25 15:41:09 | Updated | Tags: unifiedgenotyper queue scala development scattergatter

Comments (1)

I've noticed some strange behavior from Queue where in some cases, when I scatter/gather the Unified Genotyper in indel-mode it will introduce Cycles in the graph. This causes to Queue to die with a StackOverflowError which seems to be caused by the graphDepth function in QGraph due to the recursion becoming unbounded. This cause me some headaches yesterday as I tried to figure out how to make the function tail-recursive, before noticing the message: ERROR 17:18:21,292 QGraph - Cycles were detected in the graph this morning.

This leads me to one request and one question. First the request: It would be nice if Queue would exit if the graph validation fails, as it would make identifying the source of the problem simpler. It this possible?

Secondly the question: do you have any ideas as to what might cause the cycles?

I have tried looking at the graphviz files and I cannot identify any cycles from those (though when looking at the s/g-plots it's really difficult to make any sense of it).

My code looks like this:

val candidateSnps = new File(outputDir + "/" + projectName + ".candidate.snp.vcf")
val candidateIndels = new File(outputDir + "/" + projectName + ".candidate.indel.vcf")

// SNP and INDEL Calls
add(snpCall(cohortList, candidateSnps))
add(indelCall(cohortList, candidateIndels))

val targets = new File(outputDir + "/" + projectName + ".targets")
add(target(candidateIndels, targets))

// Take regions based on indels called in previous step
val postCleaningBamList =
  for (bam <- cohortList) yield {
    val indelRealignedBam = swapExt(bam, ".bam", ".clean.bam")
    add(clean(Seq(bam), targets, indelRealignedBam))

val afterCleanupSnps = swapExt(candidateSnps, ".candidate.snp.vcf", ".cleaned.snp.vcf")
val afterCleanupIndels = swapExt(candidateIndels, ".candidate.indel.vcf", ".cleaned.indel.vcf")

// Call snps/indels again
add(snpCall(postCleaningBamList, afterCleanupSnps))
add(indelCall(postCleaningBamList, afterCleanupIndels))

Where the cohortList is a Seq[File].

Right now I've solved this by setting this.scatterCount = 1 in the indelCall case class, however this doesn't feel quite satisfactory to me, so any pointers for a more robust solution would be greatly appreciated.

Created 2013-06-10 23:32:01 | Updated | Tags: pileupwalker development

Comments (5)

Hello, I am writing my own walker and am currently using: rawContext.getPileup().getPileupForSample('sample_name') to generate pileups specific to a given sample. However, can I split the pileup by bam tag instead? Thank you.

Created 2013-03-12 14:01:12 | Updated | Tags: test license development resources

Comments (3)

I have two licence questions. First of:

There are two licence notes in BaseTest.java. I'm assuming that the top one is the one that should be viewed as currently in use. Is this correct?

Secondly, under which licence are the test resources made available. Am I'm free to use and redistribute these? I'm asking since I'm in the process of refactoring my pipeline project to use GATK as a external dependency, but some of the tests I've written use the files provided with the GATK as a resource, and I'd like to keep them and distribute them with the project as I've done before in my GATK fork.

Created 2013-03-05 14:36:50 | Updated | Tags: development

Comments (5)

I'm struggling somewhat with updating my own code to the 2.4 code base, and I would be very happy for any help you might offer on how to do this. First of all, I guess I should switch my upstream repository to https://github.com/broadgsa/gatk-protected (as I'm working in an academic not-for-profit environment).

What I would like to do is to rebase my own devel branch on the master branch of the gatk-protected repository. But trying to do so gives me a lot of conflicts in files that I have not altered in my own branch. Is resolving all of these conflicts one by one the only way to do this, or have I really got some problem with my understanding of how this is done? I'm guessing the later.

Either way it would be really nice to get some pointers on how to figure this one out.

Thanks, Johan

Created 2013-03-04 16:39:32 | Updated | Tags: developer development

Comments (1)

So I started to play with the GATK API to annotate my VCF. :-)

My first simple aim is to translate mt VCFBigWig to the GATK . Here is my code so far (see below)

I compiled it with:

javac -cp /home/lindenb/package/GenomeAnalysisTK-2.4-3-g2a7af43/GenomeAnalysisTK.jar:/home/lindenb/package/bigwig-read-only/dist/BigWig.jar:/home/lindenb/package/bigwig-read-only/lib/log4j-1.2.15.jar -sourcepath src/main/java \

(it is compiled but there is a bunch of warnings like "/home/lindenb/package/GenomeAnalysisTK-2.4-3-g2a7af43/GenomeAnalysisTK.jar(org/broadinstitute/sting/gatk/GenomeAnalysisEngine.class): warning: Cannot find annotation method 'value()' in type 'Ensures': class file for com.google.java.contract.Ensures not found"

And I tried to run my code using various invocations of java, like

java  -cp /home/lindenb/package/GenomeAnalysisTK-2.4-3-g2a7af43/GenomeAnalysisTK.jar:/home/lindenb/package/bigwig-read-only/dist/BigWig.jar:/home/lindenb/package/bigwig-read-only/lib/log4j-1.2.15.jar:. org.broadinstitute.sting.gatk.CommandLineGATK \
            -T VariantAnnotator \
            -A com.github.lindenb.gatk.tools.vcfbigwig.VCFBigWig \
            -R /test.fa test.vcf.gz

but it raises an exeception

    at org.broadinstitute.sting.gatk.walkers.annotator.VariantAnnotatorEngine.initializeAnnotations(VariantAnnotatorEngine.java:116)
Caused by: java.lang.NullPointerException
    at org.broadinstitute.sting.utils.classloader.JVMUtils.isAnonymous(JVMUtils.java:91)
    at org.broadinstitute.sting.utils.classloader.PluginManager.<init>(PluginManager.java:158)
    at org.broadinstitute.sting.utils.classloader.PluginManager.<init>(PluginManager.java:107)
    at org.broadinstitute.sting.gatk.walkers.annotator.interfaces.AnnotationInterfaceManager.<clinit>(AnnotationInterfaceManager.java:34)
    ... 9 more


  • What the proper to declare my class VCFBigWig , how can I make it available for the VariantAnnotatorEngine ?
  • I used the @Argument annotation, is it the right way to catch an argument from the command line?
  • Can I use the same Annotator for one GATK invocation but with different arguments ?
  • there is a "initialize" method; Is there a "dispose" method to release the resources ?
  • what is the interface "RodRequiringAnnotation" ?
  • is "UserException" the right way to send an error ?
  • Should an InfoFieldAnnotation or a Walker be thread free (like a HttpServlet) ? I mean can I safely use class members ?
  • I saw in VariantAnnotatorEngine that an annotator can either change the ID or the INFO. How should I design my class in order to alter both fields ? For example my tool VCFTabix with use another VCF file (e.g: the VCF for 1KGenomes) to update the ID and the INFO fields.

Thank you for your help,



package com.github.lindenb.gatk.tools.vcfbigwig;

import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.broad.igv.bbfile.BBFileReader;
import org.broad.igv.bbfile.BigWigIterator;
import org.broad.igv.bbfile.WigItem;
import org.broadinstitute.sting.commandline.Argument;
import org.broadinstitute.sting.gatk.GenomeAnalysisEngine;
import org.broadinstitute.sting.gatk.contexts.AlignmentContext;
import org.broadinstitute.sting.gatk.contexts.ReferenceContext;
import org.broadinstitute.sting.gatk.refdata.RefMetaDataTracker;
import org.broadinstitute.sting.gatk.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.sting.gatk.walkers.annotator.interfaces.InfoFieldAnnotation;
import org.broadinstitute.sting.gatk.walkers.annotator.interfaces.RodRequiringAnnotation;
import org.broadinstitute.sting.utils.exceptions.UserException;
import org.broadinstitute.sting.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.variant.variantcontext.VariantContext;
import org.broadinstitute.variant.vcf.VCFHeaderLine;
import org.broadinstitute.variant.vcf.VCFHeaderLineType;
import org.broadinstitute.variant.vcf.VCFInfoHeaderLine;

public class VCFBigWig
extends InfoFieldAnnotation 
implements RodRequiringAnnotation
/** PATH to the bigwig file */
@Argument(fullName="bigWigFile",shortName="bw",doc="BigWig to process",required=true)
private String bigWigFile=null;
/** the ID in the INFO field */
@Argument(fullName = "bigwigid", shortName = "bwid", doc="bigwig ID ", required=true)
private String bigWigId=null;
/** BBFileReader to the bigwig file */
private BBFileReader bbFileReader=null;

public VCFBigWig()

public void initialize(AnnotatorCompatible walker,
            GenomeAnalysisEngine toolkit, Set<VCFHeaderLine> headerLines) {
    super.initialize(walker, toolkit, headerLines);
    /* open the BIGFile, check it is a bigwig */
    try {
        this.bbFileReader=new BBFileReader(bigWigFile);
    catch (IOException e) {
        throw new UserException("Cannot read "+bigWigFile,e);
        throw new UserException("The following file is not a BigWig File : "+bigWigFile);

public Map<String, Object> annotate(
        RefMetaDataTracker tracker,
        AnnotatorCompatible walker,
        ReferenceContext ref,
        Map<String, AlignmentContext> stratifiedContexts,
        VariantContext vc,
        Map<String, PerReadAlleleLikelihoodMap> stratifiedPerReadAlleleLikelihoodMap)
    return annotate(tracker, walker,ref,stratifiedContexts,vc);

public Map<String, Object> annotate(
        RefMetaDataTracker tracker,
        AnnotatorCompatible walker,
        ReferenceContext ref,
        Map<String, AlignmentContext> stratifiedContexts,
        VariantContext vc)
    final boolean contained=true;
    double sum=0;
    int count=0;
    BigWigIterator iter=this.bbFileReader.getBigWigIterator(
            vc.getChr(), vc.getStart()-1,
            vc.getChr(), vc.getEnd(),

        WigItem item=iter.next();
        float v=item.getWigValue();

        return Collections.emptyMap();
    Map<String,Object> m=new HashMap<String, Object>(1);
    m.put(this.bigWigId, (float)(sum/count));
    return m ;
public List<VCFInfoHeaderLine> getDescriptions()
    return Arrays.asList(new VCFInfoHeaderLine[]{
        new VCFInfoHeaderLine(bigWigId,1, VCFHeaderLineType.Float,"Mean values for bigwig file "+this.bigWigFile)   
public List<String> getKeyNames() {
    return Arrays.asList(new String[]{


Created 2013-02-28 10:32:41 | Updated | Tags: vcf developer development

Comments (3)

I just quickly wrote a set of Tools to annotate my VCFs ( http://plindenbaum.blogspot.fr/2013/02/4-tools-i-wrote-today-to-annotate-vcf.html )

For example, one of those tools uses a BED/XML file indexed with tabix to annotate my VCF . (My code just uses the java api for tabix to get the XML at a given position)

Question: is there something in the GATK-API that would allow me to implement my code using the GATK-API: What kind of walker should I use ? What would be the benefits of using the GATK-API ? for example does using a gatk-walker will automatically make my code parallelizable ?


Created 2013-02-01 10:32:07 | Updated 2013-02-01 10:32:52 | Tags: walkers development

Comments (4)

Hi All, In my desparate attempts to learn more about how the GATK works, but since it's written in JAVA, which I have almost no clue about (python all the way), so to be honest I've not the faintest idea how to go about troubleshooting the below error.

I found a blog post by the ever wonderful Pierre Lindebaum here that went through compiling and running you're first GATK walker.

First using Pierre's example (after some compiling issues as there doesn't seem to be a ReadMetaDataTracker class anymore and I replace with RefMetaDataTracker) I get the following error:

java -cp /path/to/GenomeAnalysisTK.jar:/path/to/cofoja-1.0-r139.jar:/path/to/HelloRead.jar org.broadinstitute.sting.gatk.CommandLineGATK -T HelloRead -I path/to/test.bam -R /path/to/human_g1k_v37.fasta
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
    at org.broadinstitute.sting.utils.classloader.JVMUtils.isAnonymous(JVMUtils.java:91)
    at org.broadinstitute.sting.utils.classloader.PluginManager.<init>(PluginManager.java:155)
    at org.broadinstitute.sting.utils.classloader.PluginManager.<init>(PluginManager.java:124)
    at org.broadinstitute.sting.gatk.WalkerManager.<init>(WalkerManager.java:55)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.<init>(GenomeAnalysisEngine.java:160)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.<init>(CommandLineExecutable.java:53)
    at org.broadinstitute.sting.gatk.CommandLineGATK.<init>(CommandLineGATK.java:57)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.2-16-g9f648cb):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

I thought that was a very strange error and in Pierre's example he used GATK version 1.4 where as I am on 2.2.

I also tried the following

java -cp /path/to/cofoja-1.0-r139.jar:/path/to/HelloRead.jar -jar /path/to/GenomeAnalysisTK.jar -T HelloRead -I path/to/test.bam -R /path/to/human_g1k_v37.fasta
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.2-16-g9f648cb): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: HelloRead
##### ERROR ------------------------------------------------------------------------------------------

I have not tried downloading on older version of the GATK and running it with that. I'm pretty much stuck here. Any help is greatly appreciated.

Cheers, Davy

Created 2013-01-31 16:12:22 | Updated | Tags: development

Comments (1)

I've been looking all over the website, but I can't find the location of the source code to download it. I am interested in understanding GATK more in depth and possibly developing walkers for it. Thanks.

Created 2012-12-03 13:39:03 | Updated | Tags: development gatkextensionsgenerator extensions

Comments (2)

I'm having issues with the GATKExtensionsGenerator. When I'm trying to build the gatk with my own extentions I get the following error error message:

    [mkdir] Created dir: /home/MOLMED/dahljo/workspace/gatk/build/queue-extensions/src
     [echo] Generating Queue GATK extensions...
     [java] Exception in thread "main" java.lang.ExceptionInInitializerError
     [java]     at org.broadinstitute.sting.queue.extensions.gatk.GATKExtensionsGenerator.<init>(GATKExtensionsGenerator.java:92)
     [java]     at org.broadinstitute.sting.queue.extensions.gatk.GATKExtensionsGenerator.main(GATKExtensionsGenerator.java:103)
     [java] Caused by: java.lang.NullPointerException
     [java]     at org.reflections.Reflections.scan(Reflections.java:220)
     [java]     at org.reflections.Reflections.scan(Reflections.java:166)
     [java]     at org.reflections.Reflections.<init>(Reflections.java:94)
     [java]     at org.broadinstitute.sting.utils.classloader.PluginManager.<clinit>(PluginManager.java:77)
     [java]     ... 2 more

I guess that this is because the GATKExtensionsGenerator tries to generate a extension for the extension that I've written. However, as far as I understand the following lines in the GATKExtensionsGenerator: PluginManager<CommandLineProgram> clpManager = new PluginManager<CommandLineProgram>(CommandLineProgram.class, "CommandLineProgram", "CLP"); This should only be done for classes extending the CommandLineProgram class, which my program does not.

My program is a simple stand alone Scala program for splitting files (e.g. fastq and bams), and I've written a extension class for it which looks like this:

package se.uu.medsci.queue.ext

import java.io.File
import org.broadinstitute.sting.queue.function.JavaCommandLineFunction
import org.broadinstitute.sting.commandline._

class Splitter extends JavaCommandLineFunction {

    analysisName = "Splitter"
    javaMainClass = "se.uu.medsci.splitter.Splitter"

    @Input(doc = "The input file to be split", shortName = "input", fullName = "input_bam_files", required = true)
    var input: File = _

    @Output(doc = "Directory to output the split files to-", shortName = "o", fullName = "output_dir", required = true)
    var output: File = _

    @Argument(doc = "", shortName = "ds", fullName = "read_group_description", required = false)
    var recordsPerOutputFile: Int = -1

    override def freezeFieldValues() {
        if (recordsPerOutputFile == -1)
            recordsPerOutputFile = 100000

    override def commandLine = super.commandLine +
        required("-i", input) +
        required("-d", output) +
        optional("-r", recordsPerOutputFile)

As I think that this should have nothing to do with the GATKExtensionsGenerator I'm a bit confused about what might cause this error. Any ideas?