Tagged with #development
0 documentation articles | 0 announcements | 11 forum discussions


No posts found with the requested search criteria.
No posts found with the requested search criteria.
Comments (2)

I was wondering how other people package and manage any custom walkers they write? I wrote my first walker last week, and am now trying to figure out the best way to make it available to my team and keep it up to date with new versions of GATK. I do my development on a laptop and analysis on a cluster, so I have to sync the code somehow. I can think of two possibilities, it's unclear to me which is best:

Package a Monolithic jar
Pros: Makes distribution very easy, guarantees that there are never version compatibilities
Cons: Well, I haven't been able to do it yet. I can't quite figure out how to supply a package .xml file that will import my classes into GenomeAnalysisTK.jar (or Queue.jar, for that matter). Also, I think the version number will change when I customize it, and customers/collaborators may get confused about a non-standard version number (if they notice)

Create a "Custom Walker" jar
Pros: I know how to do it. It won't change the GATK version, so the log files won't look different
Cons: I'll have to remember to include my jar on the classpath (or set an environment variable, I suppose). It's possible to encounter version incompatibilities, for instance if I want to test 3.0 but use 2.8 for production

Now that I think it through, maybe the cons of the custom jar aren't as bad as I thought. Does anyone else have any experience with Custom Walker maintenance/deployment? How did you do it?

Comments (2)

Is there anyway to generate a jar file to execute a single walker that I wrote?

Comments (1)

I've noticed some strange behavior from Queue where in some cases, when I scatter/gather the Unified Genotyper in indel-mode it will introduce Cycles in the graph. This causes to Queue to die with a StackOverflowError which seems to be caused by the graphDepth function in QGraph due to the recursion becoming unbounded. This cause me some headaches yesterday as I tried to figure out how to make the function tail-recursive, before noticing the message: ERROR 17:18:21,292 QGraph - Cycles were detected in the graph this morning.

This leads me to one request and one question. First the request: It would be nice if Queue would exit if the graph validation fails, as it would make identifying the source of the problem simpler. It this possible?

Secondly the question: do you have any ideas as to what might cause the cycles?

I have tried looking at the graphviz files and I cannot identify any cycles from those (though when looking at the s/g-plots it's really difficult to make any sense of it).

My code looks like this:

val candidateSnps = new File(outputDir + "/" + projectName + ".candidate.snp.vcf")
val candidateIndels = new File(outputDir + "/" + projectName + ".candidate.indel.vcf")

// SNP and INDEL Calls
add(snpCall(cohortList, candidateSnps))
add(indelCall(cohortList, candidateIndels))

val targets = new File(outputDir + "/" + projectName + ".targets")
add(target(candidateIndels, targets))

// Take regions based on indels called in previous step
val postCleaningBamList =
  for (bam <- cohortList) yield {
    val indelRealignedBam = swapExt(bam, ".bam", ".clean.bam")
    add(clean(Seq(bam), targets, indelRealignedBam))
    indelRealignedBam
  }

val afterCleanupSnps = swapExt(candidateSnps, ".candidate.snp.vcf", ".cleaned.snp.vcf")
val afterCleanupIndels = swapExt(candidateIndels, ".candidate.indel.vcf", ".cleaned.indel.vcf")

// Call snps/indels again
add(snpCall(postCleaningBamList, afterCleanupSnps))
add(indelCall(postCleaningBamList, afterCleanupIndels))

Where the cohortList is a Seq[File].

Right now I've solved this by setting this.scatterCount = 1 in the indelCall case class, however this doesn't feel quite satisfactory to me, so any pointers for a more robust solution would be greatly appreciated.

Comments (5)

Hello, I am writing my own walker and am currently using: rawContext.getPileup().getPileupForSample('sample_name') to generate pileups specific to a given sample. However, can I split the pileup by bam tag instead? Thank you.

Comments (3)

I have two licence questions. First of:

There are two licence notes in BaseTest.java. I'm assuming that the top one is the one that should be viewed as currently in use. Is this correct?

Secondly, under which licence are the test resources made available. Am I'm free to use and redistribute these? I'm asking since I'm in the process of refactoring my pipeline project to use GATK as a external dependency, but some of the tests I've written use the files provided with the GATK as a resource, and I'd like to keep them and distribute them with the project as I've done before in my GATK fork.

Comments (5)

I'm struggling somewhat with updating my own code to the 2.4 code base, and I would be very happy for any help you might offer on how to do this. First of all, I guess I should switch my upstream repository to https://github.com/broadgsa/gatk-protected (as I'm working in an academic not-for-profit environment).

What I would like to do is to rebase my own devel branch on the master branch of the gatk-protected repository. But trying to do so gives me a lot of conflicts in files that I have not altered in my own branch. Is resolving all of these conflicts one by one the only way to do this, or have I really got some problem with my understanding of how this is done? I'm guessing the later.

Either way it would be really nice to get some pointers on how to figure this one out.

Thanks, Johan

Comments (1)

So I started to play with the GATK API to annotate my VCF. :-)

My first simple aim is to translate mt VCFBigWig to the GATK . Here is my code so far (see below)

I compiled it with:

javac -cp /home/lindenb/package/GenomeAnalysisTK-2.4-3-g2a7af43/GenomeAnalysisTK.jar:/home/lindenb/package/bigwig-read-only/dist/BigWig.jar:/home/lindenb/package/bigwig-read-only/lib/log4j-1.2.15.jar -sourcepath src/main/java \
        ./src/main/java/com/github/lindenb/gatk/tools/vcfbigwig/VCFBigWig.java

(it is compiled but there is a bunch of warnings like "/home/lindenb/package/GenomeAnalysisTK-2.4-3-g2a7af43/GenomeAnalysisTK.jar(org/broadinstitute/sting/gatk/GenomeAnalysisEngine.class): warning: Cannot find annotation method 'value()' in type 'Ensures': class file for com.google.java.contract.Ensures not found"

And I tried to run my code using various invocations of java, like

java  -cp /home/lindenb/package/GenomeAnalysisTK-2.4-3-g2a7af43/GenomeAnalysisTK.jar:/home/lindenb/package/bigwig-read-only/dist/BigWig.jar:/home/lindenb/package/bigwig-read-only/lib/log4j-1.2.15.jar:. org.broadinstitute.sting.gatk.CommandLineGATK \
            -T VariantAnnotator \
            -A com.github.lindenb.gatk.tools.vcfbigwig.VCFBigWig \
            -R /test.fa test.vcf.gz

but it raises an exeception

java.lang.ExceptionInInitializerError
    at org.broadinstitute.sting.gatk.walkers.annotator.VariantAnnotatorEngine.initializeAnnotations(VariantAnnotatorEngine.java:116)
(...)
Caused by: java.lang.NullPointerException
    at org.broadinstitute.sting.utils.classloader.JVMUtils.isAnonymous(JVMUtils.java:91)
    at org.broadinstitute.sting.utils.classloader.PluginManager.<init>(PluginManager.java:158)
    at org.broadinstitute.sting.utils.classloader.PluginManager.<init>(PluginManager.java:107)
    at org.broadinstitute.sting.gatk.walkers.annotator.interfaces.AnnotationInterfaceManager.<clinit>(AnnotationInterfaceManager.java:34)
    ... 9 more

Questions:

  • What the proper to declare my class VCFBigWig , how can I make it available for the VariantAnnotatorEngine ?
  • I used the @Argument annotation, is it the right way to catch an argument from the command line?
  • Can I use the same Annotator for one GATK invocation but with different arguments ?
  • there is a "initialize" method; Is there a "dispose" method to release the resources ?
  • what is the interface "RodRequiringAnnotation" ?
  • is "UserException" the right way to send an error ?
  • Should an InfoFieldAnnotation or a Walker be thread free (like a HttpServlet) ? I mean can I safely use class members ?
  • I saw in VariantAnnotatorEngine that an annotator can either change the ID or the INFO. How should I design my class in order to alter both fields ? For example my tool VCFTabix with use another VCF file (e.g: the VCF for 1KGenomes) to update the ID and the INFO fields.

Thank you for your help,

Pierre

.

package com.github.lindenb.gatk.tools.vcfbigwig;

import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.broad.igv.bbfile.BBFileReader;
import org.broad.igv.bbfile.BigWigIterator;
import org.broad.igv.bbfile.WigItem;
import org.broadinstitute.sting.commandline.Argument;
import org.broadinstitute.sting.gatk.GenomeAnalysisEngine;
import org.broadinstitute.sting.gatk.contexts.AlignmentContext;
import org.broadinstitute.sting.gatk.contexts.ReferenceContext;
import org.broadinstitute.sting.gatk.refdata.RefMetaDataTracker;
import org.broadinstitute.sting.gatk.walkers.annotator.interfaces.AnnotatorCompatible;
import org.broadinstitute.sting.gatk.walkers.annotator.interfaces.InfoFieldAnnotation;
import org.broadinstitute.sting.gatk.walkers.annotator.interfaces.RodRequiringAnnotation;
import org.broadinstitute.sting.utils.exceptions.UserException;
import org.broadinstitute.sting.utils.genotyper.PerReadAlleleLikelihoodMap;
import org.broadinstitute.variant.variantcontext.VariantContext;
import org.broadinstitute.variant.vcf.VCFHeaderLine;
import org.broadinstitute.variant.vcf.VCFHeaderLineType;
import org.broadinstitute.variant.vcf.VCFInfoHeaderLine;


public class VCFBigWig
extends InfoFieldAnnotation 
implements RodRequiringAnnotation
{
/** PATH to the bigwig file */
@Argument(fullName="bigWigFile",shortName="bw",doc="BigWig to process",required=true)
private String bigWigFile=null;
/** the ID in the INFO field */
@Argument(fullName = "bigwigid", shortName = "bwid", doc="bigwig ID ", required=true)
private String bigWigId=null;
/** BBFileReader to the bigwig file */
private BBFileReader bbFileReader=null;

public VCFBigWig()
    {
    }

@Override
public void initialize(AnnotatorCompatible walker,
            GenomeAnalysisEngine toolkit, Set<VCFHeaderLine> headerLines) {
    super.initialize(walker, toolkit, headerLines);
    /* open the BIGFile, check it is a bigwig */
    try {
        this.bbFileReader=new BBFileReader(bigWigFile);
        }
    catch (IOException e) {
        throw new UserException("Cannot read "+bigWigFile,e);
        }
    if(!this.bbFileReader.isBigWigFile())
        {
        throw new UserException("The following file is not a BigWig File : "+bigWigFile);
        }
    }


@Override
public Map<String, Object> annotate(
        RefMetaDataTracker tracker,
        AnnotatorCompatible walker,
        ReferenceContext ref,
        Map<String, AlignmentContext> stratifiedContexts,
        VariantContext vc,
        Map<String, PerReadAlleleLikelihoodMap> stratifiedPerReadAlleleLikelihoodMap)
    {
    return annotate(tracker, walker,ref,stratifiedContexts,vc);
    }



@Override
public Map<String, Object> annotate(
        RefMetaDataTracker tracker,
        AnnotatorCompatible walker,
        ReferenceContext ref,
        Map<String, AlignmentContext> stratifiedContexts,
        VariantContext vc)
    {
    final boolean contained=true;
    double sum=0;
    int count=0;
    BigWigIterator iter=this.bbFileReader.getBigWigIterator(
            vc.getChr(), vc.getStart()-1,
            vc.getChr(), vc.getEnd(),
            contained
            );

    while(iter.hasNext())
        {
        WigItem item=iter.next();
        float v=item.getWigValue();
        sum+=v;
        count++;
        }

    if(count==0)
        {
        return Collections.emptyMap();
        }
    Map<String,Object> m=new HashMap<String, Object>(1);
    m.put(this.bigWigId, (float)(sum/count));
    return m ;
    }
@Override
public List<VCFInfoHeaderLine> getDescriptions()
    {
    return Arrays.asList(new VCFInfoHeaderLine[]{
        new VCFInfoHeaderLine(bigWigId,1, VCFHeaderLineType.Float,"Mean values for bigwig file "+this.bigWigFile)   
        });
    }
@Override
public List<String> getKeyNames() {
    return Arrays.asList(new String[]{
            bigWigId    
            });
    }

}
Comments (3)

I just quickly wrote a set of Tools to annotate my VCFs ( http://plindenbaum.blogspot.fr/2013/02/4-tools-i-wrote-today-to-annotate-vcf.html )

For example, one of those tools uses a BED/XML file indexed with tabix to annotate my VCF . (My code just uses the java api for tabix to get the XML at a given position)

Question: is there something in the GATK-API that would allow me to implement my code using the GATK-API: What kind of walker should I use ? What would be the benefits of using the GATK-API ? for example does using a gatk-walker will automatically make my code parallelizable ?

Pierre

Comments (4)

Hi All, In my desparate attempts to learn more about how the GATK works, but since it's written in JAVA, which I have almost no clue about (python all the way), so to be honest I've not the faintest idea how to go about troubleshooting the below error.

I found a blog post by the ever wonderful Pierre Lindebaum here that went through compiling and running you're first GATK walker.

First using Pierre's example (after some compiling issues as there doesn't seem to be a ReadMetaDataTracker class anymore and I replace with RefMetaDataTracker) I get the following error:

java -cp /path/to/GenomeAnalysisTK.jar:/path/to/cofoja-1.0-r139.jar:/path/to/HelloRead.jar org.broadinstitute.sting.gatk.CommandLineGATK -T HelloRead -I path/to/test.bam -R /path/to/human_g1k_v37.fasta
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.NullPointerException
    at org.broadinstitute.sting.utils.classloader.JVMUtils.isAnonymous(JVMUtils.java:91)
    at org.broadinstitute.sting.utils.classloader.PluginManager.<init>(PluginManager.java:155)
    at org.broadinstitute.sting.utils.classloader.PluginManager.<init>(PluginManager.java:124)
    at org.broadinstitute.sting.gatk.WalkerManager.<init>(WalkerManager.java:55)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.<init>(GenomeAnalysisEngine.java:160)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.<init>(CommandLineExecutable.java:53)
    at org.broadinstitute.sting.gatk.CommandLineGATK.<init>(CommandLineGATK.java:57)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.2-16-g9f648cb):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

I thought that was a very strange error and in Pierre's example he used GATK version 1.4 where as I am on 2.2.

I also tried the following

java -cp /path/to/cofoja-1.0-r139.jar:/path/to/HelloRead.jar -jar /path/to/GenomeAnalysisTK.jar -T HelloRead -I path/to/test.bam -R /path/to/human_g1k_v37.fasta
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.2-16-g9f648cb): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: HelloRead
##### ERROR ------------------------------------------------------------------------------------------

I have not tried downloading on older version of the GATK and running it with that. I'm pretty much stuck here. Any help is greatly appreciated.

Cheers, Davy

Comments (1)

I've been looking all over the website, but I can't find the location of the source code to download it. I am interested in understanding GATK more in depth and possibly developing walkers for it. Thanks.

Comments (2)

I'm having issues with the GATKExtensionsGenerator. When I'm trying to build the gatk with my own extentions I get the following error error message:

queue-extensions.generate:
    [mkdir] Created dir: /home/MOLMED/dahljo/workspace/gatk/build/queue-extensions/src
     [echo] Generating Queue GATK extensions...
     [java] Exception in thread "main" java.lang.ExceptionInInitializerError
     [java]     at org.broadinstitute.sting.queue.extensions.gatk.GATKExtensionsGenerator.<init>(GATKExtensionsGenerator.java:92)
     [java]     at org.broadinstitute.sting.queue.extensions.gatk.GATKExtensionsGenerator.main(GATKExtensionsGenerator.java:103)
     [java] Caused by: java.lang.NullPointerException
     [java]     at org.reflections.Reflections.scan(Reflections.java:220)
     [java]     at org.reflections.Reflections.scan(Reflections.java:166)
     [java]     at org.reflections.Reflections.<init>(Reflections.java:94)
     [java]     at org.broadinstitute.sting.utils.classloader.PluginManager.<clinit>(PluginManager.java:77)
     [java]     ... 2 more

I guess that this is because the GATKExtensionsGenerator tries to generate a extension for the extension that I've written. However, as far as I understand the following lines in the GATKExtensionsGenerator: PluginManager<CommandLineProgram> clpManager = new PluginManager<CommandLineProgram>(CommandLineProgram.class, "CommandLineProgram", "CLP"); This should only be done for classes extending the CommandLineProgram class, which my program does not.

My program is a simple stand alone Scala program for splitting files (e.g. fastq and bams), and I've written a extension class for it which looks like this:

package se.uu.medsci.queue.ext

import java.io.File
import org.broadinstitute.sting.queue.function.JavaCommandLineFunction
import org.broadinstitute.sting.commandline._

class Splitter extends JavaCommandLineFunction {

    analysisName = "Splitter"
    javaMainClass = "se.uu.medsci.splitter.Splitter"

    @Input(doc = "The input file to be split", shortName = "input", fullName = "input_bam_files", required = true)
    var input: File = _

    @Output(doc = "Directory to output the split files to-", shortName = "o", fullName = "output_dir", required = true)
    var output: File = _

    @Argument(doc = "", shortName = "ds", fullName = "read_group_description", required = false)
    var recordsPerOutputFile: Int = -1

    override def freezeFieldValues() {
        super.freezeFieldValues()
        if (recordsPerOutputFile == -1)
            recordsPerOutputFile = 100000
    }

    override def commandLine = super.commandLine +
        required("-i", input) +
        required("-d", output) +
        optional("-r", recordsPerOutputFile)
}

As I think that this should have nothing to do with the GATKExtensionsGenerator I'm a bit confused about what might cause this error. Any ideas?