Tagged with #github

Created 2014-04-04 18:49:28 | Updated 2015-12-19 12:44:49 | Tags: github git source-code

We distinguish between "Classic GATK" (major versions 1 through 3) and GATK 4, the next generation of GATK tools.

"Classic GATK" (major versions 1 through 3) (current distribution)

We provide the current GATK source code through two publicly accessible GitHub repositories: broadgsa/gatk and broadgsa/gatk-protected.

1. broadgsa/gatk

This repository contains the code for the core GATK development framework, including the GATK engine and many utilities, which third-party developers can use to develop their own GATK-based analysis tools. Be advised, however, that support for development using this framework is being discontinued.

All the code in this repository is open-source under the MIT license. The full text of the license can be viewed here.

2. broadgsa/gatk-protected

This repository contains the code corresponding to the GenomeAnalysisTK.jar file that we distribute to our users, containing the GATK engine and all analysis tools.

This includes the code in broadgsa/gatk under the MIT license, plus tools and utilities that are under a more restrictive license that prohibits commercial/for-profit use. Anyone interested in accessing the protected code for commercial/for-profit purposes should contact our licensing department (softwarelicensing@broadinstitute.org) to inquire about licensing terms.


GATK 4+ (alpha preview)

The code for GATK 4+, currently released as an alpha preview, is hosted in two publicly accessible GitHub repositories: broadinstitute/gatk and broadinstitute/gatk-protected. As with Classic GATK, the split reflects two different licenses, but here the repositories are complementary: they share no code.

1. broadinstitute/gatk

This repository contains the code for the core GATK 4+ development framework, including the new GATK engine and many utilities, which third-party developers can use to develop their own GATK-based analysis tools. We encourage developers to use this new framework for development, and we welcome feedback on features and development support.

All the code in this repository is open-source under a BSD license. The full text of the license can be viewed here.

2. broadinstitute/gatk-protected

This repository contains the code for key analysis tools that are covered under a more restrictive license that prohibits commercial/for-profit use. Anyone interested in accessing the protected code for commercial/for-profit purposes should contact our licensing department (softwarelicensing@broadinstitute.org) to inquire about licensing terms.

Created 2015-03-13 09:17:33 | Updated | Tags: github source code compile

I'm trying to compile GATK from the GitHub repo, using a fresh local Maven repository:

curl -kLo tmp.zip "https://github.com/broadgsa/gatk/archive/3.3.zip"
unzip tmp.zip
cd gatk-3.3/
/commun/data/packages/apache-maven-3.1.1/bin/mvn compile -Dmaven.repo.local=/commun/data/users/lindenb/tmp/GATK/m2

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] GATK Root ......................................... SUCCESS [5.755s]
[INFO] GATK Aggregator ................................... SUCCESS [0.313s]
[INFO] GATK GSALib ....................................... SUCCESS [7.951s]
[INFO] GATK Utils ........................................ SUCCESS [16.481s]
[INFO] GATK Engine ....................................... SUCCESS [0.528s]
[INFO] GATK Tools Public ................................. SUCCESS [8.714s]
[INFO] External Example .................................. SUCCESS [2.401s]
[INFO] GATK Queue ........................................ SUCCESS [46.933s]
[INFO] GATK Queue Extensions Generator ................... SUCCESS [0.260s]
[INFO] GATK Queue Extensions Public ...................... FAILURE [0.930s]
[INFO] GATK Aggregator Public ............................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:31.050s
[INFO] Finished at: Fri Mar 13 10:06:42 CET 2015
[INFO] Final Memory: 49M/591M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project gatk-queue-extensions-public: Could not resolve dependencies for project org.broadinstitute.gatk:gatk-queue-extensions-public:jar:3.3: The following artifacts could not be resolved: org.broadinstitute.gatk:gatk-tools-public:jar:tests:3.3, org.broadinstitute.gatk:gatk-queue:jar:tests:3.3: Could not find artifact org.broadinstitute.gatk:gatk-tools-public:jar:tests:3.3 in gatk.public.repo.local (file:/commun/data/users/lindenb/tmp/GATK/gatk-3.3/public/gatk-queue-extensions-public/../../public/repo) -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :gatk-queue-extensions-public

I don't really understand the error message. I understand that this GATK version contains only the public features, but (correct me if I'm wrong) I expected to find a 'GenomeAnalysisTK*' jar.

How can I fix this?
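Maven reports artifacts as coordinates of the form `groupId:artifactId:extension[:classifier]:version`, so `org.broadinstitute.gatk:gatk-tools-public:jar:tests:3.3` refers to a jar with the `tests` classifier (a test jar), not the main `gatk-tools-public` jar. The minimal Python sketch below pulls the coordinates out of an error message like the one above so the classifier is easy to spot; `parse_coordinate` and `coordinates_in` are hypothetical helpers, and the token regex is a simplification that assumes coordinates contain only word characters, dots, and hyphens.

```python
import re
from typing import List, NamedTuple, Optional


class Coordinate(NamedTuple):
    group_id: str
    artifact_id: str
    extension: str
    classifier: Optional[str]  # e.g. "tests" for a test jar; None for the main jar
    version: str


# Simplified token pattern: 4 or 5 colon-separated parts made of word characters,
# dots and hyphens (g:a:ext:version or g:a:ext:classifier:version).
TOKEN = re.compile(r"[\w.\-]+(?::[\w.\-]+){3,4}")


def parse_coordinate(text: str) -> Coordinate:
    """Split 'groupId:artifactId:extension[:classifier]:version' into its parts."""
    parts = text.split(":")
    if len(parts) == 4:
        group_id, artifact_id, extension, version = parts
        return Coordinate(group_id, artifact_id, extension, None, version)
    if len(parts) == 5:
        return Coordinate(*parts)
    raise ValueError(f"not a Maven coordinate: {text!r}")


def coordinates_in(message: str) -> List[Coordinate]:
    """Return each distinct coordinate mentioned in a Maven error message, in order."""
    found: List[Coordinate] = []
    for token in TOKEN.findall(message):
        coord = parse_coordinate(token)
        if coord not in found:
            found.append(coord)
    return found


if __name__ == "__main__":
    message = (
        "Could not resolve dependencies for project "
        "org.broadinstitute.gatk:gatk-queue-extensions-public:jar:3.3: "
        "The following artifacts could not be resolved: "
        "org.broadinstitute.gatk:gatk-tools-public:jar:tests:3.3, "
        "org.broadinstitute.gatk:gatk-queue:jar:tests:3.3"
    )
    for coord in coordinates_in(message):
        print(coord.artifact_id, coord.classifier or "(main jar)")
```

On the message above, the project itself has no classifier, while both missing artifacts carry the `tests` classifier. Test jars are normally built by the `maven-jar-plugin` `test-jar` goal, which binds to the `package` phase, so a build that stops at `compile` may not produce them (nor any jar at all). Running a later phase such as `mvn package` is a plausible next step, though this is an inference from the message, not a confirmed fix.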


Created 2014-07-15 11:10:25 | Updated | Tags: github maven gatk-protected

I just tried to check out and build gatk-protected from the GitHub repo, but I ran into the problem below. I'm guessing it originates from some unresolved dependency in the POM file, but my knowledge of Maven is somewhat sketchy.

I check it out from GitHub with git clone https://github.com/broadgsa/gatk-protected.git and build it with mvn package.

dahljo@hydra:~/workspace/gatk-protected$ mvn package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] GATK Aggregator
[INFO] GATK Engine
[INFO] GATK Tools Public
[INFO] External Example
[INFO] GATK Queue Extensions Generator
[INFO] GATK Queue Extensions Public
[INFO] GATK Aggregator Public
[INFO] GATK Tools Protected
[INFO] GATK Package Distribution
[INFO] GATK Queue Extensions Distribution
[INFO] GATK Queue Package Distribution
[INFO] GATK Aggregator Protected
[INFO] ------------------------------------------------------------------------
[INFO] Building GATK Root 3.2-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] --- gitdescribe-maven-plugin:2.0:gitdescribe (gitdescribe-initialize) @ gatk-root ---
[INFO] [git, describe, --long]
[INFO] Setting Git Describe: git-3.2-0-g799071b
[INFO] --- build-helper-maven-plugin:1.8:regex-property (fix-version-initialize) @ gatk-root ---
[INFO] ------------------------------------------------------------------------
[INFO] Building GATK Aggregator 3.2-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] --- gitdescribe-maven-plugin:2.0:gitdescribe (gitdescribe-initialize) @ gatk-aggregator ---
[INFO] --- build-helper-maven-plugin:1.8:regex-property (fix-version-initialize) @ gatk-aggregator ---
[INFO] --- exec-maven-plugin:1.2.1:exec (delete-mavens-links) @ gatk-aggregator ---
rm: missing operand
Try 'rm --help' for more information.
rm: missing operand
Try 'rm --help' for more information.
[INFO] --- maven-junction-plugin:1.0.3:link (link-public-testdata) @ gatk-aggregator ---
[INFO] --- maven-junction-plugin:1.0.3:link (link-public-qscript) @ gatk-aggregator ---
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] --- gitdescribe-maven-plugin:2.0:gitdescribe (gitdescribe-initialize) @ gsalib ---
[INFO] --- build-helper-maven-plugin:1.8:regex-property (fix-version-initialize) @ gsalib ---
[INFO] --- maven-assembly-plugin:2.4:single (gsalib-assembly) @ gsalib ---
[INFO] Reading assembly descriptor: src/assembly/gsalib.xml
[INFO] Building tar: /home/dahljo/workspace/gatk-protected/public/gsalib/target/gsalib-3.2-SNAPSHOT.tar.gz
[INFO] --- maven-junction-plugin:1.0.3:link (link-public-testdata) @ gsalib ---
[INFO] --- maven-junction-plugin:1.0.3:link (link-public-qscript) @ gsalib ---
[INFO] ------------------------------------------------------------------------
[INFO] Building GATK Utils 3.2-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: file:/home/dahljo/workspace/gatk-protected/public/gatk-utils/../../public/repo/com/google/code/cofoja/cofoja/1.0-r139/cofoja-1.0-r139.pom
Exception in thread "pool-1-thread-1" java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils
    at org.apache.maven.wagon.providers.file.FileWagon.resolveDestinationPath(FileWagon.java:206)
    at org.apache.maven.wagon.providers.file.FileWagon.resourceExists(FileWagon.java:265)
    at org.sonatype.aether.connector.wagon.WagonRepositoryConnector$GetTask.run(WagonRepositoryConnector.java:577)
    at org.sonatype.aether.util.concurrency.RunnableErrorForwarder$1.run(RunnableErrorForwarder.java:60)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang.StringUtils
    at org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy.loadClass(SelfFirstStrategy.java:50)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.unsynchronizedLoadClass(ClassRealm.java:259)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:235)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:227)
    ... 7 more
---------------------------------------------------
constituent[0]: file:/usr/share/maven/lib/sisu-inject-bean.jar
constituent[1]: file:/usr/share/maven/lib/commons-cli.jar
constituent[2]: file:/usr/share/maven/lib/maven-compat-3.x.jar
constituent[3]: file:/usr/share/maven/lib/aether-util.jar
constituent[4]: file:/usr/share/maven/lib/maven-artifact-3.x.jar
constituent[5]: file:/usr/share/maven/lib/wagon-http-shaded.jar
constituent[6]: file:/usr/share/maven/lib/maven-repository-metadata-3.x.jar
constituent[7]: file:/usr/share/maven/lib/commons-httpclient.jar
constituent[8]: file:/usr/share/maven/lib/plexus-utils.jar
constituent[9]: file:/usr/share/maven/lib/aether-connector-wagon.jar
constituent[10]: file:/usr/share/maven/lib/plexus-interpolation.jar
constituent[11]: file:/usr/share/maven/lib/aether-impl.jar
constituent[12]: file:/usr/share/maven/lib/maven-settings-3.x.jar
constituent[13]: file:/usr/share/maven/lib/sisu-guice.jar
constituent[14]: file:/usr/share/maven/lib/guava.jar
constituent[15]: file:/usr/share/maven/lib/commons-codec.jar
constituent[16]: file:/usr/share/maven/lib/wagon-provider-api.jar
constituent[17]: file:/usr/share/maven/lib/plexus-component-annotations.jar
constituent[18]: file:/usr/share/maven/lib/maven-plugin-api-3.x.jar
constituent[19]: file:/usr/share/maven/lib/maven-settings-builder-3.x.jar
constituent[20]: file:/usr/share/maven/lib/maven-model-builder-3.x.jar
constituent[21]: file:/usr/share/maven/lib/plexus-cipher.jar
constituent[22]: file:/usr/share/maven/lib/maven-model-3.x.jar
constituent[23]: file:/usr/share/maven/lib/commons-logging.jar
constituent[24]: file:/usr/share/maven/lib/sisu-inject-plexus.jar
constituent[25]: file:/usr/share/maven/lib/maven-core-3.x.jar
constituent[26]: file:/usr/share/maven/lib/aether-api.jar
constituent[27]: file:/usr/share/maven/lib/maven-aether-provider-3.x.jar
constituent[28]: file:/usr/share/maven/lib/maven-embedder-3.x.jar
constituent[29]: file:/usr/share/maven/lib/aether-spi.jar
constituent[30]: file:/usr/share/maven/lib/plexus-sec-dispatcher.jar
constituent[31]: file:/usr/share/maven/lib/wagon-file.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils
    at org.apache.maven.wagon.providers.file.FileWagon.resolveDestinationPath(FileWagon.java:206)
    at org.apache.maven.wagon.providers.file.FileWagon.resourceExists(FileWagon.java:265)
    at org.sonatype.aether.connector.wagon.WagonRepositoryConnector$GetTask.run(WagonRepositoryConnector.java:577)
    at org.sonatype.aether.util.concurrency.RunnableErrorForwarder$1.run(RunnableErrorForwarder.java:60)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.lang.StringUtils
    at org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy.loadClass(SelfFirstStrategy.java:50)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.unsynchronizedLoadClass(ClassRealm.java:259)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:235)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:227)
    ... 7 more

This is my environment info on maven:

dahljo@hydra:~/workspace/gatk-protected$ mvn -version
Apache Maven 3.0.5
Maven home: /usr/share/maven
Java version: 1.7.0_45, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-7-oracle/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.13.0-30-generic", arch: "amd64", family: "unix"

Has anybody else seen this?
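The classpath dump in the log lists the 32 jars Maven itself loads (`constituent[0]` to `constituent[31]`, all under `/usr/share/maven/lib`), and none of their file names refers to Apache Commons Lang, the library that provides `org.apache.commons.lang.StringUtils`. A minimal Python sketch for checking this directly: it opens every jar in a directory with the standard `zipfile` module and reports which ones contain a given class file. `jars_containing` is a hypothetical helper, and the default directory is taken from the log; pass another path as the first argument for a different installation.

```python
import sys
import zipfile
from pathlib import Path
from typing import List


def jars_containing(lib_dir: str, class_name: str) -> List[str]:
    """Return the names of jars in lib_dir that contain class_name.

    class_name is dotted (org.apache.commons.lang.StringUtils); inside a jar
    the class is stored as org/apache/commons/lang/StringUtils.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except (zipfile.BadZipFile, OSError):
            # Skip files that are not zip archives, and dangling symlinks.
            continue
    return hits


if __name__ == "__main__":
    lib = sys.argv[1] if len(sys.argv) > 1 else "/usr/share/maven/lib"
    found = jars_containing(lib, "org.apache.commons.lang.StringUtils")
    print(found if found else f"StringUtils not found in any jar under {lib}")
```

Because the failing frames (`FileWagon`, `ClassRealm`) belong to Maven's own wagon and class-loading code, an empty result for `/usr/share/maven/lib` would suggest that the Maven installation, rather than the GATK POM, is missing the class.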

Created 2014-02-14 10:39:55 | Updated | Tags: install mutect github git bcel

I was trying to install the GitHub version of mutect. I have some questions, and I also hope that people with similar problems can benefit from my efforts.

I followed the instructions posted on the github page, however when I tried to build:

# build
ant -Dexternal.dir=`pwd`/../mutect-src -Dexecutable=mutect package

It told me I didn't have the correct BCEL files in my ~/.ant/lib/:

The bcel jar can be found in the lib directory of a GATK clone after compiling, and the ant-apache-bcel jar can be downloaded from here: http://repo1.maven.org/maven2/ant/ant-apache-bcel/1.6.5/ant-apache-bcel-1.6.5.jar Please copy these two jar files to ~/.ant/lib/

I had already downloaded ant-apache-bcel and put it there, so I figured the missing file must be the one from the GATK clone's lib directory. I compiled with ant dist clean, but it failed and the "lib" folder it created was empty. However, it did create a "dist" folder, and in there I found bcel-5.2.jar. I copied this into ~/.ant/lib/, and now mutect seems to build correctly using:

# build
ant -Dexternal.dir=`pwd`/../mutect-src -Dexecutable=mutect package

So to my questions.

  1. Is this an acceptable way to build it? (Can I trust the program despite the unorthodox installation procedure?)

  2. How come the mutect install instructions don't specifically mention where to find the Apache BCEL library (I would not have found it without the error message), or tell you to compile gatk-protected to get the second jar file you need? And where to put them!?
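The error message quoted in this post asks for two jars in `~/.ant/lib/`: a BCEL jar (here `bcel-5.2.jar`, found in the GATK build's `dist` folder) and `ant-apache-bcel-1.6.5.jar`. A minimal Python sketch for checking that both are in place before running the build; the filename patterns are assumptions based on those two file names, and `missing_bcel_jars` is a hypothetical helper, not part of mutect or GATK.

```python
import fnmatch
import os
from typing import List, Sequence

# Assumed filename patterns, based on bcel-5.2.jar and ant-apache-bcel-1.6.5.jar
# from this post; other versions of either jar would also match.
REQUIRED_PATTERNS = ("bcel-*.jar", "ant-apache-bcel-*.jar")


def missing_bcel_jars(ant_lib: str, patterns: Sequence[str] = REQUIRED_PATTERNS) -> List[str]:
    """Return the patterns that no file in ant_lib matches."""
    try:
        names = os.listdir(ant_lib)
    except OSError:
        # A missing or unreadable directory counts as containing nothing.
        names = []
    return [p for p in patterns if not any(fnmatch.fnmatch(n, p) for n in names)]


if __name__ == "__main__":
    lib = os.path.expanduser("~/.ant/lib")
    missing = missing_bcel_jars(lib)
    if missing:
        print(f"{lib} has no file matching: {', '.join(missing)}")
    else:
        print(f"{lib} contains both BCEL jars")
```

Note that `bcel-*.jar` deliberately does not match `ant-apache-bcel-1.6.5.jar`, because `fnmatch` compares against the whole file name.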

Created 2013-09-10 18:46:33 | Updated | Tags: release github

Hi Folks,

I'd like to perform a git checkout of the source used to build the following release:


At one point we made patches to that release, and I need to re-apply them with some modifications. I have a checkout of the 1.6 tag from GitHub, but I'd like to be sure I'm patching exactly the same code that was used to build the 1.6-9-g47df7bb version above.

But why, you ask ...

I understand that this release is ancient and, yes, we should really be using more up-to-date code. Unfortunately, a few hundred terabytes of data have already been analyzed with this version, and the statisticians will require the analysis to be repeated if we switch versions mid-project. I need to patch this version because we're running into an NFS file-locking bug that causes jobs to hang; I think I have a solution, as long as I can get the source.

Thanks, Matt
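The version string 1.6-9-g47df7bb appears to be `git describe` output, which has the form `<tag>-<commits since tag>-g<abbreviated commit hash>`: 9 commits after the `1.6` tag, at the commit whose hash begins with `47df7bb` (the `g` prefix stands for git and is not part of the hash). A minimal Python sketch of that parsing; `parse_describe` is a hypothetical helper, and it anchors on the end of the string because tag names may themselves contain hyphens (as in the `git-3.2-0-g799071b` value printed by the gitdescribe plugin in another post here).

```python
import re
from typing import NamedTuple


class Describe(NamedTuple):
    tag: str
    commits_since_tag: int
    abbrev_hash: str


# <tag>-<N>-g<hex>; the tag may contain hyphens, so the last two fields are
# matched from the end of the string.
DESCRIBE = re.compile(r"^(?P<tag>.+)-(?P<n>\d+)-g(?P<hash>[0-9a-f]{4,40})$")


def parse_describe(version: str) -> Describe:
    m = DESCRIBE.match(version)
    if m is None:
        raise ValueError(f"not in git-describe format: {version!r}")
    return Describe(m["tag"], int(m["n"]), m["hash"])


if __name__ == "__main__":
    d = parse_describe("1.6-9-g47df7bb")
    print(d)
    # Command to run inside a full (non-shallow) clone to reach that commit:
    print(f"git checkout {d.abbrev_hash}")
```

Assuming commit `47df7bb` is present in the public GitHub history (a build made from an unpublished branch would not be), `git checkout 47df7bb` should give exactly that tree, and `git log --oneline 1.6..47df7bb` lists the commits separating it from the `1.6` tag.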

Created 2012-10-15 15:24:04 | Updated 2012-10-15 15:24:04 | Tags: vcf structured github git source code

Hi GATK team,

I hate the VCF format :-)

I want structured output, and I'd like to promote the use of XML/JSON to store variations. I think the best way to achieve this is to integrate the new format into GATK rather than create another tool that converts VCF to XML/JSON. Ideally, I could insert the result of, say, the ENSEMBL API (e.g. http://beta.rest.ensembl.org/vep/human/9:22125503-22125502:1/C/consequences?content-type=text/xml ) into each 'variation' element.

I've forked GATK and created a new class to handle the XML output:


Now, in VariantContextWriterFactory, when the filename ends with ".xml", the factory creates a new XMLVariantContextWriter rather than a VCFWriter.
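The dispatch described here amounts to choosing a writer from the output file's extension. The Python sketch below illustrates only that idea; `VcfWriter`, `XmlWriter`, `JsonWriter`, and `create_writer` are hypothetical stand-ins, not GATK's `VariantContextWriterFactory`, `VCFWriter`, or the poster's `XMLVariantContextWriter`, whose Java source is not shown in this thread.

```python
from pathlib import Path


class VcfWriter:
    """Hypothetical stand-in for the default VCF writer."""

    def __init__(self, path: str):
        self.path = path


class XmlWriter(VcfWriter):
    """Hypothetical stand-in for an XML writer."""


class JsonWriter(VcfWriter):
    """Hypothetical stand-in for a JSON writer."""


# Map lower-cased file suffixes to writer classes; anything else falls back to VCF.
WRITERS_BY_SUFFIX = {".xml": XmlWriter, ".json": JsonWriter}


def create_writer(path: str) -> VcfWriter:
    suffix = Path(path).suffix.lower()  # "ex1f.vcf.xml" -> ".xml"
    return WRITERS_BY_SUFFIX.get(suffix, VcfWriter)(path)
```

A suffix-to-class table keeps each additional format, such as the JSON output the poster also wants, to one new entry. Lower-casing the suffix is a choice of this sketch, not necessarily how the poster's factory compares file names.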

I'm still writing XMLVariantContextWriter; so far it outputs only the header and the chrom/pos of each variation. Here is a sample:

java -jar dist/GenomeAnalysisTK.jar  -T UnifiedGenotyper -o /home/lindenb/package/samtools-0.1.18/examples/ex1f.vcf.xml -R /home/lindenb/package/samtools-0.1.18/examples/ex1.fa -I /home/lindenb/package/samtools-0.1.18/examples/sorted.bam
INFO  17:12:28,358 HelpFormatter - ---------------------------------------------------------------------------------------------------------- 
INFO  17:12:28,361 HelpFormatter - The Genome Analysis Toolkit (GATK) vdbffd2fa3e7a043a6951d8ac58dd619e68a6caa8, Compiled 2012/10/15 16:53:32 
INFO  17:12:28,361 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  17:12:28,361 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  17:12:28,362 HelpFormatter - Program Args: -T UnifiedGenotyper -o /home/lindenb/package/samtools-0.1.18/examples/ex1f.vcf.xml -R /home/lindenb/package/samtools-0.1.18/examples/ex1.fa -I /home/lindenb/package/samtools-0.1.18/examples/sorted.bam 
INFO  17:12:28,363 HelpFormatter - Date/Time: 2012/10/15 17:12:28 
INFO  17:12:28,364 HelpFormatter - ---------------------------------------------------------------------------------------------------------- 
INFO  17:12:28,364 HelpFormatter - ---------------------------------------------------------------------------------------------------------- 
INFO  17:12:28,392 GenomeAnalysisEngine - Strictness is SILENT 
INFO  17:12:28,430 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  17:12:28,444 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  17:12:28,835 TraversalEngine -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
INFO  17:12:30,721 TraversalEngine - Total runtime 2.00 secs, 0.03 min, 0.00 hours 
INFO  17:12:30,723 TraversalEngine - 108 reads were filtered out during traversal out of 9921 total (1.09%) 
INFO  17:12:30,727 TraversalEngine -   -> 108 reads (1.09% of total) failing UnmappedReadFilter 


<?xml version="1.0"?>
<vcf xmlns="http://xml.1000genomes.org/">
    <metadata key="fileformat">VCFv4.1</metadata>
    <info ID="FS" type="Float" count="1">Phred-scaled p-value using Fisher's exact test to detect strand bias</info>
    <info ID="AN" type="Integer" count="1">Total number of alleles in called genotypes</info>
    <info ID="BaseQRankSum" type="Float" count="1">Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities</info>
    <info ID="MQ" type="Float" count="1">RMS Mapping Quality</info>
    <info ID="AF" type="Float">Allele Frequency, for each ALT allele, in the same order as listed</info>
    <format ID="DP" type="Integer" count="1">Approximate read depth (reads with MQ=255 or with bad mates are filtered)</format>
    <format ID="GT" type="String" count="1">Genotype</format>
    <format ID="PL" type="Integer">Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification</format>
    <format ID="GQ" type="Integer" count="1">Genotype Quality</format>
    <format ID="AD" type="Integer">Allelic depths for the ref and alt alleles in the order listed</format>
    <filter ID="LowQual"/>
    <contig ID="seq1" index="0"/>
    <contig ID="seq2" index="1"/>
    <sample id="1">ex1</sample>
    <sample id="2">ex1b</sample>
    <variation chrom="seq1" pos="285"/>
    <variation chrom="seq1" pos="287"/>
</vcf>

Would you accept a pull request for this?

(I'd like to create a JSON output too.)
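A minimal sketch of the proposed output, assuming each record carries only `chrom` and `pos` (all the sample writes so far): Python's `xml.etree.ElementTree` emits the element and attribute names copied from the sample under the same default namespace, and the `json` module renders the same records as JSON. It only illustrates the format; it is not the poster's Java `XMLVariantContextWriter`, and the JSON layout is an assumption, since the thread defines none.

```python
import json
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = "http://xml.1000genomes.org/"
# Serialize NS as the default namespace (xmlns="...") rather than as ns0:.
# Note: this registration is global for the ElementTree module.
ET.register_namespace("", NS)

Variation = Tuple[str, int]  # (chrom, pos): the only fields the sample writes so far


def to_xml(fileformat: str, samples: List[str], variations: List[Variation]) -> str:
    """Render a header subset plus chrom/pos records in the sample's XML layout."""
    root = ET.Element(f"{{{NS}}}vcf")
    ET.SubElement(root, f"{{{NS}}}metadata", key="fileformat").text = fileformat
    for i, name in enumerate(samples, start=1):
        ET.SubElement(root, f"{{{NS}}}sample", id=str(i)).text = name
    for chrom, pos in variations:
        # ElementTree can only serialize string attribute values.
        ET.SubElement(root, f"{{{NS}}}variation", chrom=chrom, pos=str(pos))
    return ET.tostring(root, encoding="unicode")


def to_json(fileformat: str, samples: List[str], variations: List[Variation]) -> str:
    """Render the same records as JSON (an assumed layout, not a defined format)."""
    doc = {
        "fileformat": fileformat,
        "samples": samples,
        "variations": [{"chrom": c, "pos": p} for c, p in variations],
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    records = [("seq1", 285), ("seq1", 287)]
    print(to_xml("VCFv4.1", ["ex1", "ex1b"], records))
    print(to_json("VCFv4.1", ["ex1", "ex1b"], records))
```

Building the tree with an XML library, rather than concatenating strings, guarantees that every element is closed and that characters such as `&` or `<` in INFO descriptions are escaped.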