


Created 2014-04-04 18:49:28 | Updated 2015-12-19 12:44:49 | Tags: github git source-code


We distinguish between "Classic GATK" (major versions 1 through 3) and GATK 4, the next generation of GATK tools.


"Classic GATK" (major versions 1 through 3) (current distribution)

We provide the current GATK source code through two publicly accessible GitHub repositories: broadgsa/gatk and broadgsa/gatk-protected.

1. broadgsa/gatk

This repository contains the code corresponding to the core GATK development framework, including the GATK engine and many utilities, which third-party developers can use to develop their own GATK-based analysis tools. Be advised, however, that support for development using this framework is being discontinued.

All the code in this repository is open-source under the MIT license. The full text of the license can be viewed here.

2. broadgsa/gatk-protected

This repository contains the code corresponding to the GenomeAnalysisTK.jar file that we distribute to our users, containing the GATK engine and all analysis tools.

This includes the code in broadgsa/gatk under the MIT license, plus tools and utilities that are under a more restrictive license that prohibits commercial/for-profit use. Anyone interested in accessing the protected code for commercial/for-profit purposes should contact our licensing department (softwarelicensing@broadinstitute.org) to inquire about licensing terms.


GATK 4+

The code for GATK 4+, currently available as an alpha preview, is hosted in two publicly accessible GitHub repositories: broadinstitute/gatk and broadinstitute/gatk-protected. As with Classic GATK, the split reflects two different licenses, but in this case the repositories are complementary: no code is shared between them.

1. broadinstitute/gatk

This repository contains the code corresponding to the core GATK 4+ development framework, including the new GATK engine and many utilities, which third-party developers can use to develop their own GATK-based analysis tools. We encourage developers to use this new framework for development and we welcome feedback regarding features and development support.

All the code in this repository is open-source under a BSD license. The full text of the license can be viewed here.

2. broadinstitute/gatk-protected

This repository contains the code for key analysis tools that are covered under a more restrictive license that prohibits commercial/for-profit use. Anyone interested in accessing the protected code for commercial/for-profit purposes should contact our licensing department (softwarelicensing@broadinstitute.org) to inquire about licensing terms.



Created 2014-02-14 10:39:55 | Updated | Tags: install mutect github git bcel


Hi

I was trying to install the GitHub version of MuTect. I have some questions, and I also hope that people who run into similar problems can benefit from my efforts.

I followed the instructions posted on the GitHub page; however, when I tried to build with:

# build
ant -Dexternal.dir=$(pwd)/../mutect-src -Dexecutable=mutect package

It told me I didn't have the correct BCEL files in my ~/.ant/lib/ directory:

The bcel jar can be found in the lib directory of a GATK clone after compiling, and the ant-apache-bcel jar can be downloaded from here: http://repo1.maven.org/maven2/ant/ant-apache-bcel/1.6.5/ant-apache-bcel-1.6.5.jar Please copy these two jar files to ~/.ant/lib/

I had already downloaded the ant-apache-bcel jar and put it there, so I figured the missing file had to be the one from the lib directory of the GATK clone. I compiled the clone with ant dist clean, but the build failed and the "lib" folder it created was empty. However, it did create a "dist" folder, and in there I found bcel-5.2.jar. I copied this into ~/.ant/lib/ and now MuTect seems to build correctly using:

# build
ant -Dexternal.dir=$(pwd)/../mutect-src -Dexecutable=mutect package

So, on to my questions:

  1. Is this an acceptable way to build it? (Can I trust the program despite the unorthodox installation procedure?)

  2. How come the MuTect install instructions don't specifically mention where to find the Apache BCEL library (I would not have found it without the error message), or tell you to compile gatk-protected to get the second jar file you need? Also, where do they go!?

Created 2012-10-15 15:24:04 | Updated 2012-10-15 15:24:04 | Tags: vcf structured github git source code


Hi GATK team,

I hate the VCF format :-)

I want structured output, and I'd like to promote the use of XML/JSON to store variations. I think the best way to achieve this is to integrate the new format into the GATK rather than create yet another tool that converts VCF to XML/JSON. Ideally, I could then insert the result of, say, the ENSEMBL API ( e.g. http://beta.rest.ensembl.org/vep/human/9:22125503-22125502:1/C/consequences?content-type=text/xml ) into each 'variation' element.

I've forked the GATK and created a new class to handle the XML output:

https://github.com/lindenb/gatk/commit/dbffd2fa3e7a043a6951d8ac58dd619e68a6caa8

Now, in VariantContextWriterFactory, when the filename ends with ".xml", the factory creates a new XMLVariantContextWriter rather than a VCFWriter.
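The extension-based dispatch described above comes down to a suffix check on the output file name. A minimal Java sketch, assuming a simplified two-way choice; `OutputKind` and `WriterDispatch` are hypothetical names, not classes from GATK or from the linked commit:

```java
// Hypothetical sketch: the real VariantContextWriterFactory in the fork may
// differ; OutputKind and WriterDispatch are illustrative names only.
enum OutputKind { VCF, XML }

final class WriterDispatch {
    private WriterDispatch() {}

    /**
     * Picks the writer kind from the output file name: a name ending in
     * ".xml" (e.g. "ex1f.vcf.xml") selects XML, anything else keeps VCF.
     */
    static OutputKind kindFor(String fileName) {
        if (fileName == null) {
            throw new IllegalArgumentException("fileName must not be null");
        }
        // Case-sensitive suffix check only; the ".vcf" inside
        // "ex1f.vcf.xml" does not matter.
        return fileName.endsWith(".xml") ? OutputKind.XML : OutputKind.VCF;
    }
}
```

With this rule, an `-o` value such as `ex1f.vcf.xml` selects the XML writer, while `ex1f.vcf` keeps the default VCF writer.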

I'm still working on XMLVariantContextWriter; so far I've only written the header and the chrom/pos for the variations. Here is a sample:

java -jar dist/GenomeAnalysisTK.jar  -T UnifiedGenotyper -o /home/lindenb/package/samtools-0.1.18/examples/ex1f.vcf.xml -R /home/lindenb/package/samtools-0.1.18/examples/ex1.fa -I /home/lindenb/package/samtools-0.1.18/examples/sorted.bam
INFO  17:12:28,358 HelpFormatter - ---------------------------------------------------------------------------------------------------------- 
INFO  17:12:28,361 HelpFormatter - The Genome Analysis Toolkit (GATK) vdbffd2fa3e7a043a6951d8ac58dd619e68a6caa8, Compiled 2012/10/15 16:53:32 
INFO  17:12:28,361 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  17:12:28,361 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  17:12:28,362 HelpFormatter - Program Args: -T UnifiedGenotyper -o /home/lindenb/package/samtools-0.1.18/examples/ex1f.vcf.xml -R /home/lindenb/package/samtools-0.1.18/examples/ex1.fa -I /home/lindenb/package/samtools-0.1.18/examples/sorted.bam 
INFO  17:12:28,363 HelpFormatter - Date/Time: 2012/10/15 17:12:28 
INFO  17:12:28,364 HelpFormatter - ---------------------------------------------------------------------------------------------------------- 
INFO  17:12:28,364 HelpFormatter - ---------------------------------------------------------------------------------------------------------- 
INFO  17:12:28,392 GenomeAnalysisEngine - Strictness is SILENT 
INFO  17:12:28,430 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  17:12:28,444 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  17:12:28,835 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING] 
INFO  17:12:28,835 TraversalEngine -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
INFO  17:12:30,721 TraversalEngine - Total runtime 2.00 secs, 0.03 min, 0.00 hours 
INFO  17:12:30,723 TraversalEngine - 108 reads were filtered out during traversal out of 9921 total (1.09%) 
INFO  17:12:30,727 TraversalEngine -   -> 108 reads (1.09% of total) failing UnmappedReadFilter 

Output:

<?xml version="1.0"?>
<vcf xmlns="http://xml.1000genomes.org/">
  <head>
    <metadata key="fileformat">VCFv4.1</metadata>
    <info-list>
      <info ID="FS" type="Float" count="1">Phred-scaled p-value using Fisher's exact test to detect strand bias</info>
      <info ID="AN" type="Integer" count="1">Total number of alleles in called genotypes</info>
      <info ID="BaseQRankSum" type="Float" count="1">Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities</info>
      <info ID="MQ" type="Float" count="1">RMS Mapping Quality</info>
      (....)
      <info ID="AF" type="Float">Allele Frequency, for each ALT allele, in the same order as listed</info>
    </info-list>
    <format-list>
      <format ID="DP" type="Integer" count="1">Approximate read depth (reads with MQ=255 or with bad mates are filtered)</format>
      <format ID="GT" type="String" count="1">Genotype</format>
      <format ID="PL" type="Integer">Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification</format>
      <format ID="GQ" type="Integer" count="1">Genotype Quality</format>
      <format ID="AD" type="Integer">Allelic depths for the ref and alt alleles in the order listed</format>
    </format-list>
    <filters-list>
      <filter ID="LowQual"/>
    </filters-list>
    <contigs-list>
      <contig ID="seq1" index="0"/>
      <contig ID="seq2" index="1"/>
    </contigs-list>
    <samples-list>
      <sample id="1">ex1</sample>
      <sample id="2">ex1b</sample>
    </samples-list>
  </head>
  <body>
    <variations>
      <variation chrom="seq1" pos="285">
        <id>.</id>
        <ref>T</ref>
        <alt>A</alt>
      </variation>
      <variation chrom="seq1" pos="287">
        <id>.</id>
        <ref>C</ref>
        <alt>A</alt>
      </variation>
      (....)
    </variations>
  </body>
</vcf>
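For the body, streaming each record through the JDK's built-in StAX API (`javax.xml.stream.XMLStreamWriter`) avoids building a DOM in memory and escapes text and attribute values automatically. The sketch below is hypothetical: it is not the XMLVariantContextWriter from the linked commit, it takes plain strings instead of GATK `VariantContext` objects, it skips the `<head>` section, and it writes unindented output. Only the element and attribute names are taken from the sample above.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch, not GATK code: streams <variation> elements using the
// element/attribute names from the sample output.
final class XmlVariationSketch {
    private static final String NS = "http://xml.1000genomes.org/";
    private final XMLStreamWriter out;

    /** Opens <vcf>, <body> and <variations>; the <head> section is omitted. */
    XmlVariationSketch(Writer sink) throws XMLStreamException {
        out = XMLOutputFactory.newInstance().createXMLStreamWriter(sink);
        out.writeStartDocument("1.0");
        out.writeStartElement("vcf");
        out.writeDefaultNamespace(NS); // emits xmlns="..." on <vcf>
        out.writeStartElement("body");
        out.writeStartElement("variations");
    }

    /** Writes one record; StAX escapes attribute values and text content. */
    void add(String chrom, int pos, String id, String ref, String alt)
            throws XMLStreamException {
        out.writeStartElement("variation");
        out.writeAttribute("chrom", chrom);
        out.writeAttribute("pos", Integer.toString(pos));
        textElement("id", id);
        textElement("ref", ref);
        textElement("alt", alt);
        out.writeEndElement(); // </variation>
    }

    private void textElement(String name, String value) throws XMLStreamException {
        out.writeStartElement(name);
        out.writeCharacters(value);
        out.writeEndElement();
    }

    /** Closes every open element (variations, body, vcf) and flushes. */
    void close() throws XMLStreamException {
        out.writeEndDocument();
        out.flush();
        out.close(); // does not close the underlying Writer
    }

    /** Renders the first two records of the sample output into a String. */
    static String renderSample() {
        StringWriter buffer = new StringWriter();
        try {
            XmlVariationSketch writer = new XmlVariationSketch(buffer);
            writer.add("seq1", 285, ".", "T", "A");
            writer.add("seq1", 287, ".", "C", "A");
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }
}
```

Because each record is written and forgotten, memory use stays flat regardless of how many sites the caller emits. A JSON writer could plausibly follow the same open/add/close shape.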

Would you accept a pull request for this project?

(I'd like to create a JSON output too.)

Pierre