Tagged with #git
1 documentation article | 0 announcements | 1 forum discussion

Created 2014-04-04 18:49:28 | Updated 2015-09-09 20:27:30 | Tags: github git open-source source-code
Comments (0)

We provide the GATK source code through two publicly accessible Github repositories: broadgsa/gatk and broadgsa/gatk-protected. This document explains what are the differences between the two and, importantly, what software licenses cover their contents.

1. broadgsa/gatk

This repository contains the code corresponding to the core GATK development framework, including the GATK engine and many utilities, which third-party developers can use to develop their own GATK-based analysis tools. All the code in this repository is open-source under the MIT license.

2. broadgsa/gatk-protected

This repository contains the code corresponding to the GenomeAnalysisTK.jar file that we distribute to our users, containing the GATK engine and all analysis tools. This includes the code in broadgsa/gatk under the MIT license, plus tools and utilities that are under a more restrictive license which prohibits for-profit use. Anyone interested in accessing the protected code for for-profit purposes should contact our licensing department (softwarelicensing@broadinstitute.org) to inquire about licensing terms.

No posts found with the requested search criteria.

Created 2012-10-15 15:24:04 | Updated 2012-10-15 15:24:04 | Tags: vcf xml structured github git source code
Comments (2)

Hi the GATK team,

I hate the VCF format :-)

I want a structured output and I'd like to promote the use of the XML/JSON to store the variations. I think the best way to achieve this, is to integrate this new format in the GATK rather than creating another tool converting the VCF to XML/JSON. In the best world, I can insert the result of, say the ENSEMBL API ( e.g. http://beta.rest.ensembl.org/vep/human/9:22125503-22125502:1/C/consequences?content-type=text/xml ) in each 'variation' element.

I've forked the GATK and created a new class to handle the XML output:


now in VariantContextWriterFactory, when the filename ends with ".xml", the factory creates a new XMLVariantContextWriter rather than a VCFWriter .

I'm currently writing XMLVariantContextWriter and I've only written the header and the chrom/pos for the variations. Here is a sample:

java -jar dist/GenomeAnalysisTK.jar  -T UnifiedGenotyper -o /home/lindenb/package/samtools-0.1.18/examples/ex1f.vcf.xml -R /home/lindenb/package/samtools-0.1.18/examples/ex1.fa -I /home/lindenb/package/samtools-0.1.18/examples/sorted.bam
INFO  17:12:28,358 HelpFormatter - ---------------------------------------------------------------------------------------------------------- 
INFO  17:12:28,361 HelpFormatter - The Genome Analysis Toolkit (GATK) vdbffd2fa3e7a043a6951d8ac58dd619e68a6caa8, Compiled 2012/10/15 16:53:32 
INFO  17:12:28,361 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  17:12:28,361 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  17:12:28,362 HelpFormatter - Program Args: -T UnifiedGenotyper -o /home/lindenb/package/samtools-0.1.18/examples/ex1f.vcf.xml -R /home/lindenb/package/samtools-0.1.18/examples/ex1.fa -I /home/lindenb/package/samtools-0.1.18/examples/sorted.bam 
INFO  17:12:28,363 HelpFormatter - Date/Time: 2012/10/15 17:12:28 
INFO  17:12:28,364 HelpFormatter - ---------------------------------------------------------------------------------------------------------- 
INFO  17:12:28,364 HelpFormatter - ---------------------------------------------------------------------------------------------------------- 
INFO  17:12:28,392 GenomeAnalysisEngine - Strictness is SILENT 
INFO  17:12:28,430 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  17:12:28,444 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 
INFO  17:12:28,835 TraversalEngine -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
INFO  17:12:30,721 TraversalEngine - Total runtime 2.00 secs, 0.03 min, 0.00 hours 
INFO  17:12:30,723 TraversalEngine - 108 reads were filtered out during traversal out of 9921 total (1.09%) 
INFO  17:12:30,727 TraversalEngine -   -> 108 reads (1.09% of total) failing UnmappedReadFilter 


<?xml version="1.0"?>
<vcf xmlns="http://xml.1000genomes.org/">
    <metadata key="fileformat">VCFv4.1</metadata>
      <info ID="FS" type="Float" count="1">Phred-scaled p-value using Fisher's exact test to detect strand bias</info>
      <info ID="AN" type="Integer" count="1">Total number of alleles in called genotypes</info>
      <info ID="BaseQRankSum" type="Float" count="1">Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities</info>
      <info ID="MQ" type="Float" count="1">RMS Mapping Quality</info>
      <info ID="AF" type="Float">Allele Frequency, for each ALT allele, in the same order as listed</info>
      <format ID="DP" type="Integer" count="1">Approximate read depth (reads with MQ=255 or with bad mates are filtered)</format>
      <format ID="GT" type="String" count="1">Genotype</format>
      <format ID="PL" type="Integer">Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification</format>
      <format ID="GQ" type="Integer" count="1">Genotype Quality</format>
      <format ID="AD" type="Integer">Allelic depths for the ref and alt alleles in the order listed</format>
      <filter ID="LowQual"/>
      <contig ID="seq1" index="0"/>
      <contig ID="seq2" index="1"/>
      <sample id="1">ex1</sample>
      <sample id="2">ex1b</sample>
      <variation chrom="seq1" pos="285">
      <variation chrom="seq1" pos="287">

would you accept a pull request for that project ?

(I'd like to create a JSON ouput too)