GATK 3.5 was released on November 25, 2015. Itemized changes are listed below. For more details, see the user-friendly version highlights.
Added allele-specific version of existing annotations: AS_BaseQualityRankSumTest, AS_FisherStrand, AS_MappingQualityRankSumTest, AS_RMSMappingQuality, AS_RankSumTest, AS_ReadPosRankSumTest, AS_StrandOddsRatio, AS_QualByDepth and AS_InbreedingCoeff.
Added BaseCountsBySample annotation. Intended to provide insight into the pileup of bases used by HaplotypeCaller in the calling process, which may differ from the pileup observed in the original bam file because of the local realignment and additional filtering performed internally by HaplotypeCaller. Can only be requested from HaplotypeCaller, not VariantAnnotator.
Added ExcessHet annotation. Estimates excess heterozygosity in a population of samples. Related to but distinct from InbreedingCoeff, which estimates evidence for inbreeding in a population. ExcessHet scales more reliably to large cohort sizes.
Added FractionInformativeReads annotation. Reports the number of reads that were considered informative by HaplotypeCaller (over all samples).
Enforced calculating GenotypeAnnotations before InfoFieldAnnotations. This ensures that the AD value is available to use in the QD calculation.
Reorganized standard annotation groups processing to ensure that all default annotations always get annotated regardless of what is specified on the command line. This fixes a bug where default annotations were getting dropped when the command line included annotation requests.
Made GenotypeGVCFs subset StrandAlleleCounts intelligently, i.e. subset the SAC values to the called alleles. Previously, when the StrandAlleleCountsBySample (SAC) annotation was present in GVCFs, GenotypeGVCFs carried it over to the final VCF essentially unchanged. This was problematic because SAC includes the counts for all alleles originally present (including NON-REF) even when some are not called in the final VCF. When the full list of original alleles is no longer available, parsing SAC could become difficult if not impossible.
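The SAC subsetting described above can be pictured with a minimal Python sketch. It assumes SAC stores two counts per allele (forward strand, then reverse strand) in the same order as the original allele list; the function name `subset_sac` and the example values are hypothetical and are not taken from the GATK codebase.

```python
def subset_sac(sac, original_alleles, kept_alleles):
    """Keep only the forward/reverse count pairs that belong to kept_alleles.

    Assumption: sac holds [fwd, rev] for each allele, in original_alleles order.
    """
    if len(sac) != 2 * len(original_alleles):
        raise ValueError("SAC must hold two counts per original allele")
    index = {allele: i for i, allele in enumerate(original_alleles)}
    subset = []
    for allele in kept_alleles:
        i = index[allele]
        subset.extend(sac[2 * i:2 * i + 2])
    return subset


# Illustrative values only: REF=A, two ALTs and the NON_REF placeholder.
original = ["A", "C", "T", "<NON_REF>"]
sac = [10, 12, 4, 5, 0, 1, 0, 0]
print(subset_sac(sac, original, ["A", "C"]))  # [10, 12, 4, 5]
```

After subsetting, the counts line up with the alleles that remain in the final record, so a downstream parser no longer needs the original allele list.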
Added new MQ jittering functionality to improve how VQSR handles MQ. Note that HaplotypeCaller now calculates a new annotation called RAW_MQ per-sample, which is then integrated per-cohort by GenotypeGVCFs to produce the MQ annotation.
VariantAnnotator can now annotate FILTER field from an external resource. Usage:
--resource:foo resource.vcf --expression foo.FILTER
VariantAnnotator can now check allele concordance when annotating with an external resource. Usage:
Allowed overriding hard-coded cutoff for allele length in ValidateVariants and in LeftAlignAndTrimVariants. Usage:
--reference_window_stop N where N is the desired cutoff.
Also in LeftAlignAndTrimVariants, trimming multiallelic alleles is now the default behavior.
Fixed the ability to mask out SNPs with --snpmask in FastaAlternateReferenceMaker.
Also in FastaAlternateReferenceMaker, fixed the merging of contiguous intervals and made the tool produce more informative contig names.
Fixed a bug in CombineVariants that occurred when one record has a spanning deletion and needs a padded reference allele.
Added a new VariantEval evaluation module, MetricsCollection, that summarizes metrics from several EV modules.
Enabled family-level stratification in MendelianViolationEvaluator of VariantEval (if a ped file is provided), making it possible to count Mendelian violations for each family in a callset with multiple families.
Added the ability for SelectVariants to enforce VCF 4.2-compliant output when processing older files. Use case: the 4.2 spec specifies that GQ must be an integer; by default we don’t enforce it (so if reading an older file that used decimals, we don’t change it), but the new argument --forceValidOutput converts the values on request. Not made default because of some performance slowdown, so writing VCFs is now fast by default, compliant by choice.
-U ALLOW_SEQ_DICT_INCOMPATIBILITY can be used (with caution) to override the check.
Various improvements to the tools’ performance, especially HaplotypeCaller, by making the code more efficient and cutting out crud.
GenotypeGVCFs now emits a no-call (./.) when the evidence is too ambiguous to make a call at all (e.g. all the PLs are zero). Previously this would have led to a hom-ref call with RGQ=0.
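As a rough illustration of the no-call rule above, the sketch below picks the genotype with the lowest PL and falls back to ./. when every PL is zero. This is a deliberate simplification: the real genotyper also applies priors and other logic, and the function name `call_from_pls` is hypothetical.

```python
def call_from_pls(pls, genotypes):
    """Return the most likely genotype, or "./." when the PLs carry no information.

    Assumption: pls[i] is the phred-scaled likelihood of genotypes[i],
    with 0 marking the most likely genotype.
    """
    if len(pls) != len(genotypes):
        raise ValueError("need one PL per genotype")
    if all(pl == 0 for pl in pls):
        # Every genotype is equally likely, so no call can be justified.
        return "./."
    best = min(range(len(pls)), key=pls.__getitem__)
    return genotypes[best]


diploid = ["0/0", "0/1", "1/1"]
print(call_from_pls([0, 0, 0], diploid))    # ./.
print(call_from_pls([0, 30, 300], diploid))  # 0/0
print(call_from_pls([45, 0, 60], diploid))   # 0/1
```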
Fixed a bug in GenotypeGVCFs that sometimes generated invalid VCFs for haploid callsets. The tool was carrying over the AD from alleles that had been trimmed out, causing field length mismatches.
Ensured inputPriors get used if they are specified to the genotyper (previously they were ignored). Also improved docs on --indel_heterozygosity priors.
Fixed a bug that affected the --ignoreInputSamples behavior of CalculateGenotypePosteriors.
Added option to OverclippedReadFilter to not require soft-clips on both ends. Contributed by Jacob Silterra.
Fixed a bug in IndelRealigner where the tool was incorrectly "fixing" mates when supplementary alignments are present. The patch involves ignoring supplementary alignments.
Support for reading and writing CRAM files. Some improvements are still expected in htsjdk. Contributed by Vadim Zalunin at EBI and collaborators at the Sanger Institute.
Made interval-list output format dependent on the file extension (for RealignerTargetCreator). If the extension is .interval_list, output will be formatted as a proper Picard interval list (with sequence dictionary). Otherwise it will be a basic GATK interval list as previously.
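The extension-based choice of format can be sketched as follows. This is not the RealignerTargetCreator code; it assumes a Picard-style interval list is a SAM-like header with @SQ lines followed by five tab-separated columns (contig, start, end, strand, name), and that a basic GATK interval list is one contig:start-end string per line. The function name, the @HD line and the "." interval name are illustrative placeholders.

```python
def format_intervals(path, intervals, sequence_dict):
    """Render intervals in a format chosen from the output file extension.

    intervals: list of (contig, start, end) tuples, 1-based inclusive.
    sequence_dict: list of (contig, length) tuples for the header.
    """
    if path.endswith(".interval_list"):
        # Picard-style: header carrying the sequence dictionary, then 5 columns.
        lines = ["@HD\tVN:1.5"]
        lines += [f"@SQ\tSN:{name}\tLN:{length}" for name, length in sequence_dict]
        lines += [f"{c}\t{s}\t{e}\t+\t." for c, s, e in intervals]
    else:
        # Basic GATK-style: one contig:start-end string per line.
        lines = [f"{c}:{s}-{e}" for c, s, e in intervals]
    return "\n".join(lines) + "\n"


targets = [("20", 100, 250), ("20", 900, 1000)]
contigs = [("20", 63025520)]
print(format_intervals("targets.interval_list", targets, contigs))
print(format_intervals("targets.intervals", targets, contigs))
```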
Added a new JobRunner called ParallelShell that will run jobs locally on one node concurrently as specified by the DAG, with the option to limit the maximum number of concurrently running jobs using the flag maximumNumberOfJobsToRunConcurrently. Contributed by Johan Dahlberg.
PER_TARGET_COVERAGE argument and added extension for Picard CollectWgsMetrics.
GATK 3.4 was released on May 15, 2015. Itemized changes are listed below. For more details, see the user-friendly version highlights.
--mergeVariantsViaLD argument in HaplotypeCaller since it didn’t work. To merge complex substitutions, use ReadBackedPhasing as a post-processing step.
allowNonUniqueKmersInRef so that it applies to all kmer sizes. This resolves some assembly issues in low-complexity sequence contexts and improves calling sensitivity in those regions.
.g.vcf file extension. See Highlights for more details.
-uniquifySamples to GenotypeGVCFs to make it possible to genotype together two different datasets containing the same sample.
-dcov setting for HaplotypeCaller (pending a fix to the downsampling control system) to prevent buggy behavior. See Highlights for more details.
--breakBandsAtMultiplesOf N will ensure that no reference blocks span across genomic positions that are multiples of N. This is especially important in the case of scatter-gather where you don't want your scatter intervals to start in the middle of blocks (because of a limitation in the way -L works in the GATK for VCF records with the END tag). See Highlights for more details.
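The banding rule above can be illustrated with a short Python sketch. It assumes that a new block begins at every position that is an exact multiple of N and that coordinates are 1-based and inclusive; the function name `break_band` is hypothetical and the code is not taken from the GATK.

```python
def break_band(start, end, n):
    """Split the inclusive block [start, end] so a new block starts at each multiple of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    if start > end:
        raise ValueError("start must not exceed end")
    blocks = []
    block_start = start
    # First multiple of n strictly greater than start.
    boundary = (start // n + 1) * n
    while boundary <= end:
        blocks.append((block_start, boundary - 1))
        block_start = boundary
        boundary += n
    blocks.append((block_start, end))
    return blocks


print(break_band(950, 2100, 1000))
# [(950, 999), (1000, 1999), (2000, 2100)]
```

With blocks broken this way, scatter intervals that start on multiples of N always coincide with the start of a block.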
-trim argument to trim (simplify) alleles to a minimal representation.
-trimAlternates argument to remove all unused alternate alleles from variants. Note that this is pretty aggressive for monomorphic sites.
-noTrim argument to preserve original alleles.
-fixNDN flag fully functional.
-SMA is specified. Note that FORMAT fields behave the same as INFO fields - if the annotation has a count of A (one entry per Alt Allele), it is split across the multiple output lines. Otherwise, the entire list is output with each field.
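The splitting behavior described above might look like this in a minimal Python sketch. The function name `split_multiallelic`, the dictionary-based record layout and the example values are assumptions made for illustration; only the rule itself (count-A fields are split per alternate allele, everything else is repeated whole) comes from the note.

```python
def split_multiallelic(alts, fields, per_alt_fields):
    """Produce one output row per alternate allele.

    fields: annotation name -> list of values.
    per_alt_fields: names of annotations declared with count A.
    """
    rows = []
    for i, alt in enumerate(alts):
        row = {"ALT": alt}
        for name, values in fields.items():
            if name in per_alt_fields:
                # One entry per alt allele: give each row its own value.
                row[name] = str(values[i])
            else:
                # Any other count: repeat the whole list on every row.
                row[name] = ",".join(str(v) for v in values)
        rows.append(row)
    return rows


rows = split_multiallelic(
    alts=["C", "T"],
    fields={"AF": [0.25, 0.5], "AD": [10, 3, 6]},
    per_alt_fields={"AF"},
)
for row in rows:
    print(row)
# {'ALT': 'C', 'AF': '0.25', 'AD': '10,3,6'}
# {'ALT': 'T', 'AF': '0.5', 'AD': '10,3,6'}
```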
-drf argument to disable default read filters. Limited to specific tools and specific filters (currently only DuplicateReadFilter).
-qsub-broad argument. When -qsub-broad is specified instead of -qsub, Queue will use the h_vmem parameter instead of h_rss to specify memory limit requests. This was done to accommodate changes to the Broad’s internal job scheduler. Also causes the GridEngine native arguments to be output by default to the logger, instead of only when in debug mode.
slf4j-log4j12 version (contributed by user Biocyberman).
GATK 3.3 was released on October 23, 2014. Itemized changes are listed below. For more details, see the user-friendly version highlights.
--sample_name argument. This is a shortcut for people who have multi-sample BAMs but would like to use -ERC GVCF mode with a particular one of those samples.
--ignore_all_filters option. If specified, the variant recalibrator will ignore all input filters and treat sites as unfiltered.
--keepOriginalAC functionality in SelectVariants to work for sites that lose alleles in the selection.
read_group arguments no longer appear in the header.
--bcf for VCF files, and --generate_md5 for BAM files moved to the engine level.
GATK 3.2 was released on July 14, 2014. Itemized changes are listed below. For more details, see the user-friendly version highlights.
We also want to take this opportunity to thank super-user Phillip Dexheimer for all of his excellent contributions to the codebase, especially for this release.
optfunctions to work with recent versions of the ggplot2 R library.
GATK 3.1 was released on March 18, 2014. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
--pair_hmm_implementation VECTOR_LOGLESS_CACHING. Please see the 3.1 Version Highlights for more details about expected speed ups and some background on the collaboration that made these possible.
GATK 3.0 was released on March 5, 2014. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
One important change for those who prefer to build from source is that we now use maven instead of ant. See the relevant documentation for building the GATK with our new build system.
GATK 2.8 was released on December 6, 2013. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
Note that this release is relatively smaller than previous ones. We are working hard on some new tools and frameworks that we are hoping to make available to everyone for our next release.
GATK 2.7 was released on August 21, 2013. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
GATK 2.6 was released on June 20, 2013. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
Important note: with this release the GATK has officially moved to using Java 7.
GATK 2.5 was released on April 30, 2013. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
java -cp GenomeAnalysisTK.jar org.broadinstitute.sting.tools.ListAnnotations
GATK 2.4 was released on February 26, 2013. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
Important note 1 for this release: with this release comes an updated licensing structure for the GATK. Different files in our public repository are protected with different licenses, so please see the text at the top of any given file for details as to its particular license.
Important note 2 for this release: the GATK team spent a tremendous amount of time and engineering effort to add extensive tests for many of our core tools (a process that will continue into future releases). Unsurprisingly, as part of this process many small (and some not so small) bugs were uncovered during testing that we subsequently fixed. While we usually attempt to enumerate in our release notes all of the bugs fixed during a given release, that would entail quite a Herculean effort for release 2.4; so please just be aware that there were many smaller fixes that may be omitted from these notes.
GATK 2.3 was released on December 17, 2012. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
GATK 2.2 was released on October 31, 2012. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
The GATK 2.0 release includes both the addition of brand-new (and often still experimental) tools and updates to the existing stable tools.