Release notes for GATK version 2.3
Posted in Announcements | Last updated on 2012-12-18 20:21:23


Comments (2)

GATK 2.3 was released on December 17, 2012. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history

Base Quality Score Recalibration

  • Soft clipped bases are no longer counted in the delocalized BQSR.
  • The user can now set the maximum allowable cycle with the --maximum_cycle_value argument.

Unified Genotyper

  • Minor (5%) run time improvements to the Unified Genotyper.
  • Fixed bug for the indel model that occurred when long reads (e.g. Sanger) in a pileup led to a read starting after the haplotype.
  • Fixed bug in the exact AF calculation where log10pNonRefByAllele should really be log10pRefByAllele.

Haplotype Caller

  • Fixed the performance of GENOTYPE_GIVEN_ALLELES mode, which often produced incorrect output when passed complex events.
  • Fixed the interaction with the allele biased downsampling (for contamination removal) so that the removed reads are not used for downstream annotations.
  • Implemented minor (5-10%) run time improvements to the Haplotype Caller.
  • Fixed the logic for determining active regions, which was a bit broken when intervals were used in the system.

Variant Annotator

  • The FisherStrand annotation ignores reduced reads (because they are always on the forward strand).
  • Can now be run multi-threaded with -nt argument.

Reduce Reads

  • Fixed bug where sometime the start position of a reduced read was less than 1.
  • ReduceReads now co-reduces bams if they're passed in toghether with multiple -I.

Combine Variants

  • Fixed the case where the PRIORITIZE option is used but no priority list is given.

Phase By Transmission

  • Fixed bug where the AD wasn't being printed correctly in the MV output file.

Miscellaneous

  • A brand new version of the per site down-sampling functionality has been implemented that works much, much better than the previous version.
  • More efficient initial file seeking at the beginning of the GATK traversal.
  • Fixed the compression of VCF.gz where the output was too big because of unnecessary call to flush().
  • The allele biased downsampling (for contamination removal) has been rewritten to be smarter; also, it no longer aborts if there's a reduced read in the pileup.
  • Added a major performance improvement to the GATK engine that stemmed from a problem with the NanoSchedule timing code.
  • Added checking in the GATK for mis-encoded quality scores.
  • Fixed downsampling in the ReadBackedPileup class.
  • Fixed the parsing of genome locations that contain colons in the contig names (which is allowed by the spec).
  • Made ID an allowable INFO field key in our VCF parsing.
  • Multi-threaded VCF to BCF writing no longer produces an invalid intermediate file that fails on merging.
  • Picard jar remains at version 1.67.1197.
  • Tribble jar updated to version 119.

Return to top Comment on this article in the forum