Release notes for GATK version 2.3
Posted in Announcements | Last updated on 2012-12-18 20:21:23
GATK 2.3 was released on December 17, 2012. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
Base Quality Score Recalibration
- Soft clipped bases are no longer counted in the delocalized BQSR.
- The user can now set the maximum allowable cycle with the --maximum_cycle_value argument.
Unified Genotyper
- Minor (5%) run time improvements to the Unified Genotyper.
- Fixed bug for the indel model that occurred when long reads (e.g. Sanger) in a pileup led to a read starting after the haplotype.
- Fixed bug in the exact AF calculation where log10pNonRefByAllele should really be log10pRefByAllele.
Haplotype Caller
- Fixed GENOTYPE_GIVEN_ALLELES mode, which often produced incorrect output when passed complex events.
- Fixed the interaction with the allele biased downsampling (for contamination removal) so that the removed reads are not used for downstream annotations.
- Implemented minor (5-10%) run time improvements to the Haplotype Caller.
- Fixed the logic for determining active regions, which did not work correctly when intervals were used.
Variant Annotator
- The FisherStrand annotation ignores reduced reads (because they are always on the forward strand).
- Can now be run multi-threaded with the -nt argument.
Reduce Reads
- Fixed bug where the start position of a reduced read was sometimes less than 1.
- ReduceReads now co-reduces BAMs if they're passed in together with multiple -I arguments.
Combine Variants
- Fixed the case where the PRIORITIZE option is used but no priority list is given.
Phase By Transmission
- Fixed bug where the AD wasn't being printed correctly in the MV output file.
Miscellaneous
- A brand new version of the per-site downsampling functionality has been implemented; it works much better than the previous version.
- More efficient initial file seeking at the beginning of the GATK traversal.
- Fixed the compression of VCF.gz, where the output was too big because of an unnecessary call to flush().
- The allele biased downsampling (for contamination removal) has been rewritten to be smarter; also, it no longer aborts if there's a reduced read in the pileup.
- Made a major performance improvement to the GATK engine by fixing a problem with the NanoSchedule timing code.
- Added checking in the GATK for mis-encoded quality scores.
- Fixed downsampling in the ReadBackedPileup class.
- Fixed the parsing of genome locations that contain colons in the contig names (which is allowed by the spec).
- Made ID an allowable INFO field key in our VCF parsing.
- Multi-threaded VCF to BCF writing no longer produces an invalid intermediate file that fails on merging.
- Picard jar remains at version 1.67.1197.
- Tribble jar updated to version 119.
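
To illustrate the genome location parsing fix listed under Miscellaneous, here is a minimal sketch of one way to parse an interval string whose contig name contains colons. It is not the GATK's actual implementation: the function name parse_genome_loc, the known_contigs argument (standing in for the reference's list of contig names), and the example contig name are all hypothetical. The idea is to treat an exact match against a known contig as a whole-contig interval, and otherwise to split on the last colon so that colons inside the name are preserved.

```python
def parse_genome_loc(text, known_contigs):
    """Parse 'contig', 'contig:start' or 'contig:start-stop'.

    Hypothetical helper for illustration only, not the GATK's code.
    Returns (contig, start, stop); start and stop are None when the
    string names a whole contig.
    """
    # A string that exactly matches a known contig is a whole-contig
    # interval, even if the name itself contains colons.
    if text in known_contigs:
        return text, None, None

    # Otherwise split on the LAST colon, so colons that are part of the
    # contig name stay with the name.
    contig, sep, position = text.rpartition(":")
    if not sep or contig not in known_contigs:
        raise ValueError(f"unknown contig in {text!r}")

    # The position is either 'start' or 'start-stop'.
    start_text, dash, stop_text = position.partition("-")
    start = int(start_text)
    stop = int(stop_text) if dash else start
    return contig, start, stop


# Example with a made-up contig name that contains a colon.
contigs = {"chr1", "HLA-A*01:01"}
print(parse_genome_loc("HLA-A*01:01:1000-2000", contigs))
print(parse_genome_loc("chr1:5", contigs))
```

Splitting on the first colon instead would cut the contig name short and try to read the rest of the name as a position, which is the kind of failure this fix addresses.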