Release notes for GATK version 2.0
Posted in Announcements | Last updated on 2012-08-10 00:07:47



The GATK 2.0 release includes both the addition of brand-new (and often still experimental) tools and updates to the existing stable tools.

New Tools

  • Base Recalibrator (BQSR v2), an upgrade to CountCovariates/TableRecalibration that generates base substitution, insertion, and deletion error models.
  • Reduce Reads, a BAM compression algorithm that reduces file sizes by 20x-100x while preserving all information necessary for accurate SNP and indel calling. ReduceReads enables the GATK to call tens of thousands of deeply sequenced NGS samples simultaneously.
  • HaplotypeCaller, a multi-sample local de novo assembly and integrated SNP, indel, and short SV caller.
  • Extensions to the Unified Genotyper that support variant calling of pooled samples, mitochondrial DNA, and non-diploid organisms. The extended Unified Genotyper also introduces a novel error-modeling approach that uses a reference sample to build a site-specific error model for SNPs and indels, which vastly improves calling accuracy.
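
The core idea behind a base error model like BQSR v2's can be illustrated with a toy empirical-quality calculation: bin bases by their covariates, count how often they disagree with the reference at sites not known to be polymorphic, and convert each bin's error rate to a Phred-scaled quality. This is a minimal sketch, not the BaseRecalibrator implementation. The covariate key (reported quality, machine cycle), the +1/+2 pseudocounts, and the function names are assumptions for illustration. In BQSR v2, separate models of this kind cover substitutions, insertions, and deletions.

```python
import math
from collections import defaultdict


def phred(error_rate: float) -> float:
    """Convert an error probability to a Phred-scaled quality."""
    return -10.0 * math.log10(error_rate)


def empirical_qualities(observations):
    """Estimate an empirical quality for each covariate bin.

    observations: iterable of (covariate_key, is_error) pairs. The key is
    assumed to be something like (reported_quality, machine_cycle), and
    is_error is True when the base disagreed with the reference at a site
    not known to be polymorphic.
    Returns {covariate_key: empirical Phred quality}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [errors, total]
    for key, is_error in observations:
        counts[key][0] += int(is_error)
        counts[key][1] += 1
    # Assumption: +1/+2 pseudocounts keep the estimate finite for error-free bins.
    return {
        key: phred((errors + 1) / (total + 2))
        for key, (errors, total) in counts.items()
    }
```

A bin whose empirical quality differs from its reported quality is where recalibration would adjust the base qualities.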

Base Quality Score Recalibration

  • IMPORTANT: the Count Covariates and Table Recalibration tools (which comprise BQSRv1) have been retired! Please see the BaseRecalibrator tool (BQSRv2) for running recalibration with GATK 2.0.

Unified Genotyper

  • Now handles the exception generated when non-standard reference bases are present in the FASTA file.
  • Fixed an indel bug: when checking the limits of a read to clip, the code did not account for reads that had already been clipped.
  • Now emits the MLE AC and AF in the INFO field.
  • No longer allows Ns in insertions when discovering indels.
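
Two of the Unified Genotyper changes reduce to simple rules. The sketch below rejects candidate insertion alleles whose inserted bases contain an N, and derives an allele frequency from an allele count as AF = AC / AN. It assumes VCF-style alleles (the ALT of an insertion starts with the REF padding base) and assumes every sample is called with the given ploidy. The function names are hypothetical and do not come from the GATK codebase.

```python
def is_valid_insertion(ref_allele: str, alt_allele: str) -> bool:
    """Reject candidate insertions whose inserted bases contain N.

    Assumes VCF-style representation: the ALT allele begins with the REF
    padding base(s), so the inserted sequence is whatever follows them.
    """
    inserted = alt_allele[len(ref_allele):]
    return "N" not in inserted.upper()


def allele_frequency(mle_ac: int, n_samples: int, ploidy: int = 2) -> float:
    """AF = AC / AN, where AN is the total number of called chromosomes.

    Assumes every sample is called and has the same ploidy.
    """
    an = n_samples * ploidy
    return mle_ac / an
```

For example, an MLE AC of 3 across 10 diploid samples (AN = 20) gives an AF of 0.15.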

Phase By Transmission

  • Multi-allelic sites are now correctly ignored.
  • Reporting of Mendelian violations is enhanced.
  • Corrected TP overflow.
  • Fixed bug that arose when no PLs were present.
  • Added option to output the father's allele first in phased child haplotypes.
  • Fixed a bug that caused the wrong phasing of child/father pairs.
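
To illustrate what a Mendelian violation and a phased child haplotype mean for a trio, here is a minimal sketch for diploid, biallelic-or-multiallelic genotypes given as allele pairs. It is not PhaseByTransmission's algorithm, which works with genotype likelihoods. The helper names are hypothetical. The sketch assumes that the default output puts the mother's allele first, because the new option is described as putting the father's allele first.

```python
from itertools import product
from typing import Iterator, Optional, Tuple

Genotype = Tuple[str, str]  # unphased diploid alleles, e.g. ("A", "G")


def transmissions(mother: Genotype, father: Genotype,
                  child: Genotype) -> Iterator[Tuple[str, str]]:
    """Yield (maternal_allele, paternal_allele) pairs that explain the child."""
    for m, f in product(set(mother), set(father)):
        if sorted((m, f)) == sorted(child):
            yield m, f


def is_mendelian_violation(mother: Genotype, father: Genotype,
                           child: Genotype) -> bool:
    """True when no combination of one maternal and one paternal allele fits."""
    return next(transmissions(mother, father, child), None) is None


def phase_child(mother: Genotype, father: Genotype, child: Genotype,
                father_first: bool = False) -> Optional[Tuple[str, str]]:
    """Return the child's phased alleles, or None if phase cannot be resolved.

    None covers both Mendelian violations (no explanation) and ambiguous
    cases such as het/het parents with a het child (two explanations).
    """
    options = set(transmissions(mother, father, child))
    if len(options) != 1:
        return None
    maternal, paternal = options.pop()
    return (paternal, maternal) if father_first else (maternal, paternal)
```

With a homozygous A/A mother, a homozygous G/G father, and an A/G child, the phased child is A|G by default and G|A with the father's allele first.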

Variant Eval

  • Improvements to the validation report module: when both eval and comp have genotypes, the comp genotypes are now subset to the samples being evaluated before determining TP, FP, FN, and TN status.
  • The AlleleCount stratification now uses the MLE AC by default when it is present, and otherwise falls back to the greedy AC.
  • Fixed bugs in the VariantType and IndelSize stratifications.
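
The two Variant Eval changes can be sketched as follows. The sketch subsets the comp genotypes to the evaluated samples before classifying a site, and it picks the allele count for stratification by preferring the MLE AC. It makes several assumptions. The INFO keys are assumed to be "MLEAC" and "AC". Genotypes are assumed to be VCF GT strings. A comp site is assumed to count as variant when any evaluated sample carries a non-reference allele. Only the first ALT allele's count is used. The function names are hypothetical.

```python
from typing import Dict, Optional, Set


def allele_count_for_stratification(info: Dict[str, str]) -> Optional[int]:
    """Prefer the MLE allele count and fall back to the greedy AC."""
    for key in ("MLEAC", "AC"):  # assumed INFO key names
        if key in info:
            return int(info[key].split(",")[0])  # first ALT allele only
    return None


def has_alt(gt: str) -> bool:
    """True if a VCF GT string (e.g. '0/1', '1|1') carries a non-ref allele."""
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))


def site_status(eval_is_variant: bool, comp_genotypes: Dict[str, str],
                eval_samples: Set[str]) -> str:
    """Classify a site after subsetting comp genotypes to the evaluated samples."""
    subset = {s: gt for s, gt in comp_genotypes.items() if s in eval_samples}
    comp_is_variant = any(has_alt(gt) for gt in subset.values())
    if eval_is_variant:
        return "TP" if comp_is_variant else "FP"
    return "FN" if comp_is_variant else "TN"
```

Without the subsetting step, a comp variant carried only by a sample outside the evaluation set would wrongly turn an eval call into a TP, or an eval non-call into an FN.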

Variant Annotator

  • The FisherStrand annotation no longer hard-codes base and read filters (previously it required MAPQ > 20 && QUAL > 20).
  • Miscellaneous bug fixes to experimental annotations.
  • Added a Clipping Rank Sum Test to detect when variants are present on reads with differential clipping.
  • Fixed the ReadPos Rank Sum Test annotation so that it no longer uses the un-hardclipped start as the alignment start.
  • Fixed bug in the NBaseCount annotation module.
  • The new TandemRepeatAnnotator is now a standard annotation while HRun has been retired.
  • Added PED support for the Inbreeding Coefficient annotation.
  • QD is no longer computed when there is no QUAL.
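
The FisherStrand and QD changes are easier to follow with the underlying calculations in view. FisherStrand reports a Phred-scaled p-value from Fisher's exact test on a 2x2 table of reference/alternate reads by forward/reverse strand. QD divides the variant QUAL by depth. In this minimal sketch, the read filters are optional parameters that are off by default, which mirrors the removal of the hard-coded thresholds. The read tuple layout, the function names, and the plain depth used for QD are simplifying assumptions, not the GATK's exact definitions. `math.comb` requires Python 3.8 or later.

```python
import math
from typing import Iterable, Optional, Tuple


def _table_prob(a: int, b: int, c: int, d: int) -> float:
    """Hypergeometric probability of [[a, b], [c, d]] given its margins."""
    return math.comb(a + b, a) * math.comb(c + d, c) / math.comb(a + b + c + d, a + c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Sum the probabilities of all same-margin tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = _table_prob(a, b, c, d)
    p = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        q = _table_prob(x, row1 - x, col1 - x, row2 - col1 + x)
        if q <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += q
    return min(p, 1.0)


def fisher_strand(reads: Iterable[Tuple[bool, bool, int, int]],
                  min_mapq: Optional[int] = None,
                  min_baseq: Optional[int] = None) -> float:
    """reads: (is_alt, is_forward, mapq, baseq). Filters apply only if requested."""
    table = [[0, 0], [0, 0]]  # rows: ref, alt; columns: forward, reverse
    for is_alt, is_forward, mapq, baseq in reads:
        if min_mapq is not None and mapq <= min_mapq:
            continue
        if min_baseq is not None and baseq <= min_baseq:
            continue
        table[int(is_alt)][0 if is_forward else 1] += 1
    (a, b), (c, d) = table
    p = fisher_exact_two_sided(a, b, c, d)
    return -10.0 * math.log10(p) if p < 1.0 else 0.0


def qual_by_depth(qual: Optional[float], depth: int) -> Optional[float]:
    """QD = QUAL / depth; skipped entirely when there is no QUAL."""
    if qual is None or depth == 0:
        return None
    return qual / depth
```

Passing `min_mapq=20, min_baseq=20` reproduces the old hard-coded behavior in this sketch. Leaving both unset counts every read.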

Variant Quality Score Recalibration

  • The VCF index is now created automatically for the recalFile.

Variant Filtration

  • Now allows you to run with type-unsafe JEXL selects, which all default to false when matching.
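
The "default to false" behavior can be illustrated with a small Python analogue of a filter expression such as `QD < 2.0`. This is not JEXL and not the GATK's evaluator. It is a sketch, assuming that a record whose field is missing or cannot be read as a number simply does not match, so the filter does not fire. The filter name "LowQD" and the helper functions are hypothetical.

```python
import operator
from typing import Dict, Tuple

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def matches(info: Dict[str, str], field: str, op: str, threshold: float) -> bool:
    """Evaluate 'field op threshold'; unusable values count as non-matching."""
    raw = info.get(field)
    if raw is None:
        return False  # field absent from this record
    try:
        value = float(raw)
    except ValueError:
        return False  # type mismatch, e.g. text where a number was expected
    return _OPS[op](value, threshold)


def filter_status(info: Dict[str, str],
                  filters: Dict[str, Tuple[str, str, float]]) -> str:
    """Return the names of matching filters, or PASS if none match."""
    failed = [name for name, (field, op, thr) in filters.items()
              if matches(info, field, op, thr)]
    return ";".join(failed) if failed else "PASS"
```

Under this convention, `filter_status({"QD": "1.5"}, {"LowQD": ("QD", "<", 2.0)})` returns `"LowQD"`, while a record with no QD, or a non-numeric QD, returns `"PASS"`.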

Select Variants

  • Added an option that allows the user to re-genotype using the exact AF calculation model (if PLs are present) in order to recalculate the QUAL and genotypes.

Combine Variants

  • Added --mergeInfoWithMaxAC argument to keep info fields from the input with the highest AC value.
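
The selection rule behind --mergeInfoWithMaxAC amounts to choosing, among the input records merged at a site, the one with the largest AC and keeping its INFO fields. This is a minimal sketch of that rule. Several details are assumptions, not documented GATK behavior: records without an AC are treated as AC=0, a multi-allelic AC uses its largest value, and ties keep the first input. The function name is hypothetical.

```python
from typing import Dict, List


def merge_info_with_max_ac(records: List[Dict[str, str]]) -> Dict[str, str]:
    """Return a copy of the INFO map from the input record with the highest AC."""
    def ac(info: Dict[str, str]) -> int:
        raw = info.get("AC")
        # Assumption: missing AC counts as 0; multi-allelic AC uses its maximum.
        return max(int(x) for x in raw.split(",")) if raw else 0

    # max() returns the first maximal element, so ties keep the earliest input.
    return dict(max(records, key=ac))
```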

Somatic Indel Detector

  • GT header line is now output.

Indel Realigner

  • Automatically skips Ion reads just like it does with 454 reads.

Variants To Table

  • Genotype-level fields can now be specified.
  • Added the --moltenize argument to produce molten output of the data.
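
"Molten" output is the long format familiar from R's reshape/melt. Each wide row, with one column per field, becomes one row per field and value, keyed by identifying columns. This sketch shows the transformation only. The choice of CHROM and POS as identifiers and the exact column layout are assumptions, not a description of the VariantsToTable output.

```python
from typing import Dict, List, Tuple


def moltenize(rows: List[Dict[str, str]], id_fields: List[str]) -> List[Tuple[str, ...]]:
    """Convert wide rows to long format: one (ids..., field, value) row per non-id field."""
    molten = []
    for row in rows:
        ids = tuple(row[f] for f in id_fields)
        for field, value in row.items():
            if field not in id_fields:
                molten.append(ids + (field, value))
    return molten
```

For example, a wide row with CHROM=20, POS=100, QUAL=50, and DP=12 becomes two rows: `("20", "100", "QUAL", "50")` and `("20", "100", "DP", "12")`.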

Depth Of Coverage

  • Fixed a NullPointerException that could occur if the user requested an interval summary but never provided a -L argument.

Miscellaneous

  • Added BCF2 support to tools that output VCFs (use the .bcf extension).
  • The GATK engine no longer automatically strips the suffix "Walker" from the end of tool names; as a result, all tools whose names ended with "Walker" have been renamed without that suffix.
  • Fixed a bug when specifying a JEXL expression for a field that doesn't exist: the whole expression is now treated as false (previously, the JEXL exception was rethrown).
  • There is now a global --interval_padding argument that specifies how many base pairs to add to both ends of each interval provided with -L.
  • Removed all code associated with extended events.
  • Algorithmically faster version of DiffEngine.
  • Improved downsampling now correctly handles edge cases that were previously handled poorly, and read walkers can now use downsampling.
  • GQ is now emitted as an int, not a float.
  • Fixed bug in the Beagle codec that was skipping the first line of the file when decoding.
  • Fixed bug in the VCF writer in the case where there are no genotypes for a record but there are genotypes in the header.
  • Miscellaneous fixes to the VCF headers being produced.
  • Fixed up the BadCigar read filter.
  • Removed the old deprecated genotyping framework revolving around the misordering of alleles.
  • Extensive refactoring of the GATKReports.
  • Picard jar updated to version 1.67.1197.
  • Tribble jar updated to version 110.
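
Two of the changes above can be made concrete. The first is what --interval_padding does to each -L interval. The second is GQ as an integer. This is a minimal sketch under stated assumptions. Intervals are taken as 1-based and inclusive. Padding is clamped at position 1, but not at the contig end, because that would require contig lengths. GQ is computed with the common convention of taking the gap between the two smallest PLs, capped at 99. These conventions and the function names are assumptions for illustration, not the engine's code.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based, inclusive


def pad_intervals(intervals: List[Interval], padding: int) -> List[Interval]:
    """Extend each interval by `padding` bases on both ends, clamping at position 1."""
    return [(contig, max(1, start - padding), end + padding)
            for contig, start, end in intervals]


def genotype_quality(pls: List[int], cap: int = 99) -> int:
    """GQ as an int: gap between the best and second-best PL, capped (assumed convention)."""
    best, second = sorted(pls)[:2]
    return min(second - best, cap)
```

With a padding of 10, the interval 20:100-200 becomes 20:90-210, and 20:5-10 becomes 20:1-20.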
