Release notes for GATK version 2.1
Posted in Announcements | Last updated on 2012-08-23 14:11:29
Base Quality Score Recalibration
- Multi-threaded support in the BaseRecalibrator tool has been temporarily suspended for performance reasons; we hope to have this fixed for the next release.
- Implemented support for SOLiD no-call strategies other than throwing an exception.
- Fixed smoothing in the BQSR bins.
- Fixed the plotting R script to be compatible with newer versions of R and the ggplot2 library.
- Renamed the per-sample ML allelic fractions and counts so that they don't have the same name as the per-site INFO fields, and clarified the description in the VCF header.
- UG now makes use of base insertion and base deletion quality scores if they exist in the reads (output from BaseRecalibrator).
- Renamed the -maxAlleles argument to -maxAltAlleles to make the name more accurate.
- In pooled mode, if haplotypes cannot be created from the given alleles when genotyping indels (e.g. because the site is too close to a contig boundary), then do not try to genotype.
- Added improvements to indel calling in pooled mode: we compute per-read likelihoods in the reference sample to determine whether a read is informative or not.
- Added LowQual filter to the output when appropriate.
- Added some support for calling on Reduced Reads. Note that this is still experimental and may not always work well.
- Now does a better job of capturing low frequency branches that are inside high frequency haplotypes.
- Updated VQSR to work with the MNP and symbolic variants that are coming out of the HaplotypeCaller.
- Made fixes to the likelihood based LD calculation for deciding when to combine consecutive events.
- Fixed bug where non-standard bases from the reference would cause errors.
- Better separation of arguments that are relevant to the Unified Genotyper but not the Haplotype Caller.
- Fixed bug where reads were soft-clipped beyond the limits of the contig and the tool was failing with a NoSuchElement exception.
- Fixed divide by zero bug when downsampler goes over regions where reads are all filtered out.
- Fixed a bug where downsampled reads were not being excluded from the read window, causing them to trail back and get caught by the sliding window exception.
- Fixed support in the AlleleCount stratification when using the MLEAC (it is now capped by the AN).
- Fixed incorrect allele counting in IndelSummary evaluation.
- Now outputs the first non-MISSING QUAL, instead of the maximum.
- Now supports multi-threaded running (with the -nt argument).
- Fixed behavior of the --regenotype argument to do proper selecting (without losing any of the alternate alleles).
- No longer adds the DP INFO annotation if DP wasn't used in the input VCF.
- If MLEAC or MLEAF is present in the original VCF and the number of samples decreases, remove those annotations from the output VC (since they are no longer accurate).
- Updated and improved the BadCigar read filter.
- GATK now generates a proper error when a gzipped FASTA is passed in.
- Various improvements throughout the BCF2-related code.
- Removed various parallelism bottlenecks in the GATK.
- Added support for the X and = CIGAR operators to the GATK.
- Now catches NumberFormatExceptions when parsing the VCF POS field.
- Fixed bug in FastaAlternateReferenceMaker when input VCF has overlapping deletions.
- Fixed AlignmentUtils bug for handling Ns in the CIGAR string.
- Lower-case bases in the REF/ALT alleles of a VCF are now allowed and are converted to upper case.
- Added support for handling complex events in ValidateVariants.
- Picard jar remains at version 1.67.1197.
- Tribble jar remains at version 110.
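
As an illustration of the X and = CIGAR operators mentioned above, here is a minimal Python sketch of how a CIGAR string containing them can be interpreted. It is not GATK code: the function names (parse_cigar, reference_length, read_length) are hypothetical, and the operator semantics follow the SAM specification, where = (sequence match) and X (sequence mismatch) consume both read and reference bases, exactly as M does.

```python
import re

# Operators that advance the reference or the read position, per the SAM spec.
# '=' (sequence match) and 'X' (sequence mismatch) behave like 'M' in this respect.
REF_CONSUMING = set("MDN=X")
READ_CONSUMING = set("MIS=X")

_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operator) pairs (hypothetical helper)."""
    if cigar == "*":  # '*' means the CIGAR is unavailable
        return []
    elements = _CIGAR_ELEMENT.findall(cigar)
    # Reject input containing anything the pattern did not consume.
    if "".join(length + op for length, op in elements) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return [(int(length), op) for length, op in elements]


def reference_length(cigar):
    """Number of reference bases spanned by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def read_length(cigar):
    """Number of read bases described by the CIGAR (soft clips included)."""
    return sum(n for n, op in parse_cigar(cigar) if op in READ_CONSUMING)
```

Under these assumptions, a CIGAR written as 4=1X5= spans the same number of reference and read bases as 10M; the only difference is that the match/mismatch status of each base is spelled out.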