Dear all,

I have discussed with the GATK team a small issue related to ReduceReads compression, which was fixed around the end of November. That was 8-9 days ago, and since then I have been reprocessing a large number of BAM files to create new reducedReads BAMs. It all works fine now, but it is not convenient to cite a nightly release version in publications, so I would like to move to 2.8.1. It would make my life much easier if I could do this without re-reducing the BAM files I have just fixed.

Based on the release notes, and given the very short time frame, am I right to assume that 2.8.1 is equivalent to 2.7.4 (the nightly version following the bug fix, dated November 29) as far as reducedReads compression is concerned? Has anything important changed over the last 10 days that is not mentioned in the release notes?

Hi, we have been using GATK for quite a while now and we love how the ReduceReads process helps us reduce the size of the BAM file for better downstream analysis. The size reduction makes our server administrator really happy, and it seems like a good idea to store the reduced BAM file instead of the original, considering that most of the required information should still be present in the reduced file. However, is there any way for us to revert the reduced BAM file back to the original BAM file? That would be extremely helpful, as storage space on the server is usually limited.

Hello, I am using the latest GATK Unified Genotyper (UG) software for my BWA aligned reads (paired-end).

1. BWA: default parameters
2. markDuplicates (PICARD) and realignment (GATK)
3. UnifiedGenotyper default values except: stand_call_conf 30.0, stand_emit_conf 10.0

When I use the ReducedBam with UG, I get 2,247,468 SNPs.
When I use the Bam without ReducedReads, UG gives me 2,245,966 SNPs.

I used BEDTOOLS to compare both files:

2,229,901 shared SNPs
17,567 only identified with the ReducedReads Bam
16,065 only identified with the non ReducedReads Bam

Do you have any idea what happened here? Many thanks in advance.
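For anyone who wants to reproduce this kind of comparison, here is a minimal sketch in Python. It only checks that the counts quoted above are internally consistent and computes the overlap as a fraction of the union of calls. The function names `compare_calls` and `concordance` are made up for illustration, and representing each SNP as a (contig, position) pair is an assumption, not what BEDTOOLS does internally.

```python
def compare_calls(reduced, original):
    """Return (shared, only_reduced, only_original) counts for two collections of sites.

    Each site is assumed to be a hashable key such as (contig, position).
    """
    reduced, original = set(reduced), set(original)
    shared = len(reduced & original)
    return shared, len(reduced - original), len(original - reduced)


def concordance(shared, only_a, only_b):
    """Fraction of the union of calls that is present in both sets."""
    return shared / (shared + only_a + only_b)


# Counts reported in the post above.
shared, only_reduced, only_original = 2_229_901, 17_567, 16_065

# The shared and private counts add up to the two per-file totals.
assert shared + only_reduced == 2_247_468   # reduced BAM total
assert shared + only_original == 2_245_966  # non-reduced BAM total

print(f"{concordance(shared, only_reduced, only_original):.4f}")  # 0.9851
```

So the two call sets agree on roughly 98.5% of the union of sites; the question is about the remaining sites that are private to one file or the other.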

I am running ReduceReads (GATK version 2.3-9-ge5ebf34) on 14 BAM files, one chromosome at a time. It runs fine on chr21, chr22, X, Y, and MT, but it runs out of heap space on the others (the heap is set to 32G). Is there a memory leak somewhere, or should I increase the amount of memory?

java -Xmx32g -Djava.io.tmpdir=/scratch -jar $GATK_PREFIX/GenomeAnalysisTK.jar \
    -R $REF \
    -T ReduceReads \
    -I recal_bams.list \
    -L $CHR \
    -o reduced/saudi_arabian.$CHR.reduced.bam

INFO 20:49:41,838 ProgressMeter - chr5:44828772 2.54e+08 28.7 h 6.8 m 24.8% 4.8 d 87.0 h
INFO 20:51:22,265 ProgressMeter - chr5:44851724 2.54e+08 28.7 h 6.8 m 24.8% 4.8 d 87.1 h
INFO 20:53:07,958 ProgressMeter - chr5:44851724 2.54e+08 28.7 h 6.8 m 24.8% 4.8 d 87.1 h
INFO 20:56:21,576 ProgressMeter - chr5:44851724 2.54e+08 28.8 h 6.8 m 24.8% 4.8 d 87.3 h
INFO 20:59:56,672 ProgressMeter - chr5:44851724 2.54e+08 28.8 h 6.8 m 24.8% 4.8 d 87.5 h
INFO 21:06:46,367 ProgressMeter - chr5:44851724 2.54e+08 29.0 h 6.8 m 24.8% 4.9 d 87.8 h
INFO 21:20:10,834 ProgressMeter - chr5:44851724 2.54e+08 29.1 h 6.9 m 24.8% 4.9 d 88.3 h
Exception in thread "ProgressMeterDaemon" ##### ERROR ------------------------------------------------------------------------------------------
java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringBuilder.toString(StringBuilder.java:405)
    at org.broadinstitute.sting.utils.AutoFormattingTime.<init>(AutoFormattingTime.java:17)
    at org.broadinstitute.sting.utils.AutoFormattingTime.<init>(AutoFormattingTime.java:21)
    at org.broadinstitute.sting.utils.AutoFormattingTime.<init>(AutoFormattingTime.java:25)
    at org.broadinstitute.sting.utils.progressmeter.ProgressMeter.printProgress(ProgressMeter.java:259)
    at org.broadinstitute.sting.utils.progressmeter.ProgressMeterDaemon.run(ProgressMeterDaemon.java:52)
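In the log above, the reported position stops advancing at chr5:44851724 for roughly half an hour before the OutOfMemoryError, and the gaps between progress reports keep growing. A stall like that may indicate the JVM is spending most of its time in garbage collection, although the log alone does not prove it. Below is a minimal Python sketch for spotting such stalls in a saved log; the regular expression is written against the ProgressMeter lines shown in this post only, and `find_stalls` is a hypothetical helper, not part of GATK.

```python
import re

# Matches ProgressMeter lines of the form shown above, e.g.
# "INFO 20:51:22,265 ProgressMeter - chr5:44851724 2.54e+08 ..."
PROGRESS = re.compile(
    r"INFO\s+(\d{2}):(\d{2}):(\d{2}),\d+\s+ProgressMeter\s+-\s+(\S+?):(\d+)\s"
)


def find_stalls(log_text):
    """Return (locus, first_seen, last_seen, n_reports) for every locus
    that is reported more than once in a row."""
    stalls = []
    prev = None  # [locus, first timestamp, last timestamp, report count]
    for m in PROGRESS.finditer(log_text):
        hh, mm, ss, contig, pos = m.groups()
        locus = f"{contig}:{pos}"
        stamp = f"{hh}:{mm}:{ss}"
        if prev and prev[0] == locus:
            prev[2] = stamp
            prev[3] += 1
        else:
            if prev and prev[3] > 1:
                stalls.append(tuple(prev))
            prev = [locus, stamp, stamp, 1]
    if prev and prev[3] > 1:
        stalls.append(tuple(prev))
    return stalls


# A shortened excerpt of the log from this post.
log = """\
INFO 20:49:41,838 ProgressMeter - chr5:44828772 2.54e+08 28.7 h 6.8 m 24.8% 4.8 d 87.0 h
INFO 20:51:22,265 ProgressMeter - chr5:44851724 2.54e+08 28.7 h 6.8 m 24.8% 4.8 d 87.1 h
INFO 20:53:07,958 ProgressMeter - chr5:44851724 2.54e+08 28.7 h 6.8 m 24.8% 4.8 d 87.1 h
INFO 21:20:10,834 ProgressMeter - chr5:44851724 2.54e+08 29.1 h 6.9 m 24.8% 4.9 d 88.3 h
"""
print(find_stalls(log))
# [('chr5:44851724', '20:51:22', '21:20:10', 3)]
```

The sketch compares timestamps as strings and does not handle runs that cross midnight, which is enough for locating the stalled locus but not for measuring long stalls precisely.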

The BAM files were generated with a non-BWA aligner and have been processed following the GATK pipeline (indel realignment, BQSR).

java -jar GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -R /data/seq/indexed-genomes/bos_taurus/umd31MT/umd31MT.fa -T ReduceReads -I /data/seq/chhar0/phd/GATK/Chr29-realigned-recal.bam -o Chr29-GATK-reduced.bam

INFO 20:21:36,082 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:21:36,084 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.3-5-g49ed93c, Compiled 2013/01/06 20:58:13
INFO 20:21:36,084 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 20:21:36,084 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 20:21:36,088 HelpFormatter - Program Args: -R /data/seq/indexed-genomes/bos_taurus/umd31MT/umd31MT.fa -T ReduceReads -I /data/seq/chhar0/phd/GATK/Chr29-realigned-recal.bam -o Chr29-GATK-reduced.bam
INFO 20:21:36,089 HelpFormatter - Date/Time: 2013/01/07 20:21:36
INFO 20:21:36,089 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:21:36,089 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:21:36,160 GenomeAnalysisEngine - Strictness is SILENT
INFO 20:21:36,248 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 20:21:36,255 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 20:21:36,449 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.19
INFO 20:21:36,499 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 20:21:36,499 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 20:21:36,671 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO 20:21:36,674 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO 20:22:06,502 ProgressMeter - Chr29:33746 6.91e+05 30.0 s 43.4 s 92.5% 32.4 s 2.4 s
INFO 20:22:37,291 ProgressMeter - Chr29:65146 1.34e+06 60.8 s 45.3 s 92.5% 65.7 s 4.9 s
INFO 20:23:08,090 ProgressMeter - Chr29:91651 1.95e+06 91.6 s 46.9 s 92.5% 99.0 s 7.5 s
INFO 20:23:43,254 ProgressMeter - Chr29:124049 2.64e+06 2.1 m 47.9 s 92.5% 2.3 m 10.3 s
INFO 20:23:45,243 GATKRunReport - Uploaded run statistics report to AWS S3

java.lang.IndexOutOfBoundsException: Index: 15, Size: 15
    at java.util.LinkedList.entry(LinkedList.java:382)
    at java.util.LinkedList.get(LinkedList.java:332)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.compressVariantRegion(SlidingWindow.java:597)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.closeVariantRegion(SlidingWindow.java:623)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SlidingWindow.closeVariantRegions(SlidingWindow.java:643)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SingleSampleCompressor.closeVariantRegions(SingleSampleCompressor.java:83)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.MultiSampleCompressor.closeVariantRegionsInAllSamples(MultiSampleCompressor.java:94)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.MultiSampleCompressor.addAlignment(MultiSampleCompressor.java:76)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReadsStash.compress(ReduceReadsStash.java:67)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:387)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:87)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsReduce.apply(TraverseReadsNano.java:226)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsReduce.apply(TraverseReadsNano.java:215)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:254)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:219)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:91)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:55)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

