# Tagged with #compression 1 documentation article | 0 announcements | 2 forum discussions

#### Objective

Compress read data to minimize file sizes, which makes massively multisample processing more practical.

#### Prerequisites

• TBD

### 1. Compress your sequence data

#### Action

Run the following GATK command. Here `-L 20` restricts processing to chromosome 20:

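```
# input.bam is a placeholder: substitute the BAM file of the sample you want to reduce
java -jar GenomeAnalysisTK.jar \
    -T ReduceReads \
    -R reference.fa \
    -I input.bam \
    -L 20 \
    -o reduced_reads.bam
```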

#### Expected Result

This creates a file called reduced_reads.bam containing only the sequence information that is essential for calling variants.

Note that ReduceReads is not meant to be run on multiple samples at once. If you plan to merge your per-sample BAM files, run ReduceReads on each sample individually first.
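The per-sample advice above can be scripted as a loop that calls ReduceReads once per sample BAM. This is a minimal sketch assuming a POSIX shell and exactly one sample per input BAM. `GATK_JAR`, `REF`, `RUN`, `reduce_sample` and the `.reduced.bam` output naming are all hypothetical. Merging is left out because the article does not name a tool for it.

```shell
#!/bin/sh
# Sketch: run ReduceReads separately on each sample's BAM, before any merging.
# Assumptions (not from the article): GATK_JAR and REF point at your GATK jar and
# reference, each input BAM holds exactly one sample, and the output names are hypothetical.
GATK_JAR=${GATK_JAR:-GenomeAnalysisTK.jar}
REF=${REF:-reference.fa}
RUN=${RUN:-}   # set RUN=echo for a dry run that only prints each command

reduce_sample() {
    bam=$1
    out="${bam%.bam}.reduced.bam"   # sample1.bam -> sample1.reduced.bam
    # -L 20 mirrors the article's command; drop it to process the whole genome.
    $RUN java -jar "$GATK_JAR" -T ReduceReads -R "$REF" -I "$bam" -L 20 -o "$out"
}

# One ReduceReads invocation per sample BAM given on the command line.
for bam in "$@"; do
    reduce_sample "$bam" || exit 1
done
```

For example, if the script is saved as a hypothetical `reduce_samples.sh`, then `RUN=echo sh reduce_samples.sh sample1.bam sample2.bam` prints the two ReduceReads commands without running them.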


Hi GATK team, my jobs are currently running and I'm a little too lazy to test this myself later: I noticed that the .interval files produced by RealignerTargetCreator can be quite large. Can I give the output file a ".interval.gz" extension on the RealignerTargetCreator command line? And can I then pass that *.gz file to IndelRealigner?

Trying to run

```
java -jar $GATKJAR -R$REF -T UnifiedGenotyper -I file1.bam -I file2.bam -I file3.bam -glm BOTH -o output.vcf.gz
```

gives an error like:

```
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.4-9-g532efad):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR
##### ERROR MESSAGE: There was a failure because temporary file /tmp/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub1033673347640679118.tmp could not be found while running the GATK with more than one thread.  Possible causes for this problem include: your system's open file handle limit is too small, your output or temp directories do not have sufficient space, or just an isolated file system blip
##### ERROR ------------------------------------------------------------------------------------------
```

The file is actually there, and it is gzip-compressed and VCF-formatted.

However, if I specify `-o output.vcf` instead of `-o output.vcf.gz`, everything works. I suspect the problem lies in codec autodetection: in `VariantContextWriterStorage`, a `LocalParallelizationProblem` is thrown not only when the tmp file cannot be found, but whenever no `FeatureDescriptor` can be found for the file.
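One workaround consistent with the observation above is to have the threaded run write plain `output.vcf` and compress the file after the run finishes. This is a minimal sketch assuming a POSIX shell with `gzip`. It uses `bgzip` from htslib when that happens to be installed. `compress_vcf` is a hypothetical helper, not part of GATK.

```shell
#!/bin/sh
# Workaround sketch: let the multithreaded UnifiedGenotyper run write plain VCF,
# then compress the finished file in a separate step.
#
# Step 1 (adapt to your own command; only the output name changes):
#   java -jar $GATKJAR -R $REF -T UnifiedGenotyper -I file1.bam -I file2.bam -I file3.bam -glm BOTH -o output.vcf
#
# Step 2: compress_vcf is a hypothetical helper, not part of GATK.
compress_vcf() {
    vcf=$1
    if command -v bgzip >/dev/null 2>&1; then
        # bgzip (htslib) writes BGZF, which gzip can still read and tabix can index.
        bgzip -f "$vcf"
    else
        # Plain gzip fallback: readable with gzip, but not tabix-indexable.
        gzip -f "$vcf"
    fi
}

# Example: compress_vcf output.vcf   # replaces output.vcf with output.vcf.gz
```

Both branches replace `output.vcf` with `output.vcf.gz`. The `bgzip` branch is preferable when you later want to index the file with `tabix -p vcf output.vcf.gz`.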

So it seems that compressed output cannot be used with multithreaded UnifiedGenotyper runs. Is my assessment correct?

1. A clearer error message would help keep others from trying the same thing I did.
2. It would be nice to be able to write compressed output from a multithreaded UnifiedGenotyper. Perhaps (a) the temporary files could be written uncompressed even though the final file is compressed, or (b) codec detection could recognize gzip-compressed files.
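Suggestion (b) amounts to recognizing a compressed file by its content rather than by its extension. This minimal shell sketch only illustrates that magic-byte check. It is not how GATK's codec detection is implemented, and `is_gzip` is a hypothetical name.

```shell
#!/bin/sh
# Content-based gzip detection: any gzip stream, including BGZF files written by
# bgzip, starts with the magic bytes 0x1f 0x8b regardless of its file name.
# is_gzip is a hypothetical helper for illustration, not a GATK function.
is_gzip() {
    # Dump the first two bytes as hex without offsets or spaces, e.g. "1f8b".
    magic=$(head -c 2 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example:
#   if is_gzip output.vcf.gz; then echo "gzip-compressed"; else echo "not gzip"; fi
```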