# Tagged with #compression

1 documentation article | 0 announcements | 2 forum discussions

#### Objective

Compress the read data to minimize file sizes, which makes it practical to process very large numbers of samples together.

• TBD

### 1. Compress your sequence data

#### Action

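Run the following GATK command, substituting your own sample BAM file for input.bam (a placeholder name):

java -jar GenomeAnalysisTK.jar \
-T ReduceReads \
-R reference.fa \
-I input.bam \
-L 20 \
-o reduced_reads.bam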


#### Expected Result

This creates a file called reduced_reads.bam containing only the sequence information that is essential for calling variants.

Note that ReduceReads is not meant to be run on multiple samples at once. If you plan to merge your sample BAM files, run ReduceReads on each sample individually before merging.
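The per-sample advice above can be sketched as a small shell helper that builds one ReduceReads command per BAM file. This is only an illustration: the sample file names and the `<sample>.reduced.bam` output naming are hypothetical, GenomeAnalysisTK.jar and reference.fa are assumed to sit in the working directory, and the `-L 20` restriction is left out. The helper prints the commands rather than running them, so they can be reviewed first.

```shell
# Build the ReduceReads command line for a single sample BAM.
# Assumption: the output is named <sample>.reduced.bam (hypothetical convention).
reduce_cmd() {
    local bam="$1"
    local out="${bam%.bam}.reduced.bam"
    printf 'java -jar GenomeAnalysisTK.jar -T ReduceReads -R reference.fa -I %s -o %s\n' "$bam" "$out"
}

# Print one command per sample; nothing is executed here.
print_reduce_cmds() {
    local bam
    for bam in "$@"; do
        reduce_cmd "$bam"
    done
}
```

For example, `print_reduce_cmds sampleA.bam sampleB.bam` prints two independent commands, one per sample. Piping that output to `sh` would run them in sequence; any merging of the reduced files would happen only afterwards.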


Hi GATK team, my jobs are currently running and I'm a little too lazy to test this myself later: I saw that the .interval files produced by RealignerTargetCreator can be quite large. Can I use a ".interval.gz" extension for the output on the RealignerTargetCreator command line? And can I then use that .gz file with IndelRealigner?

Trying to run

java -jar $GATKJAR -R $REF -T UnifiedGenotyper -I file1.bam -I file2.bam -I file3.bam -glm BOTH -o output.vcf.gz


gives an error like:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.4-9-g532efad):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.

However, if I specify -o output.vcf instead of -o output.vcf.gz, then everything works. I suspect the problem is with the autodetection of the codec. In VariantContextWriterStorage, LocalParallelizationProblem is thrown not only if the tmp file cannot be found, but whenever a FeatureDescriptor cannot be found for the file.
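One possible workaround, going only by the observation above that -o output.vcf works: let the tool write the plain VCF and compress it as a separate step. The sketch below uses standard gzip. The helper name compress_vcf is made up for illustration, and whether downstream tools need block compression (for example bgzip) instead of plain gzip is not addressed here.

```shell
# Compress an already written VCF and verify the archive.
# Assumption: plain gzip output is acceptable for later use.
compress_vcf() {
    local vcf="$1"
    # Refuse to continue if the uncompressed VCF is missing.
    [ -f "$vcf" ] || { echo "missing file: $vcf" >&2; return 1; }
    # Keep the original file; write <name>.gz next to it, then test its integrity.
    gzip -c "$vcf" > "$vcf.gz" && gzip -t "$vcf.gz"
}

# Usage, once the UnifiedGenotyper run with -o output.vcf has finished:
#   compress_vcf output.vcf
```

This sidesteps the codec autodetection entirely, at the cost of temporarily holding both the uncompressed and the compressed file on disk.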