# Tagged with #compression

0 documentation articles | 0 announcements | 3 forum discussions


Created 2015-04-30 19:31:03 | Updated 2015-04-30 19:31:51 | Tags: bcf2codec compression

GATK Team,

I have recently started to look into using bgzipped BCF files as our primary means of input/output to GATK in order to save time parsing the VCF files. Unfortunately, due to space limitations, unzipped BCF files are not an option, as it looks like they're ~8x the size of a bgzipped VCF.

When I ran a simple "round trip" to convert vcf.gz -> bcf.gz -> vcf.gz (using SelectVariants) just to test the potential processing gains, I got the following error on the bcf.gz->vcf.gz leg:

```
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version nightly-2015-04-30-gdd4ddcb):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Tabix indexed files only work with ASCII codecs, but received non-Ascii codec BCF2Codec, for input source: myFile.bcf.gz
##### ERROR ------------------------------------------------------------------------------------------
```

This issue persists with the nightly build as well.

Is the native reading of bcf.gz files something that is on the horizon for the GATK team, or is it still a long way off? It looks like this code is pretty deep in the htsjdk library, and fixing it may require a change to the class hierarchy.

Thanks,

John Wallace
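A bgzipped VCF and a BCF2 file share the same BGZF container (gzip members carrying a `BC` extra subfield); what differs is the decompressed payload, text in one case and binary records in the other. As a minimal sketch of that distinction (Python, standard library only; `bgzf_block`, `is_bgzf` and `sniff_payload` are hypothetical helpers, not htsjdk or GATK APIs, and this is not htsjdk's actual codec-selection logic), the code below builds a BGZF block and classifies its payload by magic bytes:

```python
import gzip
import struct
import zlib

BCF2_MAGIC = b"BCF\x02"          # "BCF", major version 2, then a minor-version byte
VCF_MAGIC = b"##fileformat=VCF"  # required first header line of a text VCF


def bgzf_block(payload: bytes) -> bytes:
    """Wrap `payload` in one BGZF block. Simplified: real bgzip also splits
    input into blocks of at most 64 KiB and appends an empty EOF block."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream, no zlib header
    cdata = comp.compress(payload) + comp.flush()
    bsize = 18 + len(cdata) + 8 - 1                 # BSIZE = total block size minus 1
    header = struct.pack(
        "<4BI2BH2BHH",
        0x1F, 0x8B, 8, 4,        # gzip magic, CM=deflate, FLG=FEXTRA
        0,                       # MTIME
        0, 0xFF,                 # XFL, OS=unknown
        6,                       # XLEN: one 6-byte extra subfield
        ord("B"), ord("C"), 2,   # BGZF subfield id 'BC', SLEN=2
        bsize,
    )
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF, len(payload) & 0xFFFFFFFF)
    return header + cdata + trailer


def is_bgzf(raw: bytes) -> bool:
    """True if `raw` starts with a gzip member whose extra field is the BGZF 'BC' subfield."""
    return len(raw) >= 18 and raw[:4] == b"\x1f\x8b\x08\x04" and raw[12:14] == b"BC"


def sniff_payload(raw: bytes) -> str:
    """Classify a (possibly gzip/BGZF-compressed) variant file as 'vcf', 'bcf2' or 'unknown'."""
    data = gzip.decompress(raw) if raw[:2] == b"\x1f\x8b" else raw
    if data.startswith(BCF2_MAGIC):
        return "bcf2"  # binary records: a text (ASCII) codec cannot parse these
    if data.startswith(VCF_MAGIC):
        return "vcf"   # plain text lines
    return "unknown"
```

Both `bgzf_block(vcf_bytes)` and `bgzf_block(bcf_bytes)` begin with identical BGZF headers, yet `sniff_payload` separates them only after decompression. That is the split the error message draws: the container is the same, but only the text payload suits an ASCII codec.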

Created 2014-12-26 21:09:34 | Updated | Tags: indelrealigner compression

Hi GATK team, my jobs are currently running and I'm a little too lazy to test this myself later, so I'll ask: I noticed that the .interval files produced by RealignerTargetCreator can be quite large. Can I use a ".interval.gz" extension on the RealignerTargetCreator command line? And can I then pass this *.gz file to IndelRealigner?
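The thread does not say whether RealignerTargetCreator writes, or IndelRealigner reads, gzipped interval files; that depends on the GATK version and should be checked against its documentation. As a hedged sketch of what transparent support involves, the Python below (hypothetical helper names; assuming one interval per line in `contig:start-stop` or `contig:pos` form) sniffs the two-byte gzip magic rather than trusting the file extension, and decompresses only when needed:

```python
import gzip
from typing import List, Tuple

GZIP_MAGIC = b"\x1f\x8b"
Interval = Tuple[str, int, int]  # (contig, start, stop), 1-based inclusive


def parse_interval(text: str) -> Interval:
    """Parse 'contig:start-stop' or 'contig:pos'; splits on the last ':'."""
    contig, _, span = text.rpartition(":")
    start, _, stop = span.partition("-")
    return contig, int(start), int(stop or start)


def parse_intervals_bytes(raw: bytes) -> List[Interval]:
    """Decode an interval list that may or may not be gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    lines = raw.decode("ascii").splitlines()
    return [parse_interval(line.strip()) for line in lines if line.strip()]


def read_intervals(path: str) -> List[Interval]:
    """Read intervals from `path`, whatever its extension."""
    with open(path, "rb") as fh:
        return parse_intervals_bytes(fh.read())
```

Sniffing content instead of the name means a compressed file gets decoded even if it was not given a `.gz` suffix, and a plain file named `.gz` is not misread.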

Created 2013-03-19 20:08:23 | Updated | Tags: parallel blip compression exception

Trying to run

```
java -jar $GATKJAR -R$REF -T UnifiedGenotyper -I file1.bam -I file2.bam -I file3.bam -glm BOTH -o output.vcf.gz
```

gives an error like:

```
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.4-9-g532efad):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR ------------------------------------------------------------------------------------------
```

However, if I specify -o output.vcf instead of -o output.vcf.gz, everything works. I suspect the problem is in the autodetection of the codec: in VariantContextWriterStorage, LocalParallelizationProblem is thrown not only when the tmp file cannot be found, but whenever a FeatureDescriptor cannot be found for the file.
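The suspected failure mode, a format lookup keyed on the file extension that finds nothing for `.vcf.gz` and then surfaces as an unrelated error, can be modelled in a few lines. Everything below is hypothetical (`DESCRIPTORS`, `naive_lookup`, `compression_aware_lookup`, `descriptor_or_error`); it is not GATK's VariantContextWriterStorage code, only a Python sketch of the behaviour the post describes and one way to avoid it:

```python
from typing import Optional, Tuple

# Hypothetical registry: file extension -> output format name.
DESCRIPTORS = {".vcf": "VCF", ".bcf": "BCF2"}
COMPRESSION_SUFFIXES = (".gz", ".bgz")


def naive_lookup(path: str) -> Optional[str]:
    """Look up a format by the final extension only: 'x.vcf.gz' -> '.gz' -> None."""
    dot = path.rfind(".")
    return DESCRIPTORS.get(path[dot:]) if dot != -1 else None


def compression_aware_lookup(path: str) -> Tuple[Optional[str], bool]:
    """Strip a trailing compression suffix first, then look up the format."""
    compressed = path.endswith(COMPRESSION_SUFFIXES)
    if compressed:
        path = path[: path.rfind(".")]
    return naive_lookup(path), compressed


def descriptor_or_error(path: str) -> Tuple[str, bool]:
    """Return (format, compressed), or fail with a message naming the real cause."""
    fmt, compressed = compression_aware_lookup(path)
    if fmt is None:
        # Report the missing format directly instead of a generic failure.
        raise ValueError(f"no output format registered for {path!r}")
    return fmt, compressed
```

With the naive lookup, `output.vcf` resolves but `output.vcf.gz` does not, which matches the reported symptom; the compression-aware version resolves both and, when nothing matches, says so explicitly.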