Tagged with #catvariants
0 documentation articles | 1 announcement | 7 forum discussions


Created 2014-07-15 03:54:06 | Updated 2014-10-23 17:58:36 | Tags: variantrecalibrator haplotypecaller selectvariants variantannotator release-notes catvariants genotypegvcfs gatk-3-2
Comments (13)

GATK 3.2 was released on July 14, 2014. Itemized changes are listed below. For more details, see the user-friendly version highlights.

We also want to take this opportunity to thank super-user Phillip Dexheimer for all of his excellent contributions to the codebase, especially for this release.

Haplotype Caller

  • Various improvements were made to the assembly engine and likelihood calculation, which lead to more accurate genotype likelihoods (and hence better genotypes).
  • Reads are now realigned to the most likely haplotype before being used by the annotations, so AD and DP will now correspond directly to the reads that were used to generate the likelihoods.
  • The caller is now more conservative in low complexity regions, which significantly reduces false positive indels at the expense of a little sensitivity; mostly relevant for whole genome calling.
  • Small performance optimizations to the function to calculate the log of exponentials and to the Smith-Waterman code (thanks to Nigel Delaney).
  • Fixed small bug where indel discovery was inconsistent based on the active-region size.
  • Removed scary warning messages for "VectorPairHMM".
  • Made VECTOR_LOGLESS_CACHING the default implementation for PairHMM.
  • When we subset PLs because alleles are removed during genotyping we now also subset the AD.
  • Fixed bug where reference sample depth was dropped in the DP annotation.

Variant Recalibrator

  • The -mode argument is now required.
  • The plotting script now uses the theme instead of opt functions to work with recent versions of the ggplot2 R library.



Variant Annotator

  • SB tables are created even if the ref or alt columns have no counts (used in the FS and SOR annotations).

Genotype GVCFs

  • Added missing arguments so that now it models more closely what's available in the Haplotype Caller.
  • Fixed recurring error about missing PLs.
  • No longer pulls the headers from all input rods including dbSNP, rather just from the input variants.
  • --includeNonVariantSites should now be working.

Select Variants

  • The dreaded "Invalid JEXL expression detected" error is now a kinder user error.

Indel Realigner

  • Now throws a user error when it encounters reads with I operators greater than the number of read bases.
  • Fixed bug where reads that are all insertions (e.g. 50I) were causing it to fail.


Calculate Genotype Posteriors

  • Now computes posterior probabilities only for SNP sites with SNP priors (other sites have flat priors applied).
  • Now computes genotype posteriors using likelihoods from all members of the trio.
  • Added annotations for calling potential de novo mutations.
  • Now uses PP tag instead of GP tag because posteriors are Phred-scaled.

Cat Variants

  • Can now process .list files with -V.
  • Can now handle BCF and Block-Compressed VCF files.
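As a rough illustration of the two Cat Variants items above, the sketch below gathers plain and block-compressed VCFs from a directory into a .list file that could then be passed with -V. The directory name, file names, and the helper `make_vcf_list` are hypothetical; the CatVariants invocation in the comments uses the post-3.2 package name and is shown for reference only, not executed.

```shell
# Print one path per line for every .vcf and .vcf.gz file in directory $1.
# (Hypothetical helper; names and layout are assumptions for illustration.)
make_vcf_list() {
    for f in "$1"/*.vcf "$1"/*.vcf.gz; do
        # Skip unmatched glob patterns, which stay as literal strings.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Possible usage (not run here):
#   make_vcf_list per_chrom > inputs.list
#   java -cp GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants \
#       -R reference.fasta -V inputs.list -out combined.vcf
```

The list is written in glob order, which is not necessarily genomic order, so this sketch makes no claim that -assumeSorted would be appropriate for it.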

Validate Variants

  • Now works with gVCF files.
  • By default, all strict validations are performed; use --validationTypeToExclude to exclude specific tests.


Fasta Alternate Reference Maker

  • Now use '--use_IUPAC_sample sample_name' to specify which sample's genotypes should be used for the IUPAC encoding with multi-sample VCF files.


Miscellaneous

  • Refactored maven directories and java packages replacing "sting" with "gatk".
  • Extended on-the-fly sample renaming feature to VCFs with the --sample_rename_mapping_file argument.
  • Added a new read transformer that refactors NDN cigar elements to one N element.
  • Now a Tabix index is created for block-compressed output formats.
  • Switched outputRoot in SplitSamFile to an empty string instead of null (thanks to Carlos Barroto).
  • Enabled the AB annotation in the reference model pipeline (thanks to John Wallace).
  • We now check that output files are specified in a writeable location.
  • We now allow blank lines in a (non-BAM) list file.
  • Added legibility improvements to the Progress Meter.
  • Allow for non-tab whitespace in sample names when performing on-the-fly sample-renaming (thanks to Mike McCowan).
  • Made IntervalSharder respect the IntervalMergingRule specified on the command line.
  • Sam, tribble, and variant jars updated to version 1.109.1722; htsjdk updated to version 1.112.1452.

Created 2015-07-09 14:44:56 | Updated | Tags: catvariants
Comments (1)

GATK Team,

When using CatVariants on my data, I noticed that it was consistently mis-sorting the output when run without --assumeSorted (in my case, using --assumeSorted is not feasible because the input VCFs are not ordered). Looking at the code, it appears that CatVariants orders the input files based only on the position of the first variant in each VCF file, ignoring the contig, which can lead to strange behavior. As an example, assume we have the following 3 VCF files to concatenate:

```
1 10
```

```
1 30
```

```
2 20
```

Without the --assumeSorted option, CatVariants would output

```
1 10
2 20
1 30
```

I've attempted a fix for the issue by using the VariantContext of the first variant instead of the position (null VariantContext in the case of assumeSorted) here: https://github.com/broadgsa/gatk-protected/pull/13.
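The ordering problem described in this post can be reproduced outside GATK with a small sketch. This is not the CatVariants code or the proposed patch; it only contrasts sorting by first position alone with sorting by contig and then position. The file names are hypothetical, and treating contig names as numbers is an assumption made for the example (a real tool would follow the reference sequence dictionary order).

```shell
# Columns: contig, position of first variant, file name (hypothetical names).
files='1 10 a.vcf
1 30 b.vcf
2 20 c.vcf'

# Order files by first position only: contigs end up interleaved.
by_position() {
    printf '%s\n' "$files" | sort -k2,2n | awk '{print $3}' | paste -sd' ' -
}

# Order files by contig first, then by position: each contig stays together.
by_contig_then_position() {
    printf '%s\n' "$files" | sort -k1,1n -k2,2n | awk '{print $3}' | paste -sd' ' -
}
```

With these three inputs, `by_position` prints `a.vcf c.vcf b.vcf`, matching the mis-sorted output reported above, while `by_contig_then_position` prints `a.vcf b.vcf c.vcf`.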

Created 2015-02-21 22:46:50 | Updated 2015-02-21 22:48:08 | Tags: combinevariants catvariants combinegvcfs gatk-best-practices
Comments (6)

I am currently following the GATK best practices for HC 3.0+, but I split my calls by chromosomal region (-L). Below are the steps I perform leading up to GenotypeGVCFs, followed by my question.

1 - I use CatVariants (following HC) to merge all 25 per-chromosome gVCF files into a single gVCF file per individual.
2 - I use CombineGVCFs to merge 2 .. n individuals together. This is done because some analyses have 300+ individuals.
3 - I then use CombineGVCFs again to merge all the files from step 2 into one large gVCF file for a single joint GenotypeGVCFs step.
4 - GenotypeGVCFs is again run by chromosomal region (-L), followed by an additional CatVariants step before VQSR.

My question is this: given the size of the analysis, I have noticed that the CombineGVCFs run in step 3 can take anywhere from 4 to 8 hours. Could I change this step to use CombineVariants and get the same result (with no data lost)? The main reason for wanting this is that GATK currently allows CombineVariants to use the -nt option.

Thanks for your time and work.
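For readers following the workflow in this post, steps 2 and 3 might be scripted along the lines below. This is only a sketch: the helper `combine_gvcfs_cmd`, the jar location, and all file names are hypothetical, and the function prints a GATK 3 style CombineGVCFs command rather than running it. It does not address whether CombineVariants would be a safe substitute.

```shell
# Print (do not run) a CombineGVCFs command line.
# $1 is the output gVCF; the remaining arguments are input gVCFs.
# Paths are assumed to contain no spaces, since the command is built as text.
combine_gvcfs_cmd() {
    out=$1
    shift
    cmd="java -jar GenomeAnalysisTK.jar -T CombineGVCFs -R reference.fasta"
    for gvcf in "$@"; do
        cmd="$cmd --variant $gvcf"
    done
    printf '%s -o %s\n' "$cmd" "$out"
}

# Step 2 (one batch of individuals):
#   combine_gvcfs_cmd batch1.g.vcf sample1.g.vcf sample2.g.vcf
# Step 3 (merge the batches):
#   combine_gvcfs_cmd all.g.vcf batch1.g.vcf batch2.g.vcf
```

Printing the command first makes it easy to inspect each batch before submitting it to a scheduler or piping it to `sh`.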


Created 2014-07-16 17:58:26 | Updated | Tags: catvariants
Comments (7)

Hi, I am trying to concatenate two VCF files containing variants from different chromosomes. After checking the webpage http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_tools_CatVariants.html

I ran the command line and got the following error:

Error: Could not find or load main class org.broadinstitute.gatk.tools.CatVariants

Can you help me resolve it?
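Java reports "Could not find or load main class" when the named class is not on the classpath. One possible cause here, given that the documentation URL uses the "sting" package name while the command used "gatk", is a mismatch between the class name and the jar version. The sketch below is one way to check which name a given jar actually contains; the helper `catvariants_class` is hypothetical and simply parses a jar listing supplied on standard input.

```shell
# Read a jar listing on stdin and print the fully qualified name of the
# CatVariants class it contains (empty output if there is none).
catvariants_class() {
    grep -o 'org/broadinstitute/[a-z]*/tools/CatVariants\.class' \
        | head -n 1 \
        | sed -e 's/\.class$//' -e 's|/|.|g'
}

# Possible usage (not run here; the jar path is an assumption):
#   unzip -l GenomeAnalysisTK.jar | catvariants_class
```

Whatever name is printed is the one to pass to `java -cp GenomeAnalysisTK.jar`.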


Created 2014-04-11 04:25:31 | Updated | Tags: commandlinegatk catvariants
Comments (2)

When running GATK from the command line, the CatVariants command fails.

Program version: GATK 3.1-1-g07a4bf8.

ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: CatVariants

Code to invoke:

java -jar GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar -T CatVariants -R file.fasta

Note that in the current documentation for CatVariants the example lists the name as 'org.broadinstitute.sting.tools.CatVariants' rather than just CatVariants. Trying the listed string fails with the same error.

Created 2014-04-07 15:20:23 | Updated 2014-04-07 14:46:43 | Tags: catvariants
Comments (4)

Currently, CatVariants -V only takes VCF/BCF files. I hope it can also take .list and vcf.gz files, like other tools in the GATK package.

Created 2013-11-11 17:07:40 | Updated | Tags: combinevariants catvariants
Comments (3)

I want to merge several VCF files that I produced by running HaplotypeCaller separately on each chromosome (using the -L argument) with the same list of samples. That is, the samples are the same; I just used -L to call variants chromosome by chromosome. I suppose CatVariants and CombineVariants will give me the same results, right?
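The situation described here, per-chromosome files with identical samples, rests on the inputs not sharing any contigs. A quick sanity check along these lines can confirm that before concatenating. It is a sketch only: the helper names are hypothetical, and it assumes uncompressed, tab-delimited VCFs.

```shell
# Print the distinct contigs found in the records of VCF file $1.
vcf_contigs() {
    grep -v '^#' "$1" | cut -f1 | sort -u
}

# Succeed (exit status 0) only if no contig appears in more than one input.
contigs_disjoint() {
    dup=$(for f in "$@"; do vcf_contigs "$f"; done | sort | uniq -d)
    [ -z "$dup" ]
}

# Possible usage (file names are assumptions):
#   contigs_disjoint chr1.vcf chr2.vcf && echo "no shared contigs"
```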

Created 2013-10-30 17:03:49 | Updated | Tags: commandlinegatk catvariants
Comments (1)

Below is the command:

java -cp $CLASSPATH/GenomeAnalysisTK.jar org.broadinstitute.sting.tools.CatVariants \
-R GATK_ref/hg19.fasta \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-1.vcf \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-2.vcf \
-V ../GATK/VQSR/parallel_batch/raw.snps_indels-3.vcf \
-out ../GATK/VQSR/parallel_batch/combined_raw.snps_indels.vcf \
-log ../GATK/VQSR/parallel_batch/log/combined.log

After this, the combined_raw.snps_indels.vcf file contains only the header from raw.snps_indels-1.vcf. What might be wrong?
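When diagnosing a result like this, a useful first step is to count the records in each input and in the output, to see whether the inputs themselves contain any variants. The helper below is a hypothetical sketch that assumes uncompressed VCFs; it does not identify the cause of the problem.

```shell
# Print the number of non-header lines (records) in VCF file $1.
vcf_record_count() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -vc '^#' "$1" || true
}

# Possible usage with the files from the command above:
#   for f in raw.snps_indels-1.vcf raw.snps_indels-2.vcf combined_raw.snps_indels.vcf; do
#       printf '%s\t%s\n' "$f" "$(vcf_record_count "$f")"
#   done
```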