VariantAnnotator

Annotates variant calls with context information.

Category: Variant Evaluation and Manipulation Tools

Traversal: LocusWalker

PartitionBy: LOCUS


Overview

VariantAnnotator is a GATK tool for annotating variant calls based on their context. The tool is modular; new annotations can be written easily without modifying VariantAnnotator itself.

Input

A variant set to annotate and optionally one or more BAM files.

Output

An annotated VCF.

Examples

 java -Xmx2g -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T VariantAnnotator \
   -I input.bam \
   -o output.vcf \
   -A Coverage \
   --variant input.vcf \
   -L input.vcf \
   --dbsnp dbsnp.vcf
 

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by VariantAnnotator.

Parallelism options

This tool can be run in multi-threaded mode using this option.

Window size

This tool uses a sliding window on the reference.

  • Window start: 50 bp before the locus
  • Window stop: 50 bp after the locus
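
As a quick illustration of the window arithmetic, the following sketch computes the reference window for a hypothetical locus at position 1000. The variable names are illustrative and not part of GATK, and the sketch assumes both window ends are inclusive.

```shell
# Hypothetical 1-based locus position, chosen only for illustration.
locus=1000

# The window extends 50 bp before and 50 bp after the locus.
window_start=$((locus - 50))
window_stop=$((locus + 50))

echo "window: ${window_start}-${window_stop}"
```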

Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

VariantAnnotator specific arguments

This table summarizes the command-line arguments that are specific to this tool. For details, see the argument details below the table.

Name                        Type                              Default value  Summary
Required
--variant                   RodBinding[VariantContext]        NA             Input VCF file
Optional
--alwaysAppendDbsnpId       Boolean                           false          In conjunction with the dbSNP binding, append the dbSNP ID even when the variant VCF already has the ID field populated
--annotation                List[String]                      []             One or more specific annotations to apply to variant calls
--comp                      List[RodBinding[VariantContext]]  []             Comparison VCF file
--dbsnp                     RodBinding[VariantContext]        none           dbSNP file
--excludeAnnotation         List[String]                      []             One or more specific annotations to exclude
--expression                Set[String]                       {}             One or more specific expressions to apply to variant calls; see documentation for more details
--group                     List[String]                      []             One or more classes/groups of annotations to apply to variant calls
--list                      Boolean                           false          List the available annotations and exit
-mvq                        double                            0.0            The genotype quality threshold required to annotate the Mendelian violation ratio
--out                       VariantContextWriter              stdout         File to which variants should be written
--requireStrictAlleleMatch  boolean                           false          If provided, only comp tracks that exactly match both reference and alternate alleles will be counted as concordant
--resource                  List[RodBinding[VariantContext]]  []             External resource VCF file
--snpEffFile                RodBinding[VariantContext]        none           A SnpEff output file from which to add annotations
--useAllAnnotations         Boolean                           false          Use all possible annotations (not for the faint of heart)

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--alwaysAppendDbsnpId / -alwaysAppendDbsnpId ( Boolean with default value false )

In conjunction with the dbSNP binding, append the dbSNP ID even when the variant VCF already has the ID field populated. By default, the dbSNP ID is added only when the ID field in the variant VCF is empty.

--annotation / -A ( List[String] with default value [] )

One or more specific annotations to apply to variant calls. See the --list argument to view available annotations.

--comp / -comp ( List[RodBinding[VariantContext]] with default value [] )

Comparison VCF file. If a record in the 'variant' track overlaps a record from the provided comp track, the INFO field in the output is annotated with the track name (e.g. -comp:FOO adds 'FOO' to the INFO field). Records that are filtered in the comp track are ignored. Note that 'dbSNP' has been special-cased (see the --dbsnp argument). --comp binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3
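
The track-name binding can be sketched as a small wrapper function. This is an illustration only: the function name annotate_with_comp and the file names are hypothetical, FOO is an arbitrary track name, and ref.fasta and GenomeAnalysisTK.jar are assumed to be in the working directory as in the example above.

```shell
# Sketch only, not an official GATK helper.
# Arguments: $1 = VCF to annotate, $2 = comparison VCF, $3 = output VCF.
annotate_with_comp() {
  java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T VariantAnnotator \
    --variant "$1" \
    -comp:FOO "$2" \
    -o "$3"
}
```

Calling `annotate_with_comp input.vcf foo.vcf output.vcf` would then be expected to add 'FOO' to the INFO field of records that overlap an unfiltered record in foo.vcf.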

--dbsnp / -D ( RodBinding[VariantContext] with default value none )

dbSNP file. rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. --dbsnp binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3
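
A minimal sketch of a dbSNP-annotating run follows, combining --dbsnp with the --alwaysAppendDbsnpId flag described earlier. The function name annotate_with_dbsnp and all file names are placeholders rather than anything defined by GATK.

```shell
# Sketch only, not an official GATK helper.
# Arguments: $1 = VCF to annotate, $2 = dbSNP VCF, $3 = output VCF.
annotate_with_dbsnp() {
  java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T VariantAnnotator \
    --variant "$1" \
    --dbsnp "$2" \
    --alwaysAppendDbsnpId \
    -o "$3"
}
```

Dropping --alwaysAppendDbsnpId from the function restores the default behavior, in which the rsID is written only when the ID field of the input record is empty.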

--excludeAnnotation / -XA ( List[String] with default value [] )

One or more specific annotations to exclude. Note that this argument has higher priority than the -A or -G arguments, so annotations will be excluded even if they are explicitly included with the other options.

--expression / -E ( Set[String] with default value {} )

One or more specific expressions to apply to variant calls; see documentation for more details. This option enables you to add annotations from one VCF to another. For example, if you want to annotate your 'variant' VCF with the AC field value from the rod bound to 'resource', you can specify '-E resource.AC' and records in the output VCF will be annotated with 'resource.AC=N' when a record exists in that rod at the given position. If multiple records in the rod overlap the given position, one is chosen arbitrarily.
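
The '-E resource.AC' case described above might look like the following sketch. The function name annotate_from_resource and the file names are hypothetical, and the resource VCF is assumed to be bound under the default name 'resource'.

```shell
# Sketch only, not an official GATK helper.
# Arguments: $1 = VCF to annotate, $2 = resource VCF, $3 = output VCF.
annotate_from_resource() {
  java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T VariantAnnotator \
    --variant "$1" \
    --resource "$2" \
    -E resource.AC \
    -o "$3"
}
```

With `annotate_from_resource input.vcf resource.vcf output.vcf`, output records at positions that also have a record in resource.vcf would be expected to carry 'resource.AC=N' in the INFO field.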

--group / -G ( List[String] with default value [] )

One or more classes/groups of annotations to apply to variant calls. If specified, all available annotations in the group will be applied. See the VariantAnnotator --list argument to view available groups. Keep in mind that RODRequiringAnnotations are not intended to be used as a group, because they require specific ROD inputs.

--list / -ls ( Boolean with default value false )

List the available annotations and exit. Note that the --list argument requires a fully resolved and correct command-line to work. As a simpler alternative, you can use ListAnnotations (see Help Utilities).
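
Because --list needs an otherwise valid command line, a listing run still has to supply the reference and the required --variant input. The sketch below assumes those are the only other arguments needed; the function name list_annotations and the file names are placeholders.

```shell
# Sketch only, not an official GATK helper.
# Argument: $1 = any valid input VCF (required by the tool even when listing).
list_annotations() {
  java -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T VariantAnnotator \
    --variant "$1" \
    --list
}
```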

-mvq / --MendelViolationGenotypeQualityThreshold ( double with default value 0.0 )

The genotype quality threshold required to annotate the Mendelian violation ratio.

--out / -o ( VariantContextWriter with default value stdout )

File to which variants should be written.

--requireStrictAlleleMatch / -strict ( boolean with default value false )

If provided, only comp tracks that exactly match both reference and alternate alleles will be counted as concordant.

--resource / -resource ( List[RodBinding[VariantContext]] with default value [] )

One or more external resource VCF files from which to annotate. Annotations from a resource VCF can be added to the output. For example, if you want to annotate your 'variant' VCF with the AC field value from the rod bound to 'resource', you can specify '-E resource.AC' and records in the output VCF will be annotated with 'resource.AC=N' when a record exists in that rod at the given position. If multiple records in the rod overlap the given position, one is chosen arbitrarily. --resource binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

--snpEffFile / -snpEffFile ( RodBinding[VariantContext] with default value none )

A SnpEff output file from which to add annotations. The INFO field will be annotated with information on the most biologically-significant effect listed in the SnpEff output file for each variant. --snpEffFile binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3
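
A SnpEff-based run could be sketched as follows. Note the assumption: this page does not name the annotation that consumes the SnpEff file, so the '-A SnpEff' annotation name is taken from memory of GATK 2.x and should be checked against the output of --list. The function name annotate_with_snpeff and the file names are placeholders.

```shell
# Sketch only, not an official GATK helper.
# Assumption: the annotation reading the SnpEff file is named "SnpEff".
# Arguments: $1 = VCF to annotate, $2 = SnpEff output VCF, $3 = output VCF.
annotate_with_snpeff() {
  java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T VariantAnnotator \
    -A SnpEff \
    --variant "$1" \
    --snpEffFile "$2" \
    -o "$3"
}
```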

--useAllAnnotations / -all ( Boolean with default value false )

Use all possible annotations (not for the faint of heart). Note that the -XA argument can be used along with this one to exclude annotations.

--variant / -V ( required RodBinding[VariantContext] )

Input VCF file. Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file). --variant binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3



GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.