# VariantAnnotator

Annotates variant calls with context information.

## Overview

VariantAnnotator is a GATK tool for annotating variant calls based on their context. The tool is modular; new annotations can be written easily without modifying VariantAnnotator itself.

### Input

A variant set to annotate and optionally one or more BAM files.

### Output

An annotated VCF.

### Examples

```
java -Xmx2g -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T VariantAnnotator \
   -I input.bam \
   -o output.vcf \
   -A Coverage \
   --variant input.vcf \
   -L input.vcf \
   --dbsnp dbsnp.vcf
```


### Read filters

The GATK engine automatically applies a set of read filters to the data before VariantAnnotator processes it.

### Parallelism options

This tool supports multi-threaded execution through the GATK engine's parallelism options.

### Downsampling settings

This tool applies the following downsampling settings by default.

• Mode: BY_SAMPLE
• To coverage: 250
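The BY_SAMPLE setting caps the number of reads kept for each sample at a locus, rather than capping total depth across all samples. The following is a minimal sketch of that idea, assuming reads are represented as hypothetical `(sample_name, read_name)` tuples and that excess reads are dropped uniformly at random; the engine's own downsampler is implemented differently and this is not its code.

```python
import random
from collections import defaultdict


def downsample_by_sample(reads, to_coverage=250, seed=None):
    """Keep at most `to_coverage` reads per sample at a single locus.

    `reads` is a list of (sample_name, read_name) tuples, a hypothetical
    stand-in for the engine's pileup elements.
    """
    rng = random.Random(seed)
    by_sample = defaultdict(list)
    for sample, read in reads:
        by_sample[sample].append((sample, read))

    kept = []
    for sample_reads in by_sample.values():
        if len(sample_reads) > to_coverage:
            # Uniform random subset, drawn without replacement.
            sample_reads = rng.sample(sample_reads, to_coverage)
        kept.extend(sample_reads)
    return kept
```

With 300 reads from sample A and 100 from sample B, A is reduced to 250 while all 100 B reads are kept, so a deeply sequenced sample cannot crowd out the others.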

### Window size

This tool uses a sliding window of reference sequence around each locus.

• Window start: 50 bp before the locus
• Window stop: 50 bp after the locus
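As a rough illustration of the window bounds, here is a sketch that computes the reference interval seen at a given locus. It assumes 1-based, inclusive coordinates and assumes the window is clipped at contig boundaries; the function name and clipping behavior are illustrative, not taken from GATK.

```python
def reference_window(locus, contig_length, start_offset=-50, stop_offset=50):
    """Return the 1-based, inclusive reference interval around `locus`.

    Offsets mirror the documented window (50 bp before, 50 bp after).
    Clipping at the contig ends is an assumption of this sketch.
    """
    start = max(1, locus + start_offset)
    stop = min(contig_length, locus + stop_offset)
    return start, stop
```

For a locus at position 1000 on a 10,000 bp contig the window is (950, 1050); near either end of the contig it is truncated.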

## Command-line Arguments

### Inherited arguments

In addition to the tool-specific arguments below, this tool accepts the engine-level arguments shared by all GATK tools. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

### VariantAnnotator specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the argument details list below the table.

| Argument name(s) | Default value | Summary |
|---|---|---|
| **Required Inputs** | | |
| --variant / -V | NA | Input VCF file |
| **Optional Inputs** | | |
| --comp | NA | comparison VCF file |
| --dbsnp / -D | NA | dbSNP file |
| --resource | NA | External resource VCF file |
| --snpEffFile | NA | A SnpEff output file from which to add annotations |
| **Optional Outputs** | | |
| --out / -o | NA | File to which variants should be written |
| **Optional Parameters** | | |
| --annotation / -A | NA | One or more specific annotations to apply to variant calls |
| --excludeAnnotation / -XA | NA | One or more specific annotations to exclude |
| --expression / -E | NA | One or more specific expressions to apply to variant calls |
| --group / -G | NA | One or more classes/groups of annotations to apply to variant calls |
| --MendelViolationGenotypeQualityThreshold / -mvq | 0.0 | The genotype quality threshold in order to annotate mendelian violation ratio |
| **Optional Flags** | | |
| --alwaysAppendDbsnpId | NA | Append the dbSNP ID even when the variant VCF already has the ID field populated |
| --list / -ls | NA | List the available annotations and exit |
| --useAllAnnotations / -all | NA | Use all possible annotations (not for the faint of heart) |

### Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

### --alwaysAppendDbsnpId / -alwaysAppendDbsnpId

Append the dbSNP ID even when the variant VCF already has the ID field populated
By default, the dbSNP ID is added only when the ID field in the variant VCF is empty (not already annotated). This argument allows you to override that behavior. It is used in conjunction with the --dbsnp argument.

Boolean
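The interaction between the default ID behavior and this flag can be sketched as a small decision function. This is a minimal model, not GATK's implementation: it assumes `'.'` marks an empty ID (as in the VCF specification), joins multiple IDs with `;` (the VCF multi-ID separator), and assumes an rsID that is already present is not appended twice. The function name is hypothetical.

```python
def annotate_id(current_id, rs_id, always_append=False):
    """Return the new ID column value for one variant record.

    current_id: the record's existing ID ('.' means empty).
    rs_id: the matching dbSNP rsID, or None if there is no dbSNP match.
    """
    if rs_id is None:
        return current_id
    if current_id == ".":
        # Default behavior: fill an empty ID with the rsID.
        return rs_id
    if always_append and rs_id not in current_id.split(";"):
        # Override: keep the existing ID and append the rsID.
        return current_id + ";" + rs_id
    # Default behavior: leave an already-populated ID untouched.
    return current_id
```

Without the flag, a record whose ID is `myvar1` keeps `myvar1`; with it, the ID becomes `myvar1;rs123`.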

### --annotation / -A

One or more specific annotations to apply to variant calls
See the --list argument to view available annotations.

List[String]

### --comp / -comp

Comparison VCF file
If a record in the --variant track overlaps a record from a provided comp track, the output record's INFO field is annotated with that track's name (e.g. -comp:FOO will put 'FOO' in the INFO field). Records that are filtered in the comp track are ignored. Note that dbSNP has been special-cased (see the --dbsnp argument).

--comp binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

List[RodBinding[VariantContext]]
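The comp-track overlap rule can be sketched as follows. The `Record` type, the 1-based inclusive interval test, and representing the annotation as a boolean INFO flag named after the track are all assumptions made for illustration; the one documented rule modeled exactly is that filtered comp records are skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    filtered: bool = False
    info: dict = field(default_factory=dict)


def overlaps(a, b):
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def annotate_comp(variant, comp_tracks):
    """Flag `variant` with each comp track it overlaps.

    comp_tracks maps a track name (e.g. 'FOO' from -comp:FOO) to a list
    of Records from that track.
    """
    for name, records in comp_tracks.items():
        # Filtered comp records are ignored.
        if any(overlaps(variant, r) for r in records if not r.filtered):
            variant.info[name] = True
    return variant
```

A variant at 20:100 overlapping an unfiltered FOO record and a filtered BAR record ends up flagged with FOO only.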

### --dbsnp / -D

dbSNP file
rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag (dbSNP membership) is set on records found in dbSNP.

--dbsnp binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

RodBinding[VariantContext]

### --excludeAnnotation / -XA

One or more specific annotations to exclude
Note that this argument has higher priority than the -A or -G arguments, so annotations will be excluded even if they are explicitly included with the other options.

List[String]
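The precedence rule (exclusions win over both -A and -G) amounts to a set difference applied last. Below is a minimal sketch, assuming a hypothetical `groups` mapping from group name to member annotations; the group name and its members in the example are made up for illustration and do not describe any real GATK group.

```python
def select_annotations(requested, groups, requested_groups, excluded):
    """Resolve -A, -G and -XA values into the set of annotations to run.

    requested: names passed with -A
    groups: hypothetical mapping of group name -> annotation names
    requested_groups: names passed with -G (unknown names raise KeyError)
    excluded: names passed with -XA, subtracted last so they always win
    """
    selected = set(requested)
    for group_name in requested_groups:
        selected |= set(groups[group_name])
    return selected - set(excluded)
```

Requesting `-A Coverage -A FS -G ExampleGroup -XA FS` with a hypothetical ExampleGroup of QD, FS and MQ yields Coverage, QD and MQ: FS is dropped even though it was named explicitly.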

### --expression / -E

One or more specific expressions to apply to variant calls
This option enables you to add annotations from one VCF to another. For example, if you want to annotate your callset with the AC field value from a VCF file named 'resource_file.vcf', you tag it with '-resource:my_resource resource_file.vcf' (see the -resource argument, also documented on this page) and you specify '-E my_resource.AC'. In the resulting output VCF, any records for which there is a record at the same position in the resource file will be annotated with 'my_resource.AC=N'. Note that if there are multiple records in the resource file that overlap the given position, one is chosen randomly.

Set[String]
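The -E mechanism can be modeled as a lookup into a tagged resource by position, followed by copying one INFO field under the key `track.FIELD`. This is a sketch under stated assumptions: the `resources` layout (track name mapped to a dict keyed by `(chrom, pos)`) and the function name are hypothetical. It follows the documented rule that one of several overlapping resource records is chosen at random.

```python
import random


def apply_expressions(variant_info, position, resources, expressions, rng=random):
    """Copy INFO fields from resource records at the same position.

    variant_info: the output record's INFO dict (updated in place)
    position: hypothetical (chrom, pos) key for the variant
    resources: hypothetical mapping track name -> {(chrom, pos): [info dict, ...]}
    expressions: -E values such as 'my_resource.AC'
    """
    for expr in expressions:
        track, field_name = expr.split(".", 1)
        records = resources.get(track, {}).get(position, [])
        if not records:
            continue
        # With several records at this position, one is picked at random.
        chosen = rng.choice(records)
        if field_name in chosen:
            variant_info[expr] = chosen[field_name]
    return variant_info
```

For a resource tagged `my_resource` holding `AC=7` at 20:12345, a variant at that position gains `my_resource.AC=7`; variants elsewhere are left unchanged.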

### --group / -G

One or more classes/groups of annotations to apply to variant calls
If specified, all available annotations in the group will be applied. See the VariantAnnotator --list argument to view available groups. Keep in mind that RODRequiringAnnotations are not intended to be used as a group, because they require specific ROD inputs.

List[String]

### --list / -ls

List the available annotations and exit
Note that the --list argument requires a fully resolved and correct command-line to work. As an alternative, you can use ListAnnotations (see Help Utilities).

Boolean

### --MendelViolationGenotypeQualityThreshold / -mvq

The genotype quality (GQ) threshold used when annotating the Mendelian violation ratio

double (range: -∞ to ∞)
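To show how a GQ threshold can gate a Mendelian-violation check, here is a simplified trio sketch. It is not GATK's annotation code: it assumes a biallelic, diploid site, represents each genotype as a hypothetical `(allele_a, allele_b, gq)` tuple, and assumes that a trio with any member below the threshold is simply not evaluated. With the default of 0.0, no genotype is excluded.

```python
def is_mendelian_violation(mother, father, child, min_gq=0.0):
    """Check one trio at a biallelic, diploid site.

    Each genotype is a hypothetical (allele_a, allele_b, gq) tuple, e.g.
    (0, 1, 45.0) for a heterozygous call with GQ 45. Returns None when any
    member's GQ is below `min_gq` (the -mvq value), meaning "not evaluated".
    """
    if any(gt[2] < min_gq for gt in (mother, father, child)):
        return None
    mom = {mother[0], mother[1]}
    dad = {father[0], father[1]}
    a, b = child[0], child[1]
    # Consistent if one child allele can come from each parent.
    consistent = (a in mom and b in dad) or (b in mom and a in dad)
    return not consistent
```

A heterozygous child of two homozygous-reference parents is a violation; raising `min_gq` above a parent's GQ removes that trio from consideration instead.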

### --out / -o

File to which variants should be written

VariantContextWriter

### --resource / -resource

External resource VCF file
An external resource VCF file or files from which to annotate. Use this option to add annotations from a resource file to the output. For example, if you want to annotate your callset with the AC field value from a VCF file named 'resource_file.vcf', you tag it with '-resource:my_resource resource_file.vcf' and you additionally specify '-E my_resource.AC' (-E is short for --expression, also documented on this page). In the resulting output VCF, any records for which there is a record at the same position in the resource file will be annotated with 'my_resource.AC=N'. Note that if there are multiple records in the resource file that overlap the given position, one is chosen randomly.

--resource binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

List[RodBinding[VariantContext]]

### --snpEffFile / -snpEffFile

A SnpEff output file from which to add annotations
The INFO field will be annotated with information on the most biologically significant effect listed in the SnpEff output file for each variant.

--snpEffFile binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

RodBinding[VariantContext]
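Choosing the "most biologically significant" effect can be approximated by ranking effects on SnpEff's four putative-impact categories (HIGH, MODERATE, LOW, MODIFIER). This sketch assumes effects arrive as `(effect_name, impact)` pairs and ranks by impact alone; any further tie-breaking the tool performs (for example by functional class or transcript) is not modeled, and ties here go to the first effect listed.

```python
# SnpEff's putative-impact categories, most to least severe.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def most_significant_effect(effects):
    """Pick one effect from a list of (effect_name, impact) pairs.

    Unknown impact labels sort after MODIFIER. Returns None for no effects.
    """
    if not effects:
        return None
    return min(effects, key=lambda e: IMPACT_RANK.get(e[1], len(IMPACT_RANK)))
```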

### --useAllAnnotations / -all

Use all possible annotations (not for the faint of heart)
You can use the -XA argument in combination with this one to exclude specific annotations. Note that some annotations may not actually be applied if they are not applicable to the data provided or if they are unavailable to the tool (e.g. there are several annotations that are currently not hooked up to HaplotypeCaller). At present, no error or warning message is provided; the annotation is simply skipped silently. You can check the output VCF header to see which annotations were actually applied, although this does not guarantee that an annotation was applied to every record in the VCF, since some annotations have additional requirements (e.g. a minimum number of samples, or heterozygous sites only; see the documentation for individual annotations' requirements).

Boolean
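Checking the output header, as suggested above, can be automated by collecting the IDs of the `##INFO` and `##FORMAT` meta-information lines. A minimal sketch, assuming the VCF has already been read into a list of lines; note that header IDs are the annotation keys written to the file, which can differ from annotation class names (the Coverage annotation, for example, writes DP).

```python
import re

_HEADER_ID = re.compile(r"^##(INFO|FORMAT)=<ID=([^,>]+)")


def declared_annotations(vcf_lines):
    """Return {'INFO': [...], 'FORMAT': [...]} IDs declared in the VCF header.

    Scanning stops at the #CHROM line, so data records are never read.
    """
    found = {"INFO": [], "FORMAT": []}
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            break
        match = _HEADER_ID.match(line)
        if match:
            found[match.group(1)].append(match.group(2))
    return found
```

As the entry above cautions, a key appearing in the header only shows that the annotation was set up, not that every record carries it.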

### --variant / -V

Input VCF file
Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).

--variant binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

RodBinding[VariantContext] (required)
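Because an input with a valid header but no records is accepted, it can be handy to recognize that case before a run. This sketch checks only the two header lines the VCF specification requires (the leading `##fileformat` line and the `#CHROM` column-header line) and treats the file as empty when `#CHROM` is the last non-blank line; it is a rough stand-in, far less strict than real VCF validation, and the function name is hypothetical.

```python
def is_header_only_vcf(text):
    """True if `text` has the required VCF header lines and no data records.

    Only checks that ##fileformat comes first and that the #CHROM
    column-header line is the last non-blank line.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("##fileformat=VCF"):
        return False
    return lines[-1].startswith("#CHROM")
```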