ApplyRecalibration

Applies cuts to the input VCF file (by adding filter lines) to achieve the desired novel truth sensitivity levels that were specified during the VariantRecalibrator step.

Category: Variant Discovery Tools

Traversal: LocusWalker

PartitionBy: LOCUS


Overview

Using the tranches file generated by the previous step (VariantRecalibrator), the ApplyRecalibration walker looks at each variant's VQSLOD value and decides which tranche it falls into. Variants in tranches that fall below the specified truth sensitivity filter level have their FILTER field annotated with the tranche level. The result is a call set that is filtered to the desired level but still carries the information needed to pull out more variants at a higher sensitivity and a slightly lower quality level.
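
The decision described above can be sketched as a small shell function. This is only an illustration of the idea, not the tool's implementation: the function name classify_vqslod, the cutoff values (2.5 and -1.0) and the tranche labels are all hypothetical. In a real run the cutoffs come from the tranches file, and the filter names written by the tool may differ.

```shell
# Illustrative only: hypothetical cutoffs standing in for values that a real
# tranches file would provide. A variant at or above the first cutoff passes;
# lower scores are labelled with the tranche they fall into.
classify_vqslod() {
  # $1 = VQSLOD value of one variant
  awk -v lod="$1" 'BEGIN {
    if (lod >= 2.5)       print "PASS";                  # within the chosen sensitivity level
    else if (lod >= -1.0) print "TRANCHE_99.0_to_99.9";  # made-up label
    else                  print "TRANCHE_99.9_to_100.0"; # made-up label
  }'
}

classify_vqslod 3.2
classify_vqslod 0.4
classify_vqslod -4.7
```

Because filtered variants keep a tranche label rather than being dropped, a more sensitive call set can be recovered later by also accepting the next tranche down.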

Input

The input raw variants to be recalibrated.

The recalibration table file in VCF format that was generated by the VariantRecalibrator walker.

The tranches file that was generated by the VariantRecalibrator walker.

Output

A recalibrated VCF file in which each variant is annotated with its VQSLOD and filtered if the score is below the desired quality level.
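
One quick way to check the result is to tally the values in the FILTER column of the output file. The helper below is a minimal sketch using standard awk and sort, not part of GATK; the name count_filters is hypothetical, and it assumes an uncompressed, tab-delimited VCF in which FILTER is the seventh column.

```shell
# Hypothetical helper: count how many records carry each FILTER value.
count_filters() {
  # $1 = path to an uncompressed VCF
  # Skips header lines (starting with '#') and prints "<FILTER><TAB><count>",
  # sorted by FILTER value.
  awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}
```

For example, count_filters path/to/output.recalibrated.filtered.vcf would list PASS alongside each tranche filter name present in the file.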

Examples

 java -Xmx3g -jar GenomeAnalysisTK.jar \
   -T ApplyRecalibration \
   -R reference/human_g1k_v37.fasta \
   -input NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.b37.vcf \
   --ts_filter_level 99.0 \
   -tranchesFile path/to/output.tranches \
   -recalFile path/to/output.recal \
   -mode SNP \
   -o path/to/output.recalibrated.filtered.vcf
 

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by ApplyRecalibration.

Parallelism options

This tool can be run in multi-threaded mode using this option.

Downsampling settings

This tool applies the following downsampling settings by default.

  • Mode: BY_SAMPLE
  • To coverage: 1,000

Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

ApplyRecalibration specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

Argument name(s) | Default value | Summary

Required Inputs
--input | NA | The raw input variants to be recalibrated
--recal_file / -recalFile | NA | The input recal file used by ApplyRecalibration

Optional Inputs
--tranches_file / -tranchesFile | NA | The input tranches file describing where to cut the data

Optional Outputs
--out / -o | stdout | The output filtered and recalibrated VCF file in which each variant is annotated with its VQSLOD value

Optional Parameters
--ignore_filter / -ignoreFilter | NA | If specified, the recalibration will be applied to variants marked as filtered by the specified filter name in the input VCF file
--mode | SNP | Recalibration mode to employ: 1.) SNP for recalibrating only SNPs (emitting indels untouched in the output VCF); 2.) INDEL for indels; and 3.) BOTH for recalibrating both SNPs and indels simultaneously.

Optional Flags
--excludeFiltered / -ef | false | Don't output filtered loci after applying the recalibration

Advanced Parameters
--lodCutoff | NA | The VQSLOD score below which to start filtering
--ts_filter_level | NA | The truth sensitivity level at which to start filtering

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.


--excludeFiltered / -ef

Don't output filtered loci after applying the recalibration

boolean  false
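
The effect of this flag can be approximated after the fact on a VCF that was written without it. The sketch below is a post-processing illustration, not what the tool does internally: the name drop_filtered is hypothetical, and treating both PASS and "." (no filter applied) as unfiltered is an assumption.

```shell
# Hypothetical helper: keep header lines and only those records whose FILTER
# column (field 7) is PASS or "." (assumed here to mean "not filtered").
drop_filtered() {
  # $1 = path to an uncompressed VCF
  awk -F'\t' '/^#/ || $7 == "PASS" || $7 == "."' "$1"
}
```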


--ignore_filter / -ignoreFilter

If specified, the recalibration will be applied to variants marked as filtered by the specified filter name in the input VCF file
For this to work properly, the -ignoreFilter argument should also be applied to the VariantRecalibrator command.

String[]


--input / -input

The raw input variants to be recalibrated
These calls should be unfiltered and annotated with the error covariates that are intended to be used for modeling.

--input binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

R List[RodBinding[VariantContext]]


--lodCutoff / -lodCutoff

The VQSLOD score below which to start filtering

Double


--mode / -mode

Recalibration mode to employ: 1.) SNP for recalibrating only SNPs (emitting indels untouched in the output VCF); 2.) INDEL for indels; and 3.) BOTH for recalibrating both SNPs and indels simultaneously.

The --mode argument is an enumerated type (Mode), which can have one of the following values:

SNP
INDEL
BOTH

Mode  SNP
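
Since SNP mode emits indels untouched, one way to recalibrate both variant types separately is to run the tool twice, feeding the output of the SNP pass into an INDEL pass. The sketch below only prints the two command lines (a dry run) so they can be inspected before use. The function name apply_recal_cmd and all file names are hypothetical placeholders, and it assumes separate recal and tranches files were produced by VariantRecalibrator for each mode; the flags are those used in the example on this page.

```shell
# Hypothetical dry-run helper: prints an ApplyRecalibration command line
# instead of executing it. All paths are placeholders.
apply_recal_cmd() {
  # $1 = mode (SNP, INDEL or BOTH), $2 = input VCF, $3 = recal file,
  # $4 = tranches file, $5 = output VCF
  printf 'java -Xmx3g -jar GenomeAnalysisTK.jar -T ApplyRecalibration'
  printf ' -R reference/human_g1k_v37.fasta -input %s' "$2"
  printf ' --ts_filter_level 99.0 -tranchesFile %s -recalFile %s' "$4" "$3"
  printf ' -mode %s -o %s\n' "$1" "$5"
}

# First pass handles SNPs; the second pass takes that output and handles indels.
apply_recal_cmd SNP   raw.vcf       snp.recal   snp.tranches   snps_recal.vcf
apply_recal_cmd INDEL snps_recal.vcf indel.recal indel.tranches final.vcf
```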


--out / -o

The output filtered and recalibrated VCF file in which each variant is annotated with its VQSLOD value

VariantContextWriter  stdout
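
To inspect the VQSLOD annotations in the output, the value can be pulled out of the INFO column with standard tools. This is a minimal sketch, not a GATK feature: the name extract_vqslod is hypothetical, and it assumes an uncompressed, tab-delimited VCF where the annotation appears in INFO (field 8) as VQSLOD=<value>.

```shell
# Hypothetical helper: print CHROM, POS, FILTER and VQSLOD for every record
# whose INFO field contains a VQSLOD entry. Records without one are skipped.
extract_vqslod() {
  # $1 = path to an uncompressed VCF
  awk -F'\t' '!/^#/ {
    n = split($8, kv, ";")               # INFO is a ";"-separated list
    for (i = 1; i <= n; i++)
      if (kv[i] ~ /^VQSLOD=/)
        print $1 "\t" $2 "\t" $7 "\t" substr(kv[i], 8)   # text after "VQSLOD="
  }' "$1"
}
```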


--recal_file / -recalFile

The input recal file used by ApplyRecalibration

--recal_file binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

R RodBinding[VariantContext]


--tranches_file / -tranchesFile

The input tranches file describing where to cut the data

File
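
To see which VQSLOD cutoff corresponds to a given truth sensitivity level, the tranches file can be queried directly. The sketch below rests on an assumption about the file layout that is not stated on this page: a comma-separated file with '#' comment lines, followed by a header row containing columns named targetTruthSensitivity and minVQSLod. The function name min_vqslod_for_level is hypothetical. If the real file uses different column names, the function prints nothing.

```shell
# Hypothetical helper: look up the minimum VQSLOD recorded for one truth
# sensitivity level. Columns are located by header name (assumed names).
min_vqslod_for_level() {
  # $1 = tranches file, $2 = target truth sensitivity (e.g. 99.0)
  awk -F',' -v level="$2" '
    /^#/ { next }                          # skip comment lines
    !header_seen {                         # first non-comment line is the header
      for (i = 1; i <= NF; i++) {
        if ($i == "targetTruthSensitivity") ts_col = i
        if ($i == "minVQSLod") lod_col = i
      }
      header_seen = 1
      next
    }
    ts_col && lod_col && ($ts_col + 0 == level + 0) { print $lod_col; exit }
  ' "$1"
}
```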


--ts_filter_level / -ts_filter_level

The truth sensitivity level at which to start filtering

Double



GATK version 3.2-2-gec30cee built at 2014/07/17 17:54:48. GTD: NA