
BeagleOutputToVCF

Takes files produced by the Beagle imputation engine and creates a VCF with modified annotations.

Category Variant Discovery Tools

Traversal LocusWalker

PartitionBy LOCUS


Overview

This walker is intended to be run after Beagle has successfully executed. The full calling sequence for using Beagle along with the GATK is:

1. Run ProduceBeagleInputWalker.

2. Run Beagle.

3. Uncompress the Beagle output files.

4. Run BeagleOutputToVCFWalker.

Note that this walker requires all three of its Beagle-produced inputs (the .r2, .phased and .gprobs files).
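Steps 3 and 4 depend on all three Beagle files being present and uncompressed. The sketch below is a hypothetical helper, not part of GATK: it assumes the files share a common prefix and use the suffixes from the example command (.r2, .phased, .gprobs), decompresses any that only exist as .gz, and fails early if one is missing entirely.

```python
import gzip
import shutil
from pathlib import Path

# Suffixes taken from the example command (myrun.beagle_output.r2, .phased, .gprobs).
REQUIRED_SUFFIXES = (".r2", ".phased", ".gprobs")


def prepare_beagle_outputs(prefix: str) -> list[Path]:
    """Uncompress <prefix><suffix>.gz where needed and return the three required paths.

    Raises FileNotFoundError if a required file exists in neither plain nor .gz form.
    """
    paths = []
    for suffix in REQUIRED_SUFFIXES:
        plain = Path(prefix + suffix)
        gz = Path(prefix + suffix + ".gz")
        if not plain.is_file():
            if not gz.is_file():
                raise FileNotFoundError(f"missing Beagle output: {plain}")
            # Like `gunzip -k`: write the decompressed copy and keep the .gz file.
            with gzip.open(gz, "rb") as src, open(plain, "wb") as dst:
                shutil.copyfileobj(src, dst)
        paths.append(plain)
    return paths
```

Raising before the GATK run starts gives a clearer error than a failed ROD binding halfway through the command.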

Example

     java -Xmx4000m -jar dist/GenomeAnalysisTK.jar \
      -R reffile.fasta -T BeagleOutputToVCF \
      -V input_vcf.vcf \
      -beagleR2:BEAGLE /myrun.beagle_output.r2 \
      -beaglePhased:BEAGLE /myrun.beagle_output.phased \
      -beagleProbs:BEAGLE /myrun.beagle_output.gprobs \
      -o output_vcf.vcf
      

Note that Beagle produces some of these files compressed as .gz, so they must be decompressed with gunzip before this walker is run.
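When the walker is run from a script, the example command can be built as an argument list. The function below is a hypothetical sketch. The paths, jar location and -Xmx value are copied from the example above, and the optional arguments use only the flags documented on this page (-comp, -ncthr, -keep_monomorphic).

```python
def beagle_output_to_vcf_cmd(reference, input_vcf, beagle_prefix, output_vcf,
                             comp_vcf=None, nocall_threshold=None,
                             keep_monomorphic=False):
    """Build the BeagleOutputToVCF command line as a list (hypothetical helper)."""
    cmd = [
        "java", "-Xmx4000m", "-jar", "dist/GenomeAnalysisTK.jar",
        "-R", reference, "-T", "BeagleOutputToVCF",
        "-V", input_vcf,
        # Beagle files are bound as reference-ordered data of type BEAGLE.
        "-beagleR2:BEAGLE", beagle_prefix + ".r2",
        "-beaglePhased:BEAGLE", beagle_prefix + ".phased",
        "-beagleProbs:BEAGLE", beagle_prefix + ".gprobs",
        "-o", output_vcf,
    ]
    if comp_vcf is not None:
        cmd += ["-comp", comp_vcf]
    if nocall_threshold is not None:
        cmd += ["-ncthr", str(nocall_threshold)]
    if keep_monomorphic:
        cmd.append("-keep_monomorphic")
    return cmd
```

The list can be passed to `subprocess.run(cmd, check=True)`. Building a list instead of a single string avoids shell quoting problems with paths that contain spaces.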


Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by BeagleOutputToVCF.

Downsampling settings

This tool applies the following downsampling settings by default.

  • Mode: BY_SAMPLE
  • To coverage: 1,000

Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

BeagleOutputToVCF specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the argument details listed below the table.

Argument name(s) | Default value | Summary

Required Inputs
--beaglePhased | NA | Beagle-produced .phased file containing phased genotypes
--beagleProbs | NA | Beagle-produced .probs file containing posterior genotype probabilities
--beagleR2 | NA | Beagle-produced .r2 file containing R^2 values for all markers
--variant / -V | NA | Input VCF file

Optional Inputs
--comp | none | Comparison VCF file

Optional Outputs
--out / -o | stdout | VCF file to which variants should be written

Optional Parameters
--nocall_threshold / -ncthr | 0.0 | Threshold of confidence at which a genotype won't be called

Optional Flags
--dont_mark_monomorphic_sites_as_filtered / -keep_monomorphic | false | If provided, sites that Beagle tags as monomorphic are not filtered. Useful for imputing a sample's genotypes from a reference panel

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.


--beaglePhased / -beaglePhased

Beagle-produced .phased file containing phased genotypes
By default, after Beagle phasing, all genotypes are marked as phased in the output VCF using the "|" separator.

--beaglePhased binds reference ordered data. This argument supports ROD files of the following types: BEAGLE

RodBinding[BeagleFeature]  (required)
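To show what "phased, using the "|" separator" means for one site, the sketch below turns a row of per-haplotype allele letters into phased VCF GT strings. It assumes, hypothetically, a biallelic marker and that the alleles arrive as two letters per sample in haplotype order, the way a Beagle .phased marker row lists them. It is not GATK's parser.

```python
def phased_genotypes(alleles, ref, alt):
    """Convert per-haplotype allele letters into phased GT strings such as '0|1'.

    alleles: two letters per sample, haplotype order preserved (assumed layout).
    ref, alt: the biallelic site's reference and alternate alleles.
    """
    if len(alleles) % 2:
        raise ValueError("expected two alleles per sample")
    index = {ref: "0", alt: "1"}
    # '|' (not '/') marks the two alleles as phased in the VCF GT field.
    return [index[alleles[i]] + "|" + index[alleles[i + 1]]
            for i in range(0, len(alleles), 2)]
```

For example, `phased_genotypes(["A", "G", "G", "G"], "A", "G")` gives one heterozygous and one homozygous-alternate sample, with the haplotype order kept in the first genotype.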


--beagleProbs / -beagleProbs

Beagle-produced .probs file containing posterior genotype probabilities
These values will populate the GL field for each sample and contain the posterior probability of each genotype given the data after phasing and imputation.

--beagleProbs binds reference ordered data. This argument supports ROD files of the following types: BEAGLE

R RodBinding[BeagleFeature]
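The VCF GL field is log10-scaled, so storing posterior probabilities there implies a log10 transform. The sketch below assumes that transform. It also assumes that each sample's three posteriors are ordered hom-ref, het, hom-alt, and it uses a floor value that is a hypothetical choice, not taken from GATK.

```python
import math

# Hypothetical floor so that a posterior of exactly 0 maps to a finite value.
MIN_PROB = 1e-10


def posteriors_to_gl(probs):
    """log10-scale one sample's posteriors, ordered hom-ref, het, hom-alt (assumed)."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    # Renormalize in case rounded values in the file do not sum exactly to 1.
    return [math.log10(max(p / total, MIN_PROB)) for p in probs]
```

A confident hom-ref call such as `[1.0, 0.0, 0.0]` becomes `0` for the first genotype and the floor value (-10) for the other two.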


--beagleR2 / -beagleR2

Beagle-produced .r2 file containing R^2 values for all markers
This required argument is used to annotate each site in the VCF INFO field with an R2 annotation. The value is NaN if Beagle determined that there are no variant samples.

--beagleR2 binds reference ordered data. This argument supports ROD files of the following types: BEAGLE

RodBinding[BeagleFeature]  (required)
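The R2 annotation is a per-marker lookup. The sketch below assumes each line of the .r2 file holds a marker ID and a value separated by whitespace. It also uses a three-decimal output format, which is a hypothetical choice. Non-numeric rows such as a possible header are skipped, and Beagle's NaN is kept as NaN.

```python
import math


def parse_r2(lines):
    """Map marker ID -> R^2, assuming 'marker<whitespace>value' per line."""
    r2 = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or unexpected line
        try:
            r2[fields[0]] = float(fields[1])  # float() also accepts "NaN"
        except ValueError:
            continue  # non-numeric row, e.g. a header
    return r2


def r2_info(value):
    """Format one site's INFO entry; NaN means Beagle saw no variant samples."""
    return "R2=" + ("NaN" if math.isnan(value) else f"{value:.3f}")
```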


--comp / -comp

Comparison VCF file
If this argument is present, the original allele counts and frequencies from this VCF are added as the annotations ACH, AFH and ANH at each record present in this VCF.

--comp binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

RodBinding[VariantContext]  none
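The ACH, AFH and ANH annotations carry the comparison record's AC, AF and AN values under new names. Here is a minimal sketch of that renaming. It assumes the comparison record's INFO field has already been parsed into a dictionary, and it leaves out any key that is absent.

```python
def comp_annotations(comp_info):
    """Copy AC, AF and AN from a comparison record's INFO into ACH, AFH and ANH."""
    return {key + "H": comp_info[key]
            for key in ("AC", "AF", "AN")
            if key in comp_info}
```

Other INFO keys in the comparison record, such as DP, are not copied.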


--dont_mark_monomorphic_sites_as_filtered / -keep_monomorphic

If provided, sites that Beagle tags as monomorphic are not filtered. Useful for imputing a sample's genotypes from a reference panel.
If this argument is absent (the default behavior) and Beagle determines that no sample at a site has a variant genotype, the site is marked as filtered. If the argument is present, the site is not marked as filtered under this condition, even though it has no variant genotypes.

boolean  false
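The filtering rule above comes down to a single check per site. In the sketch below, the filter label is hypothetical, because this page does not name the FILTER value the tool writes. Genotypes are assumed to be GT strings with "|" or "/" separators.

```python
import re

# Hypothetical filter label for illustration only.
MONOMORPHIC_FILTER = "MONOMORPHIC_AFTER_BEAGLE"


def site_filters(genotypes, keep_monomorphic=False):
    """Return the FILTER values for one site, given its GT strings (e.g. '0|1')."""
    has_variant = any(
        allele not in ("0", ".")  # any non-reference, called allele
        for gt in genotypes
        for allele in re.split(r"[|/]", gt)
    )
    if has_variant or keep_monomorphic:
        return []
    return [MONOMORPHIC_FILTER]
```

With `keep_monomorphic=True`, which corresponds to -keep_monomorphic, an all-reference site stays unfiltered. That is the behavior wanted when imputing from a reference panel.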


--nocall_threshold / -ncthr

Threshold of confidence at which a genotype won't be called
Value between 0 and 1. If the probability that a genotype is correct (based on the posterior genotype probabilities and the genotype actually called) is below this threshold, the genotype is replaced by a no-call.

double  0.0  [ [ -∞  ∞ ] ]
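The no-call rule can be sketched as follows, assuming that "the probability that a genotype is correct" is the posterior of the genotype that was called. The posterior ordering (hom-ref, het, hom-alt) and the "./." no-call string are also assumptions.

```python
NO_CALL = "./."


def genotype_or_nocall(gt, posteriors, called_index, threshold=0.0):
    """Keep gt if the called genotype's posterior reaches threshold, else return a no-call.

    posteriors: [P(hom-ref), P(het), P(hom-alt)] for one sample (assumed order).
    called_index: index into posteriors of the genotype that was called.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return gt if posteriors[called_index] >= threshold else NO_CALL
```

With the default threshold of 0.0, no genotype is ever replaced, which matches the documented default.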


--out / -o

VCF File to which variants should be written

VariantContextWriter  stdout


--variant / -V

Input VCF file
Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).

--variant binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

RodBinding[VariantContext]  (required)
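Because the input VCF may be empty as long as it has the standard header lines, a placeholder file can be generated. The sketch below writes only the ##fileformat line and the #CHROM column header. The VCFv4.1 version string and the sample names are assumptions, and it does not check whether the tool needs any further header lines.

```python
def minimal_vcf(sample_names):
    """Return VCF text with only the mandatory header lines and no variant records."""
    columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
    if sample_names:
        # FORMAT and the sample columns appear only when samples are present.
        columns += ["FORMAT"] + list(sample_names)
    return "##fileformat=VCFv4.1\n" + "\t".join(columns) + "\n"
```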



GATK version 3.1-1-g07a4bf8 built at 2014/03/18 07:00:36. GTD: NA