Takes files produced by the Beagle imputation engine and creates a VCF with modified annotations.
This walker is intended to be run after Beagle has successfully executed. The full calling sequence for using Beagle along with the GATK is:
1. Run ProduceBeagleInputWalker.
2. Run Beagle.
3. Uncompress the output files.
4. Run BeagleOutputToVCFWalker.
Note that this walker requires all input files produced by Beagle.
java -Xmx4000m -jar dist/GenomeAnalysisTK.jar \
-R reffile.fasta -T BeagleOutputToVCF \
-V input_vcf.vcf \
-beagleR2:BEAGLE /myrun.beagle_output.r2 \
-beaglePhased:BEAGLE /myrun.beagle_output.phased \
-beagleProbs:BEAGLE /myrun.beagle_output.gprobs \
-o output_vcf.vcf
Note that Beagle produces some of these files compressed as .gz, so they must be decompressed with gunzip before the walker is run.
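The decompression step can be sketched as a small shell helper. This is only an illustration: the function name `decompress_beagle_outputs` is hypothetical, and the three suffixes (`r2`, `phased`, `gprobs`) are taken from the example command above. It assumes all outputs share one file prefix and skips any file that is already uncompressed.

```shell
# Hypothetical helper, not part of the GATK or Beagle.
# Decompress gzipped Beagle outputs that share a common prefix,
# e.g. prefix "myrun.beagle_output" -> myrun.beagle_output.r2.gz, etc.
decompress_beagle_outputs() {
    local prefix="$1"
    local suffix gz
    for suffix in r2 phased gprobs; do
        gz="${prefix}.${suffix}.gz"
        # Only act on files that actually exist in compressed form.
        if [ -f "$gz" ]; then
            gunzip -f "$gz"
        fi
    done
}
```

For the example above, this would be called as `decompress_beagle_outputs myrun.beagle_output` before running BeagleOutputToVCF.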
These Read Filters are automatically applied to the data by the Engine before processing by BeagleOutputToVCF.
The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).
This table summarizes the command-line arguments that are specific to this tool. For details, see the list further down below the table.
| Name | Type | Default value | Summary |
|---|---|---|---|
| Required | | | |
| --beaglePhased | RodBinding[BeagleFeature] | NA | Beagle-produced .phased file containing phased genotypes |
| --beagleProbs | RodBinding[BeagleFeature] | NA | Beagle-produced .probs file containing posterior genotype probabilities |
| --beagleR2 | RodBinding[BeagleFeature] | NA | Beagle-produced .r2 file containing R^2 values for all markers |
| --variant | RodBinding[VariantContext] | NA | Input VCF file |
| Optional | | | |
| --comp | RodBinding[VariantContext] | none | Comparison VCF file |
| -keep_monomorphic | boolean | false | If provided, sites that Beagle tags as monomorphic are not filtered. Useful for imputing a sample's genotypes from a reference panel |
| --nocall_threshold | double | 0.0 | Threshold of confidence at which a genotype won't be called |
| --out | VariantContextWriter | stdout | VCF File to which variants should be written |
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
Beagle-produced .phased file containing phased genotypes. By default, after Beagle has run, all genotypes are marked as "phased" in the VCF, using the "|" separator. --beaglePhased binds reference ordered data. This argument supports ROD files of the following types: BEAGLE
Beagle-produced .probs file containing posterior genotype probabilities. These values will populate the GL field for each sample and contain the posterior probability of each genotype given the data after phasing and imputation. --beagleProbs binds reference ordered data. This argument supports ROD files of the following types: BEAGLE
Beagle-produced .r2 file containing R^2 values for all markers. This required argument is used to annotate each site in the VCF INFO field with an R2 annotation. The value will be NaN if Beagle determined that there are no variant samples. --beagleR2 binds reference ordered data. This argument supports ROD files of the following types: BEAGLE
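To check the R2 annotation after the walker has run, the values can be pulled out of the output VCF with standard tools. The sketch below is illustrative only: `extract_r2` is a hypothetical helper, and it assumes the annotation appears in the INFO column as a `R2=<value>` key.

```shell
# Hypothetical helper, not part of the GATK.
# Print CHROM, POS and the R2 value (tab-separated) for every VCF record
# whose INFO column contains an "R2=" key.
extract_r2() {
    awk -F '\t' '
        /^#/ { next }                       # skip header lines
        {
            n = split($8, info, ";")        # INFO is the 8th VCF column
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^R2=/) {
                    print $1 "\t" $2 "\t" substr(info[i], 4)
                }
            }
        }
    ' "$1"
}
```

Running `extract_r2 output_vcf.vcf` lists one line per annotated site. Sites where Beagle found no variant samples would show `NaN` in the third column.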
Comparison VCF file. If this argument is present, the original allele frequencies and counts from this VCF are added as the annotations ACH, AFH and ANH at each record present in this VCF. --comp binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3
If provided, sites that Beagle tags as monomorphic are not filtered. Useful for imputing a sample's genotypes from a reference panel. By default (argument absent), if Beagle determines that no sample at a site has a variant genotype, the site is marked as filtered. If the argument is present, the site is not marked as filtered under this condition, even if there are no variant genotypes.
Threshold of confidence at which a genotype won't be called. The value must be between 0 and 1. If the probability that a genotype is correct (based on the posterior genotype probabilities and the actual genotype) is below this threshold, the genotype is replaced by a no-call.
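The thresholding rule can be illustrated with a short sketch. This is not the walker's implementation: `apply_nocall_threshold` is a hypothetical helper, and it assumes the "probability that a genotype is correct" is simply the posterior probability assigned to the called genotype.

```shell
# Hypothetical helper for illustration; not part of the GATK.
# Print the genotype unchanged, or "./." (no-call) when the posterior
# probability of the called genotype is strictly below the threshold.
apply_nocall_threshold() {
    local genotype="$1" posterior="$2" threshold="$3"
    awk -v gt="$genotype" -v p="$posterior" -v t="$threshold" \
        'BEGIN { if (p + 0 < t + 0) print "./."; else print gt }'
}
```

Under this reading, `apply_nocall_threshold '0|1' 0.62 0.9` yields a no-call, while the default threshold of 0.0 never replaces a genotype, because no probability is below zero.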
VCF File to which variants should be written.
Input VCF file. Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file). --variant binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3
GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.