
ReadBackedPhasing

Walks along all variant ROD loci, caching a user-defined window of VariantContext sites, and then finishes phasing them when they go out of range (using upstream and downstream reads).
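The caching behavior described above can be pictured with a small model. The sketch below is a hypothetical simplification, not GATK's implementation. It reduces a site to a (contig, position) pair. It also assumes that a cached site is finalized once the walker reaches a site on another contig, or a site more than the window size downstream (default 20000, matching --cacheWindowSize). The function name `flush_schedule` is illustrative only.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple

# Hypothetical minimal site: (contig, 1-based position).
Site = Tuple[str, int]

def flush_schedule(sites: Iterable[Site], cache_window: int = 20000) -> List[Tuple[Site, Optional[Site]]]:
    """Return (finalized_site, site_that_triggered_the_flush) pairs.

    Assumed rule: a cached site is finalized when the walker reaches a site on
    another contig or more than cache_window bases downstream of it. None means
    the site was finalized at the end of the input.
    """
    cache: deque = deque()
    schedule: List[Tuple[Site, Optional[Site]]] = []
    for site in sites:
        contig, pos = site
        # Evict every cached site that has gone out of range of the current locus.
        while cache and (cache[0][0] != contig or pos - cache[0][1] > cache_window):
            schedule.append((cache.popleft(), site))
        cache.append(site)
    # End of input: everything still cached is finalized.
    while cache:
        schedule.append((cache.popleft(), None))
    return schedule
```

For example, with `cache_window=200`, the input `[("20", 100), ("20", 150), ("20", 400), ("21", 50)]` keeps the sites at 100 and 150 cached until the walker reaches position 400. At that point both are more than 200 bases behind it, so both are finalized.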

Category: Variant Discovery Tools

Traversal: LocusWalker

PartitionBy: LOCUS


Overview

Performs physical phasing of SNP calls, based on sequencing reads.

Input

VCF file of SNP calls, BAM file of sequence reads.

Output

Phased VCF file.

Examples

    java \
      -jar GenomeAnalysisTK.jar \
      -T ReadBackedPhasing \
      -R reference.fasta \
      -I reads.bam \
      --variant SNPs.vcf \
      -L SNPs.vcf \
      -o phased_SNPs.vcf \
      --phaseQualityThresh 20.0

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by ReadBackedPhasing.


Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

ReadBackedPhasing specific arguments

This table summarizes the command-line arguments that are specific to this tool. For details, see the argument details below the table.

Name                         Type                        Default value  Summary
Required
--variant                    RodBinding[VariantContext]  NA             Input VCF file
Optional
--cacheWindowSize            Integer                     20000          The window size (in bases) to cache variant sites and their reads for the phasing procedure
--debug                      boolean                     false          If specified, print out very verbose debug information (if -l DEBUG is also specified)
-enableMergeToMNP            boolean                     false          Merge consecutive phased sites into MNP records
--maxGenomicDistanceForMNP   int                         1              The maximum reference-genome distance between consecutive heterozygous sites to permit merging phased VCF records into an MNP record
--maxPhaseSites              Integer                     10             The maximum number of successive heterozygous sites permitted to be used by the phasing algorithm
--min_base_quality_score     int                         17             Minimum base quality required to consider a base for phasing
--min_mapping_quality_score  int                         20             Minimum read mapping quality required to consider a read for phasing
--out                        VariantContextWriter        stdout         File to which variants should be written
--phaseQualityThresh         Double                      10.0           The minimum phasing quality score required to output phasing
--respectPhaseInInput        boolean                     false          Will only phase genotypes in cases where the resulting output will necessarily be consistent with any existing phase (for example, from trios)
--sampleToPhase              Set[String]                 NA             Only include these samples when phasing

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--cacheWindowSize / -cacheWindow ( Integer with default value 20000 )

The window size (in bases) to cache variant sites and their reads for the phasing procedure.

--debug / -debug ( boolean with default value false )

If specified, print out very verbose debug information (if -l DEBUG is also specified).

-enableMergeToMNP / --enableMergePhasedSegregatingPolymorphismsToMNP ( boolean with default value false )

Merge consecutive phased sites into MNP records.

--maxGenomicDistanceForMNP / -maxDistMNP ( int with default value 1 )

The maximum reference-genome distance between consecutive heterozygous sites to permit merging phased VCF records into an MNP record.
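The interaction between -enableMergeToMNP and --maxGenomicDistanceForMNP can be sketched as a grouping step. This is a hypothetical model, not the tool's code. It assumes records are heterozygous sites on one contig, sorted by position. It also assumes "distance" means the difference between positions, so the default of 1 only joins adjacent bases. Finally, it assumes a record flagged as phased is phased relative to the previous heterozygous site. The names `Record` and `mnp_candidate_groups` are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical record for one contig: (position, phased_with_previous_het_site).
Record = Tuple[int, bool]

def mnp_candidate_groups(records: List[Record], max_dist: int = 1) -> List[List[int]]:
    """Group consecutive phased heterozygous sites that lie within max_dist
    reference bases of each other; only groups of two or more sites are returned."""
    groups: List[List[int]] = []
    prev_pos: Optional[int] = None
    for pos, phased_with_prev in records:
        # Assumed rule: extend the current group only if this site is phased
        # relative to the previous one and is close enough on the reference.
        if phased_with_prev and prev_pos is not None and pos - prev_pos <= max_dist:
            groups[-1].append(pos)
        else:
            groups.append([pos])
        prev_pos = pos
    return [g for g in groups if len(g) > 1]
```

With the default distance, the sites at 100, 101 and 102 in `[(100, False), (101, True), (102, True), (200, True), (201, False)]` form one candidate group. The phased site at 200 stays separate because it is 98 bases away.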

--maxPhaseSites / -maxSites ( Integer with default value 10 )

The maximum number of successive heterozygous sites permitted to be used by the phasing algorithm.
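A cap on the number of sites is easier to appreciate with the underlying combinatorics. With n biallelic heterozygous sites there are 2^n possible haplotypes. Swapping a haplotype with its complement describes the same phasing, so there are only 2^(n-1) distinct phasings. The brute-force count below confirms this. The text above does not say whether ReadBackedPhasing enumerates phasings this way, so treat this as an illustration of how the search space grows, not as the tool's algorithm.

```python
from itertools import product

def distinct_phasings(n_het_sites: int) -> int:
    """Count distinct diploid phasings of n biallelic heterozygous sites by brute force."""
    seen = set()
    for hap in product((0, 1), repeat=n_het_sites):
        other = tuple(1 - allele for allele in hap)
        # A phasing is the unordered pair {hap, complement}; keep a canonical member.
        seen.add(min(hap, other))
    return len(seen)
```

At the default of 10 sites, this gives 512 distinct phasings (from 1024 haplotypes). Each additional site doubles that number.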

--min_base_quality_score / -mbq ( int with default value 17 )

Minimum base quality required to consider a base for phasing.

--min_mapping_quality_score / -mmq ( int with default value 20 )

Minimum read mapping quality required to consider a read for phasing.
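The two quality thresholds act at different levels. -mmq decides whether a read is used at all. -mbq decides which of its bases are used. The sketch below models that split with a hypothetical `Read` type and `usable_bases` helper; neither is a GATK API. It assumes a value equal to the threshold passes, as "minimum required" suggests.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Read:
    """Hypothetical minimal read: mapping quality plus aligned bases."""
    mapping_quality: int
    bases: List[Tuple[int, str, int]]  # (reference position, base, base quality)

def usable_bases(reads: List[Read], mbq: int = 17, mmq: int = 20) -> List[Tuple[int, str]]:
    """Return (position, base) pairs that pass both quality thresholds."""
    usable: List[Tuple[int, str]] = []
    for read in reads:
        if read.mapping_quality < mmq:
            continue  # the whole read is rejected by the mapping-quality threshold
        for pos, base, qual in read.bases:
            if qual >= mbq:  # individual bases are filtered by base quality
                usable.append((pos, base))
    return usable
```

With the defaults, a read with mapping quality 15 is ignored entirely. A well-mapped read still loses any base whose quality is below 17.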

--out / -o ( VariantContextWriter with default value stdout )

File to which variants should be written.

--phaseQualityThresh / -phaseThresh ( Double with default value 10.0 )

The minimum phasing quality score required to output phasing.
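The sketch below assumes the phasing quality is phred-scaled, which this page does not state. Under that assumption, a threshold Q corresponds to a probability of incorrect phasing of 10^(-Q/10). The default of 10.0 then allows up to a 10% error probability, and the 20.0 used in the example above allows up to 1%. It also assumes a score equal to the threshold is kept. Both function names are hypothetical.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a phred-scaled quality into the probability that the call is wrong."""
    return 10.0 ** (-q / 10.0)

def keep_phase(pq: float, phase_quality_thresh: float = 10.0) -> bool:
    """Assumed rule: report phasing only if its quality reaches the threshold."""
    return pq >= phase_quality_thresh
```

For instance, a phasing with quality 15.0 would be reported under the default threshold but dropped with --phaseQualityThresh 20.0.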

--respectPhaseInInput / -respectPhaseInInput ( boolean with default value false )

Will only phase genotypes in cases where the resulting output will necessarily be consistent with any existing phase (for example, from trios). Important note: do not use this argument unless your input data set is already phased; otherwise the tool will skip over all heterozygous sites.

--sampleToPhase / -sampleToPhase ( Set[String] )

Only include these samples when phasing.

--variant / -V ( required RodBinding[VariantContext] )

Input VCF file. Variants from this VCF file are used by this tool as input. The file must contain at least the standard VCF header lines, but it can be empty (i.e., contain no variants). --variant binds reference-ordered data (ROD). This argument supports ROD files of the following types: BCF2, VCF, VCF3.
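As a rough illustration of a header-only input, the sketch below builds the two lines the VCF 4.1 specification treats as the minimum: the `##fileformat` line and the `#CHROM` column header. It adds `FORMAT` and one column per sample so the file can carry genotypes. Depending on the GATK version, further header lines (for example `##contig`) may also be expected, so treat this as a minimal sketch. The helper name `header_only_vcf` and the sample name are hypothetical.

```python
from typing import List

def header_only_vcf(samples: List[str], version: str = "VCFv4.1") -> str:
    """Build the text of a VCF file that has header lines but no variant records."""
    columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
    if samples:
        # Genotype columns: FORMAT followed by one column per sample.
        columns += ["FORMAT"] + samples
    return f"##fileformat={version}\n" + "\t".join(columns) + "\n"
```

Writing `header_only_vcf(["sample1"])` to disk yields a valid file that contains no variants.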



GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.