VariantsToBinaryPed

Converts a VCF file to a binary plink Ped file (.bed/.bim/.fam)

Category Variant Evaluation and Manipulation Tools

Traversal LocusWalker

PartitionBy LOCUS


Overview


Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by VariantsToBinaryPed.

Window size

This tool uses a sliding window on the reference.

  • Window start: 0 bp before the locus
  • Window stop: 100 bp after the locus
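As a rough illustration of the window described above, the sketch below computes the reference span for a given locus. The function name, the 1-based inclusive coordinates, and the absence of clipping at contig ends are assumptions made for the example, not documented engine behavior.

```python
def reference_window(locus, start_offset=0, stop_offset=100):
    """Return the (start, stop) reference span around a locus.

    start_offset: number of bases before the locus (0 for this tool).
    stop_offset: number of bases after the locus (100 for this tool).
    Coordinates are treated as 1-based and inclusive (an assumption).
    """
    return (locus - start_offset, locus + stop_offset)


print(reference_window(1000))  # (1000, 1100)
```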

Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

VariantsToBinaryPed specific arguments

This table summarizes the command-line arguments that are specific to this tool. For details, see the list further down below the table.

Name                      Type                         Default value      Summary
Required
--bed                     PrintStream                  NA                 Output binary ped (.bed) file
--bim                     PrintStream                  NA                 Output extended map (.bim) file
--fam                     PrintStream                  NA                 Output fam file
--metaData                File                         NA                 Sample metadata file. You may specify a .fam file (in which case it will be copied to the file you provide as fam output).
--minGenotypeQuality      int                          0                  If genotype quality is lower than this value, output NO_CALL
--variant                 RodBinding[VariantContext]   NA                 Input VCF file
Optional
--checkAlternateAlleles   boolean                      false              Checks that alternate alleles actually appear in samples, erroring out if they do not
--dbsnp                   RodBinding[VariantContext]   none               dbSNP file
--majorAlleleFirst        boolean                      false              Sets the major allele to be 'reference' for the bim file, rather than the ref allele
--outputMode              OutputMode                   INDIVIDUAL_MAJOR   The output file mode (SNP major or individual major)
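To show how the required arguments fit together, here is a minimal sketch that assembles an invocation as an argument list. The jar name GenomeAnalysisTK.jar and the engine arguments -T and -R come from general GATK 2.x usage rather than from this page, and all file names are hypothetical placeholders.

```python
import shlex


def build_command(reference, vcf, metadata, out_prefix, min_gq=0,
                  jar="GenomeAnalysisTK.jar"):
    """Assemble a VariantsToBinaryPed command line as a list of arguments.

    The jar name and the -T / -R engine arguments are assumptions;
    the remaining flags are the required arguments listed in the table.
    """
    return [
        "java", "-jar", jar,
        "-T", "VariantsToBinaryPed",
        "-R", reference,
        "--variant", vcf,
        "--metaData", metadata,
        "--minGenotypeQuality", str(min_gq),
        "--bed", out_prefix + ".bed",
        "--bim", out_prefix + ".bim",
        "--fam", out_prefix + ".fam",
    ]


# Hypothetical file names, for illustration only.
cmd = build_command("ref.fasta", "input.vcf", "samples.fam", "output")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to subprocess.run without shell quoting concerns.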

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--bed / -bed ( required PrintStream )

Output binary ped (.bed) file.

--bim / -bim ( required PrintStream )

Output extended map (.bim) file.

--checkAlternateAlleles ( boolean with default value false )

Checks that alternate alleles actually appear in samples, erroring out if they do not.

--dbsnp / -D ( RodBinding[VariantContext] with default value none )

dbSNP file. --dbsnp binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3

--fam / -fam ( required PrintStream )

output fam file.

--majorAlleleFirst ( boolean with default value false )

Sets the major allele to be 'reference' for the bim file, rather than the ref allele.
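One way to picture this option is the sketch below, which picks the allele to treat as 'reference' by counting alleles across called genotypes. The function name, the genotype representation, and the tie-breaking rule (keep the ref allele) are assumptions for illustration; the walker's actual logic may differ.

```python
from collections import Counter


def bim_reference_allele(ref, genotypes, major_allele_first=False):
    """Choose the allele reported as 'reference'.

    genotypes: iterable of allele tuples such as ("A", "G"); None marks a no-call.
    With major_allele_first=False the VCF ref allele is returned unchanged.
    """
    if not major_allele_first:
        return ref
    # Count every allele observed in the called genotypes.
    counts = Counter(a for gt in genotypes if gt is not None for a in gt)
    if not counts:
        return ref
    top = max(counts.values())
    # Assumption: on a tie, keep the ref allele.
    if counts.get(ref, 0) == top:
        return ref
    return max(counts, key=counts.get)


calls = [("A", "G"), ("G", "G"), ("G", "G"), None]
print(bim_reference_allele("A", calls))                           # A
print(bim_reference_allele("A", calls, major_allele_first=True))  # G
```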

--metaData / -m ( required File )

Sample metadata file. You may specify a .fam file (in which case it will be copied to the file you provide as fam output). The metaData file can take two formats, the first of which is the first 6 columns of the standard ped file. This is what Plink describes as a fam file. An example fam file is (note that there is no header):

CEUTrio NA12878 NA12891 NA12892 2 -9
CEUTrio NA12891 UNKN1 UNKN2 2 -9
CEUTrio NA12892 UNKN3 UNKN4 1 -9

where the entries are (FamilyID IndividualID DadID MomID Sex Phenotype)

An alternate format is a two-column key-value file:

NA12878 fid=CEUTrio;dad=NA12891;mom=NA12892;sex=2;phenotype=-9
NA12891 fid=CEUTrio;sex=2;phenotype=-9
NA12892 fid=CEUTrio;sex=1;phenotype=-9

wherein unknown parents needn't be specified. The columns are the individual ID, and a list of key-value pairs.

Regardless of which file is specified, the walker will output a .fam file alongside the bed file. If the command line has "-m [name].fam", the fam file will simply be copied. However, if a metadata file of the alternate format is passed by "-m [name].txt", the walker will construct a formatted .fam file from the data.
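The conversion from the key-value format to a fam line can be sketched as follows. This is an illustration, not the walker's code: the function name is hypothetical, and writing "0" for an unspecified parent follows the usual Plink convention, which is an assumption about what the walker emits.

```python
def keyvalue_to_fam(line, missing_parent="0"):
    """Convert one two-column key-value metadata line to a fam-style line.

    line: "<IndividualID> key=value;key=value;..."
    missing_parent: placeholder for an unspecified dad/mom (assumed "0").
    """
    sample_id, pairs = line.split(None, 1)
    fields = dict(p.split("=", 1) for p in pairs.strip().split(";") if p)
    return " ".join([
        fields["fid"],
        sample_id,
        fields.get("dad", missing_parent),
        fields.get("mom", missing_parent),
        fields["sex"],
        fields["phenotype"],
    ])


print(keyvalue_to_fam(
    "NA12878 fid=CEUTrio;dad=NA12891;mom=NA12892;sex=2;phenotype=-9"))
# CEUTrio NA12878 NA12891 NA12892 2 -9
```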

--minGenotypeQuality / -mgq ( required int with default value 0 )

If genotype quality is lower than this value, output NO_CALL.
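The thresholding rule can be sketched in a few lines. The comparison is strict ("lower than"), so with the default of 0 no genotype is masked on quality alone. The "./." string used for NO_CALL and the function name are illustrative assumptions; how the walker treats a missing GQ is not covered here.

```python
NO_CALL = "./."  # illustrative stand-in for the walker's NO_CALL


def filter_genotype(genotype, gq, min_gq=0):
    """Return NO_CALL when gq is strictly lower than min_gq, else the genotype."""
    return NO_CALL if gq < min_gq else genotype


print(filter_genotype("0/1", 19, min_gq=20))  # ./.
print(filter_genotype("0/1", 20, min_gq=20))  # 0/1
```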

--outputMode / -mode ( OutputMode with default value INDIVIDUAL_MAJOR )

The output file mode (SNP major or individual major).
The --outputMode argument is an enumerated type (OutputMode), which can have one of the following values:

INDIVIDUAL_MAJOR
SNP_MAJOR
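In the Plink binary format, a .bed file begins with the two magic bytes 0x6c 0x1b followed by a mode byte: 0x01 for SNP-major and 0x00 for individual-major. The sketch below reads that header to report which mode a file was written in. It assumes the walker writes this standard header; the function name and the temporary file are made up for the example.

```python
import os
import tempfile

BED_MAGIC = b"\x6c\x1b"
MODES = {0x00: "INDIVIDUAL_MAJOR", 0x01: "SNP_MAJOR"}


def bed_output_mode(path):
    """Report the layout of a Plink .bed file from its 3-byte header."""
    with open(path, "rb") as fh:
        header = fh.read(3)
    if len(header) < 3 or header[:2] != BED_MAGIC:
        raise ValueError("not a Plink .bed file: bad magic number")
    if header[2] not in MODES:
        raise ValueError("unknown mode byte: %#04x" % header[2])
    return MODES[header[2]]


# Write a tiny SNP-major header to a temporary file and read it back.
with tempfile.NamedTemporaryFile(suffix=".bed", delete=False) as tmp:
    tmp.write(BED_MAGIC + b"\x01")
print(bed_output_mode(tmp.name))  # SNP_MAJOR
os.remove(tmp.name)
```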

--variant / -V ( required RodBinding[VariantContext] )

Input VCF file. Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file). --variant binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3


GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.