PlinkRod


Warning: the material on this page is considered out of date by the GSA team.


Overview

PlinkRod is a Reference-Ordered Data (ROD) object that allows flexible use of genotype data encoded in PLINK files. These data can be bound to the GATK just like a VCF or GFF file, via the -B flag. To give a walker access to a PLINK file, use:

-B plinkBindingName,Plink,/path/to/plink/file

This makes it possible to develop tools that analyze pedigree or population data without having to parse PLINK files yourself. It also allows for easy conversion to and from other formats, notably via PlinkToVCF.

Note that merely giving a walker access to a Plink ROD will not instruct the walker to use that data in any particular way.

Supported File Types

Currently, PlinkRod supports parsing two file types: "standalone .ped format" and "binary .ped format".

Standalone .ped Format

If your .ped file has a header and looks like this:

#Family ID      Individual ID      Sex      Example_snp_1|c1_p213898        Example_snp_2|c2_p1423878_gI      Example_snp_3|c19_p1878391_gD
FAM1      NA12312      F      A A      ATTC -      - -
FAM1      NA12315      M      A G      ATTC ATTC      GC GC

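Then you can bind the file as is. Note the naming convention for the variant columns: each is written as name|c<chromosome>_p<position>, optionally followed by a _gI or _gD suffix, as in the header above. PlinkRod is also flexible about the header: additional Phenotype, Maternal, and Paternal columns will not adversely affect it.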
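As an illustration of that naming convention, here is a minimal Python sketch that splits a variant column name into its parts. The pattern is inferred only from the three example columns above and is not taken from the PlinkRod parser; the function name parse_variant_column is hypothetical, and reading the _gI and _gD suffixes as insertion and deletion markers is an assumption.

```python
import re

# Pattern inferred from the example header on this page (an assumption,
# not the PlinkRod source): <name>|c<chromosome>_p<position>[_gI or _gD]
COLUMN_RE = re.compile(
    r"^(?P<name>[^|]+)\|c(?P<chrom>[^_]+)_p(?P<pos>\d+)(?:_g(?P<kind>[ID]))?$"
)

def parse_variant_column(column):
    """Split one variant column name into its parts, or return None if it does not match."""
    m = COLUMN_RE.match(column)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "chrom": m.group("chrom"),
        "pos": int(m.group("pos")),
        "kind": m.group("kind"),  # "I", "D", or None when no suffix is present
    }

print(parse_variant_column("Example_snp_2|c2_p1423878_gI"))
```

Non-variant columns such as Sex do not match the pattern and come back as None, which makes it easy to separate them from the genotype columns.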

Binary .ped Format

A binary .ped file is a .bed file. A .bed file is always packaged with a .bim file and a .fam file; both PLINK and the GATK require all three files to be in the same directory and to share the same base name. That is, the files { my_file.bed, my_file.bim, my_file.fam } must all be in the same directory, and their names must differ only by extension.

To bind the file "test_file.bed" to the GATK:

Before running the GATK, ensure that test_file.fam and test_file.bim are in the same directory as test_file.bed. The .bed file can then be bound with:

-B bindingName,Plink,/path/to/test_file.bed
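The fileset requirement can be checked before launching the GATK. The following is a minimal Python sketch, not part of the GATK; the helper name plink_binding_arg and its default binding name are hypothetical. It verifies that the .bim and .fam files sit next to the .bed file and then returns the -B argument in the form shown above.

```python
from pathlib import Path

def plink_binding_arg(bed_path, binding_name="bindingName"):
    """Return the -B argument for a .bed file, after checking its .bim and .fam siblings exist."""
    bed = Path(bed_path)
    if bed.suffix != ".bed":
        raise ValueError(f"expected a .bed file, got {bed.name}")
    # with_suffix keeps the directory and base name and swaps only the extension
    members = (bed, bed.with_suffix(".bim"), bed.with_suffix(".fam"))
    missing = [str(p) for p in members if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing fileset members: " + ", ".join(missing))
    return ["-B", f"{binding_name},Plink,{bed}"]
```

The returned list can be appended to the rest of a GATK command line; a missing .bim or .fam file raises an error before the GATK is started.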

Unsupported or Semi-Supported File Types

Alternate .ped Format

If your .ped file does not have a header, and looks like this:

Fam2 NA12415 0 0 1 -9 A A G G A G T T C T C C
Fam2 NA12417 0 0 2 -9 A C G G A A T T C C A C
Fam2 NA12419 0 0 1 -9 C C G T A A A T T T A A

then it is not in standalone format. Please convert it to binary format and bind the resulting file. (See Converting text plink files to binary.)

Plink .raw Format

If your .ped file has a header, but looks like this:

FID IID PAT MAT SEX PHENOTYPE SNP1-12345_C SNP1-12345_HET SNP1-21274_A SNP1-21274_HET
FAM1 NA12121 NA12120 NA12123 F -9 0 0 1
FAM1 NA12120 0 0 M -9 0 1 0
FAM1 NA12123 0 0 F -9 0 0 2

Then you have a Plink .raw file. This file type is currently unsupported by the GATK, and cannot be directly converted to a .ped or binary .ped file. Sorry!
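The three text layouts described on this page can be told apart from the first line of a file. The Python sketch below is a rough heuristic based only on the examples shown here (a header starting with "#", a header starting with "FID IID", or no header at all); it is not how PlinkRod itself detects formats, and the function name sniff_plink_text_format is hypothetical.

```python
def sniff_plink_text_format(first_line):
    """Guess which text layout from this page a file uses, based on its first line."""
    fields = first_line.split()
    if not fields:
        return "unknown"
    if first_line.startswith("#"):
        return "standalone"   # header line beginning with '#': bind as is
    if fields[:2] == ["FID", "IID"]:
        return "raw"          # Plink .raw header: unsupported
    return "headerless"       # plain .ped rows: convert to binary first

print(sniff_plink_text_format("Fam2 NA12415 0 0 1 -9 A A G G A G T T C T C C"))
```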

Converting text plink files to binary

If you have a .ped and a .map file, and want to convert these to a (.bed, .bim, .fam) formatted fileset for use with the GATK, the command:

plink --file my_file --make-bed --out my_binary_file

will take my_file.ped and my_file.map and create from them my_binary_file.bed, my_binary_file.bim, and my_binary_file.fam.
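When the conversion is scripted, the same command can be assembled programmatically. This Python sketch only builds the argument list shown above and the names of the three files this page says it creates; it does not run plink. The helper name make_bed_command is hypothetical.

```python
def make_bed_command(text_base, binary_base):
    """Build the plink argv for the conversion above and the names of the files it should create."""
    argv = ["plink", "--file", text_base, "--make-bed", "--out", binary_base]
    outputs = [binary_base + ext for ext in (".bed", ".bim", ".fam")]
    return argv, outputs

argv, outputs = make_bed_command("my_file", "my_binary_file")
print(" ".join(argv))
```

If plink is installed and on the PATH, the list can be handed to subprocess.run(argv, check=True), and the .bed entry in outputs is then the file to bind with -B.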
