PlinkRod
Warning: the material on this page is considered out of date by the GSA team.
Overview
PlinkRod is a Reference-Ordered Data (ROD) object that allows for flexible use of genotype data encoded in PLINK files. These data can be bound to the GATK just like a VCF or GFF, via the -B flag. To give a walker access to a PLINK file, use:
-B plinkBindingName,Plink,/path/to/plink/file
This makes it possible to develop tools to analyze pedigree or population data without worrying about parsing plink files. It also allows for easy conversion to and from other formats, notably via PlinkToVCF.
Note that merely giving a walker access to a Plink ROD will not instruct the walker to use that data in any particular way.
Supported File Types
Currently, PlinkRod supports parsing two file types: "standalone .ped format" and "binary .ped format".
Standalone .ped Format
If your .ped file has a header and looks like this:
#Family ID Individual ID Sex Example_snp_1|c1_p213898 Example_snp_2|c2_p1423878_gI Example_snp_3|c19_p1878391_gD
FAM1 NA12312 F A A ATTC - - -
FAM1 NA12315 M A G ATTC ATTC GC GC
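Then you can bind the file as is. Note the naming convention for the SNP columns in the header: in the example, each SNP name is followed by |, then a locus written as c<number>_p<number> (for example c1_p213898), with an optional suffix such as _gI or _gD. PlinkRod is also flexible about the header; if it also contains Phenotype, Maternal, and Paternal columns, they will not adversely affect the PlinkRod.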
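As an illustration of the naming convention, the following Python sketch splits a SNP column name from the example header into its parts. The pattern is inferred only from the three example names on this page, not from the PlinkRod source; the function name parse_snp_column and the field names are hypothetical, and the meaning of the I or D suffix letter is not defined here.

```python
import re

# Pattern inferred from the example header on this page (an assumption):
#   <name>|c<chromosome>_p<position>[_gI|_gD]
SNP_NAME = re.compile(
    r"^(?P<name>[^|]+)\|c(?P<chrom>[^_]+)_p(?P<pos>\d+)(?:_g(?P<suffix>[ID]))?$"
)


def parse_snp_column(column):
    """Split one SNP header column into name, chromosome, position, suffix."""
    match = SNP_NAME.match(column)
    if match is None:
        raise ValueError("column does not follow the example naming: %r" % column)
    return {
        "name": match["name"],
        "chrom": match["chrom"],
        "pos": int(match["pos"]),
        # None when the column has no _gI / _gD suffix.
        "suffix": match["suffix"],
    }
```

Called on Example_snp_3|c19_p1878391_gD, this returns the name Example_snp_3, chromosome "19", position 1878391, and suffix "D". A column that does not match the assumed pattern raises ValueError rather than being silently accepted.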
Binary .ped Format
A binary .ped file is a .bed file. A .bed file is always packaged with a .bim file and a .fam file; both PLINK and the GATK require all three files to be in the same directory and to share the same base name. That is, the files { my_file.bed, my_file.bim, my_file.fam } must all be in the same directory, and their names must differ only by extension.
To bind the file "test_file.bed" to the GATK:
Before running the GATK, ensure that test_file.fam and test_file.bim are in the same directory as test_file.bed. Then the .bed file can be bound with:
-B bindingName,Plink,/path/to/test_file.bed
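The same-directory, same-base-name requirement can be checked before launching the GATK. The Python sketch below is a minimal illustration under that assumption only; the helper names missing_companions and plink_binding are hypothetical and are not part of the GATK or PLINK.

```python
from pathlib import Path


def missing_companions(bed_path):
    """Return the .bim/.fam paths that should sit next to bed_path but do not."""
    bed = Path(bed_path)
    return [
        str(bed.with_suffix(ext))
        for ext in (".bim", ".fam")
        if not bed.with_suffix(ext).is_file()
    ]


def plink_binding(name, bed_path):
    """Build the -B argument pair for a binary fileset, or fail early."""
    missing = missing_companions(bed_path)
    if missing:
        raise FileNotFoundError("missing companion files: " + ", ".join(missing))
    # Mirrors the documented form: -B bindingName,Plink,/path/to/test_file.bed
    return ["-B", "%s,Plink,%s" % (name, bed_path)]
```

For example, plink_binding("bindingName", "/path/to/test_file.bed") returns the two arguments to append to a GATK command line, and raises FileNotFoundError if test_file.bim or test_file.fam is absent from /path/to.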
Unsupported or Semi-Supported File Types
Alternate .ped Format
If your .ped file does not have a header, and looks like this:
Fam2 NA12415 0 0 1 -9 A A G G A G T T C T C C
Fam2 NA12417 0 0 2 -9 A C G G A A T T C C A C
Fam2 NA12419 0 0 1 -9 C C G T A A A T T T A A
then it is not in standalone format. Please convert it to binary format, and bind the resulting file. (See Converting text plink files to binary.)
Plink .raw Format
If your .ped file has a header, but looks like this:
FID IID PAT MAT SEX PHENOTYPE SNP1-12345_C SNP1-12345_HET SNP1-21274_A SNP1-21274_HET
FAM1 NA12121 NA12120 NA12123 F -9 0 0 1
FAM1 NA12120 0 0 M -9 0 1 0
FAM1 NA12123 0 0 F -9 0 0 2
Then you have a Plink .raw file. This file type is currently unsupported by the GATK, and cannot be directly converted to a .ped or binary .ped file. Sorry!
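The three text layouts shown on this page can usually be told apart from their first line. The Python sketch below is a rough heuristic based only on the examples above: a leading # is taken to mean standalone format, a header beginning with FID IID is taken to mean .raw, and anything else is treated as a headerless .ped. The function name and the returned labels are hypothetical, and real files may need stricter checks.

```python
def classify_plink_text(first_line):
    """Guess the layout of a text PLINK file from its first line.

    Heuristic derived from the examples on this page, not from the GATK:
      "standalone" -> header line starting with '#'; can be bound as is
      "raw"        -> header starting with FID IID; unsupported
      "headerless" -> no header; convert to binary before binding
    """
    fields = first_line.split()
    if not fields:
        raise ValueError("first line is empty")
    if first_line.lstrip().startswith("#"):
        return "standalone"
    if fields[:2] == ["FID", "IID"]:
        return "raw"
    return "headerless"
```

A caller could read the first line of a file, pass it to classify_plink_text, and decide whether to bind the file directly, convert it with PLINK first, or reject it.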
Converting text plink files to binary
If you have a .ped and a .map file, and want to convert these to a (.bed, .bim, .fam) formatted fileset for use with the GATK, the command:
plink --file my_file --make-bed --out my_binary_file
will take my_file.ped and my_file.map and create from them my_binary_file.bed, my_binary_file.bim, and my_binary_file.fam.
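When the conversion is part of a scripted pipeline, the command above can be assembled and run from Python. This is a minimal sketch that assumes a plink executable is on the PATH; the helper names make_bed_command, expected_outputs, and convert_to_binary are hypothetical.

```python
import shutil
import subprocess


def make_bed_command(in_base, out_base):
    """Return the argument list for: plink --file IN --make-bed --out OUT."""
    return ["plink", "--file", in_base, "--make-bed", "--out", out_base]


def expected_outputs(out_base):
    """Names of the binary fileset PLINK is expected to write."""
    return [out_base + ext for ext in (".bed", ".bim", ".fam")]


def convert_to_binary(in_base, out_base):
    """Run the conversion; raises if plink is missing or exits non-zero."""
    if shutil.which("plink") is None:
        raise RuntimeError("plink executable not found on PATH")
    subprocess.run(make_bed_command(in_base, out_base), check=True)
    return expected_outputs(out_base)
```

Calling convert_to_binary("my_file", "my_binary_file") runs the same command as above and returns the three file names to look for; the resulting my_binary_file.bed is the file to bind with -B.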