As reported by a user, a bug was found in VariantsToBinaryPed. Briefly, VariantsToBinaryPed expected the .fam file to list the samples in the same order as the input VCF file. If the two orders differed, it did not correctly map sample IDs to their genotypes in the output binary PED.
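To illustrate the failure mode, here is a minimal Python sketch. It is not the GATK code. It contrasts pairing genotypes with .fam samples by position, which is what went wrong, with looking each sample's genotype up by its ID. The function name and the sample IDs and genotypes are hypothetical.

```python
def genotypes_by_fam_order(vcf_samples, genotypes, fam_samples):
    """Reorder one variant's genotypes to follow the .fam sample order.

    vcf_samples: sample IDs in VCF header order (hypothetical input)
    genotypes:   genotype calls for one variant, in VCF column order
    fam_samples: sample IDs in .fam file order
    """
    # Look up each sample's VCF column by ID instead of assuming positions match.
    column_of = {sample: i for i, sample in enumerate(vcf_samples)}
    missing = [s for s in fam_samples if s not in column_of]
    if missing:
        raise ValueError(f"samples in .fam but not in VCF: {missing}")
    return [genotypes[column_of[s]] for s in fam_samples]


# Hypothetical example data.
vcf_samples = ["sampleA", "sampleB", "sampleC"]
genotypes = ["0/1", "1/1", "0/0"]
fam_samples = ["sampleC", "sampleA", "sampleB"]

# Positional pairing: sampleC wrongly receives sampleA's call.
positional = list(zip(fam_samples, genotypes))
# Pairing by sample ID: every sample keeps its own call.
by_id = list(zip(fam_samples, genotypes_by_fam_order(vcf_samples, genotypes, fam_samples)))
print(positional)
print(by_id)
```

When the two orders are identical, both approaches give the same result. This is why the bug stayed hidden in the common case.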
We expect that in most use cases the order will be the same (because PLINK uses lexicographic order, as does the GATK), so the bug would not affect results. However, as the user report demonstrates, when the orders differed, the bug could seriously affect results.
We therefore recommend that anyone who has used VariantsToBinaryPed check their results for inconsistencies in the kinship coefficients. Our apologies to anyone affected by this bug, and many thanks again to user TimHughes for reporting it.
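One quick way to tell whether an earlier run could have been affected is to compare the sample order in the VCF header with the order in the .fam file. The Python sketch below does that. It assumes the VCF sample names correspond to the .fam individual ID (second column), which is an assumption, so check how your .fam was built. The function names are hypothetical. If it reports no mismatches, the old version's positional mapping would have paired samples correctly.

```python
def vcf_header_samples(vcf_lines):
    """Return sample IDs from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM..FORMAT); sample IDs follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def fam_samples(fam_lines):
    """Return individual IDs (column 2, assumed to match VCF names) from a .fam file."""
    return [line.split()[1] for line in fam_lines if line.strip()]


def order_mismatches(vcf_lines, fam_lines):
    """List (position, vcf_id, fam_id) wherever the two sample orders disagree."""
    vcf_ids = vcf_header_samples(vcf_lines)
    fam_ids = fam_samples(fam_lines)
    if sorted(vcf_ids) != sorted(fam_ids):
        raise ValueError("VCF and .fam files do not list the same samples")
    return [(i, v, f) for i, (v, f) in enumerate(zip(vcf_ids, fam_ids)) if v != f]


# Hypothetical two-sample example where the orders differ.
vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\n",
]
fam = ["fam1 sampleB 0 0 1 -9\n", "fam1 sampleA 0 0 2 -9\n"]
print(order_mismatches(vcf, fam))
```

With real files, pass the lines read from the VCF (or its decompressed header) and from the .fam file.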
Finally, we have fixed the bug and released the fix in GATK version 2.7-4.
Hi GATK team, I'd like to use the VariantsToBinaryPed tool on my resequencing dataset of ~600 individuals to check concordance with some GWAS data, etc. Will this be possible using the new Best Practices with HaplotypeCaller (HC), or should I stick with a multi-sample VCF from UnifiedGenotyper (UG)? Thanks!