Tagged with #variantstobinaryped
1 documentation article | 1 announcement | 4 forum discussions


Comments (0)

As reported here, a bug was found in VariantsToBinaryPed. Briefly, VariantsToBinaryPed expected the fam file to describe the samples in the same order as the input VCF file: if they were not in the same order, it did not correctly map sample IDs to the genotypes in the output binary PED.

We expect that in most use cases, the order would be the same (because PLINK uses lexicographic order, as does the GATK), so the bug would not impact results. However, as the user report demonstrates, in cases where order was different, the bug would seriously impact results.

We therefore recommend that anyone who has used VariantsToBinaryPed check their results for any inconsistencies in the kinship coefficients. Our apologies for the inconvenience to anyone who is affected by this bug, and big thanks again to user TimHughes for reporting it.

Finally, we have fixed the bug in GATK and released the fixed version under version number 2.7-4.
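For anyone wanting to check whether their own data could have been affected, the order comparison described above can be sketched as follows. This is a minimal illustration, not part of the GATK: the function names are hypothetical, and it assumes a standard tab-delimited VCF header line starting with #CHROM and a whitespace-delimited fam file with the individual ID in the second column.

```python
from typing import Iterable, List, Optional


def vcf_sample_ids(vcf_lines: Iterable[str]) -> List[str]:
    """Return the sample IDs from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first nine columns are fixed (CHROM through FORMAT);
            # sample names start at the tenth column.
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def fam_sample_ids(fam_lines: Iterable[str]) -> List[str]:
    """Return the individual IDs (second column) from fam rows."""
    ids = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.append(fields[1])
    return ids


def first_order_mismatch(vcf_ids: List[str], fam_ids: List[str]) -> Optional[int]:
    """Return the first 0-based position where the two lists differ, or None."""
    for i, (vcf_id, fam_id) in enumerate(zip(vcf_ids, fam_ids)):
        if vcf_id != fam_id:
            return i
    if len(vcf_ids) != len(fam_ids):
        # One list is a strict prefix of the other.
        return min(len(vcf_ids), len(fam_ids))
    return None
```

A result of None means the two files list the samples in the same order, in which case the bug should not have changed the output. Any other value is the position of the first sample that differs, and results produced with versions before 2.7-4 would be worth regenerating.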

Comments (3)

I have come across some strange results when using VariantsToBinaryPed. When I look at the .fam file VariantsToBinaryPed produces, the parents' IDs are swapped. To illustrate:

my_original.fam
FAM1 1_mother 0 0 2 1
FAM1 2_father 0 0 1 1
FAM1 3_child 2_father 1_mother 1 2

VariantsToBinaryPED.fam
FAM1 1_mother 0 0 2 1
FAM1 2_father 0 0 1 1
FAM1 3_child 1_mother 2_father 1 2

It would seem like the program looks at the order in the original .fam file, and assumes that the father ID is sorted before the mother ID? As far as I have understood, this is not a prerequisite of the .fam format. I'm on GATK version 3.3
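A swap like the one in the two files quoted above can be detected mechanically. The sketch below is only an illustration with hypothetical function names. It relies on the PLINK fam column order (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) and flags individuals whose two parent columns are exchanged between the original and the output file.

```python
from typing import Dict, Iterable, List, Tuple


def parent_columns(fam_lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each individual ID to its (paternal ID, maternal ID) columns."""
    parents = {}
    for line in fam_lines:
        fields = line.split()
        if len(fields) >= 4:
            parents[fields[1]] = (fields[2], fields[3])
    return parents


def swapped_parents(original_lines: Iterable[str],
                    output_lines: Iterable[str]) -> List[str]:
    """Return IDs whose parent columns are exchanged in the output file."""
    original = parent_columns(original_lines)
    output = parent_columns(output_lines)
    swapped = []
    for iid, (father, mother) in original.items():
        # Founders have "0 0" in both columns, so an exchange is invisible
        # there; only rows with two different parent IDs can be flagged.
        if father != mother and output.get(iid) == (mother, father):
            swapped.append(iid)
    return swapped


original = [
    "FAM1 1_mother 0 0 2 1",
    "FAM1 2_father 0 0 1 1",
    "FAM1 3_child 2_father 1_mother 1 2",
]
produced = [
    "FAM1 1_mother 0 0 2 1",
    "FAM1 2_father 0 0 1 1",
    "FAM1 3_child 1_mother 2_father 1 2",
]
print(swapped_parents(original, produced))
```

With the rows from this post, only 3_child is reported, since the two founder rows carry no parent IDs.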

Comments (3)

Hey GATK Team,

I've encountered a GATK runtime error, which the message says might be the result of a bug, but I tracked it down to a file suffix issue. I tried GATK v3.2-2 and GATK v2.7-2 and the "problem" seems common to both. When my input metaData file is suffixed with .meta I get the following error, but when it ends in .fam it runs successfully. My guess is that it's not checking that the input file ends in .fam?

INFO 11:55:47,955 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:55:47,957 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-2-g6bda569, Compiled 2013/08/28 16:30:29
INFO 11:55:47,957 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:55:47,957 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:55:47,960 HelpFormatter - Program Args: -T VariantsToBinaryPed -R v5_0.chr+cpDNA.fa -V v5.0.combined.biSNP.vcf --bed v5.0.combined.biSNP.bed --bim v5.0.combined.biSNP.bim --fam v5.0.combined.biSNP.fam --minGenotypeQuality 30 --metaData ./v5.0.combined.biSNP.meta
INFO 11:55:47,961 HelpFormatter - Date/Time: 2014/09/27 11:55:47
INFO 11:55:47,961 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:55:47,961 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:55:47,966 ArgumentTypeDescriptor - Dynamically determined type of v5.0.combined.biSNP.vcf to be VCF
INFO 11:55:48,520 GenomeAnalysisEngine - Strictness is SILENT
INFO 11:55:49,019 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 11:55:49,038 RMDTrackBuilder - Loading Tribble index from disk for file v5.0.combined.biSNP.vcf
INFO 11:55:49,686 GenomeAnalysisEngine - Preparing for traversal
INFO 11:55:49,716 GenomeAnalysisEngine - Done preparing for traversal
INFO 11:55:49,716 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 11:55:49,716 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 11:55:50,525 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 1
at org.broadinstitute.sting.gatk.walkers.variantutils.VariantsToBinaryPed.parseMetaData(VariantsToBinaryPed.java:483)
at org.broadinstitute.sting.gatk.walkers.variantutils.VariantsToBinaryPed.initialize(VariantsToBinaryPed.java:141)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.7-2-g6bda569):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: 1
ERROR ------------------------------------------------------------------------------------------

INFO 11:56:03,420 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:56:03,422 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.2-2-gec30cee, Compiled 2014/07/17 15:22:03
INFO 11:56:03,422 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:56:03,422 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:56:03,425 HelpFormatter - Program Args: -T VariantsToBinaryPed -R v5_0.chr+cpDNA.fa -V v5.0.combined.biSNP.vcf --bed v5.0.combined.biSNP.bed --bim v5.0.combined.biSNP.bim --fam v5.0.combined.biSNP.fam --minGenotypeQuality 30 --metaData ./v5.0.combined.biSNP.meta
INFO 11:56:03,429 HelpFormatter - Executing as XXXXXXXX on Linux 2.6.32-431.20.3.el6.nersc.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13.
INFO 11:56:03,429 HelpFormatter - Date/Time: 2014/09/27 11:56:03
INFO 11:56:03,429 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:56:03,430 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:56:03,917 GenomeAnalysisEngine - Strictness is SILENT
INFO 11:56:04,427 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 11:56:05,106 GenomeAnalysisEngine - Preparing for traversal
INFO 11:56:05,156 GenomeAnalysisEngine - Done preparing for traversal
INFO 11:56:05,157 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 11:56:05,157 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 11:56:05,158 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 11:56:05,969 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 1
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.parseMetaData(VariantsToBinaryPed.java:489)
at org.broadinstitute.gatk.tools.walkers.variantutils.VariantsToBinaryPed.initialize(VariantsToBinaryPed.java:141)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: 1
ERROR ------------------------------------------------------------------------------------------
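Both traces fail inside parseMetaData with an out-of-bounds index of 1, which would be consistent with a line being split into fewer fields than the parser expected. That reading is an assumption, and how the tool chooses a parser for the metaData file is not shown in the log. One way to rule out the file contents, independent of the suffix, is to check that every row has the six whitespace-separated columns of a fam file. The function names below are hypothetical.

```python
from typing import Iterable, List


def non_fam_line_numbers(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of non-blank lines that do not have six fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if fields and len(fields) != 6:
            bad.append(number)
    return bad


def looks_like_fam(lines: Iterable[str]) -> bool:
    """True if there is at least one row and every non-blank row has six fields."""
    rows = list(lines)
    has_rows = any(row.strip() for row in rows)
    return has_rows and not non_fam_line_numbers(rows)
```

If looks_like_fam returns True for the .meta file, the content itself has the fam layout and the difference in behaviour most likely comes from the file name alone.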
Comments (4)

Hi, I'm looking for the documentation for the VariantsToPed tool, but the only link I can find (https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_VariantsToPED.html) gives me a 404 error, and I get the same for VariantsToBinaryPed. I thought it might have been because of yesterday's maintenance, but I'm getting the same error today.

Is this tool still supported by the latest GATK? If not, is the documentation available anywhere so I can try with an older version of the GATK?

Thanks heaps!

Comments (1)

Hi GATK team, I'd like to use the VariantsToBinaryPed tool on my re-sequencing dataset of ~600 individuals in order to check concordance with some GWAS data, etc. Will this be possible using the new best practices with HC (HaplotypeCaller), or should I stick with a multi-sample VCF from UG (UnifiedGenotyper)? Thanks!