Tagged with #inbreedingcoeff
1 documentation article | 0 announcements | 4 forum discussions

Created 2012-07-23 16:45:48 | Updated 2014-12-08 17:56:15 | Tags: inbreedingcoeff phasebytransmission pedigree intermediate phasing methods plink allelefrequency

Comments (36)

There are two types of GATK tools that are able to use pedigree (family structure) information:

Tools that require a pedigree to operate

PhaseByTransmission and CalculateGenotypePosteriors will not run without a properly formatted pedigree file. These tools are part of the Genotype Refinement workflow, which is documented here.

Tools that are able to generate standard variant annotations

The two variant callers (HaplotypeCaller and the deprecated UnifiedGenotyper) as well as VariantAnnotator and GenotypeGVCFs are all able to use pedigree information if you request an annotation that involves population structure (e.g. Inbreeding Coefficient). To be clear though, the pedigree information is not used during the variant calling process; it is only used during the annotation step at the end.

If you already have VCF files that were called without pedigree information, and you want to add pedigree-related annotations (e.g. to use Variant Quality Score Recalibration (VQSR) with the InbreedingCoefficient as a feature annotation), don't panic. Just run the latest version of VariantAnnotator to re-annotate your variants, requesting any missing annotations, and make sure you pass your PED file to VariantAnnotator as well. If you forget to provide the pedigree file, the tool will run successfully but pedigree-related annotations may not be generated (this behavior is different in some older versions).

About the PED format

The PED files used as input for these tools are based on PLINK pedigree files. The general description can be found here.

For these tools, the PED files must contain only the first 6 columns from the PLINK format PED file, and no alleles, like a FAM file in PLINK.
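As a rough illustration of the six-column requirement, here is a minimal Python sketch that checks a PED line and trims a full PLINK PED line (with allele columns) down to its first six fields. The column names follow the PLINK convention (family ID, individual ID, paternal ID, maternal ID, sex, phenotype). The function names `parse_ped_line` and `trim_plink_ped_line` are hypothetical helpers, not part of GATK or PLINK, and the tab-separated output is an assumption made for readability.

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    """The first six PLINK PED columns, kept as raw strings."""
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str


def parse_ped_line(line: str) -> PedRecord:
    """Parse one whitespace-delimited PED line, requiring exactly 6 fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, found {len(fields)}: {line!r}")
    return PedRecord(*fields)


def trim_plink_ped_line(line: str) -> str:
    """Keep only the first 6 columns of a PLINK PED line, dropping alleles."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, found {len(fields)}")
    # Tab-separated output is an assumption of this sketch.
    return "\t".join(fields[:6])
```

For example, `trim_plink_ped_line("FAM1 child dad mom 1 2 A A G T")` keeps only the pedigree columns, and the result then passes `parse_ped_line`.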


Created 2016-05-23 14:01:34 | Updated 2016-05-23 14:02:07 | Tags: inbreedingcoeff vcf

Comments (6)

Hi there,

Using GATK v3.5-0, I've generated VCF files for a group of 223 whole genomes, one per interval of the reference genome. I created GVCF files using HaplotypeCaller in 'GVCF' mode, before using CombineGVCFs to generate cohorts, which were entered into GenotypeGVCFs runs. Curiously, not every SNP or indel called in the VCF file has an Inbreeding Coefficient value attributed to it. Below are two lines from one of the output VCF files - you can see that the top entry contains no Inbreeding Coefficient value, whilst the lower entry does.

NW_014444451.1 5724 . A C 29058.86 . AC=166;AF=0.382;AN=434;BaseQRankSum=0.322;ClippingRankSum=0.244;DP=1982;ExcessHet=0.0000;FS=0.000;MLEAC=172;MLEAF=0.396;MQ=59.39;MQRankSum=0.278;QD=31.55;ReadPosRankSum=0.00;SOR=0.711 GT:AD:DP:GQ:PGT:PID:PL 0/0:4,0:4:0:.:.:0,0,56 0/1:3,4:7:99:0|1:5714_G_C:159,0,829

NW_014444451.1 5731 . TG T 28376.05 . AC=167;AF=0.383;AN=436;BaseQRankSum=0.387;ClippingRankSum=0.00;DP=1963;ExcessHet=0.0000;FS=0.000;InbreedingCoeff=0.3346;MLEAC=174;MLEAF=0.399;MQ=59.60;MQRankSum=-5.300e-02;QD=31.49;ReadPosRankSum=0.00;SOR=0.743 GT:AD:DP:GQ:PGT:PID:PL 0/0:5,0:5:0:.:.:0,0,79 0/1:2,4:6:99:0|1:5714_G_C:200,0,694

I plan to use hard filters to generate a list of HQ variants that I can then feed into BQSR and VQSR in a second run-through of the GATK best practices, and I was going to include Inbreeding Coefficient in this. Would someone be able to explain to me why not all variant sites have Inbreeding Coefficient values and whether this will impact upon my filtering, please?

Many thanks,
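One way to see how many sites are affected before deciding on filters is to scan the VCF and list the records whose INFO field has no InbreedingCoeff key. The following is a minimal Python sketch, assuming tab-delimited VCF records with INFO in the eighth column. The function names `parse_info` and `sites_missing_annotation` are hypothetical helpers, not part of GATK.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a key -> value dict (flags map to True)."""
    result = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        result[key] = value if sep else True
    return result


def sites_missing_annotation(vcf_lines, key="InbreedingCoeff"):
    """Return (CHROM, POS) for every data record whose INFO lacks `key`."""
    missing = []
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], fields[1], fields[7]
        if key not in parse_info(info):
            missing.append((chrom, int(pos)))
    return missing
```

Passing an open file handle (for example `sites_missing_annotation(open("cohort.vcf"))`, where the filename is a placeholder) would list the positions that lack the annotation, such as position 5724 in the first record shown above.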


Created 2016-02-28 08:13:03 | Updated | Tags: inbreedingcoeff

Comments (1)

When I use VariantRecalibrator, I encounter an error like "ERROR MESSAGE: Bad input: Values for InbreedingCoeff annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations." I have looked through the forum, but I still don't know how to solve it. Please help me.

Created 2015-09-09 21:56:12 | Updated 2015-09-09 21:56:49 | Tags: inbreedingcoeff variantannotator mqranksum baseqranksum

Comments (7)

Hi @Team,

I found that VariantAnnotator sometimes does not annotate some annotations that are requested.

A) The Rank Sum Test annotations MQRankSum & BaseQRankSum: I was not able to identify the requirements that have to be met for them to be calculated for a variant.

B) InbreedingCoeff: This one seems to be connected to the total number of called alleles (AN). For me, at least 10% of the alleles needed to be called (19/186). The doc for that says [1] "at least 10 founder samples". Maybe this has to be updated to 10%?

These are the ones I observed. Can someone tell me more about that?

Thanks, Alexander

[1] https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_InbreedingCoeff.php
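To make the sample-count question concrete, here is a minimal Python sketch of an inbreeding coefficient computed from hard genotype counts, using the textbook definition F = 1 - (observed heterozygotes / expected heterozygotes under Hardy-Weinberg). This is an illustration under stated assumptions, not the GATK implementation: the real annotation may work from genotype likelihoods rather than hard calls, the function name `inbreeding_coeff` is hypothetical, and the default threshold of 10 called samples is taken only from the documentation wording quoted above.

```python
from typing import Optional

# Assumption: threshold taken from the quoted "at least 10 founder samples".
MIN_SAMPLES = 10


def inbreeding_coeff(n_hom_ref: int, n_het: int, n_hom_alt: int,
                     min_samples: int = MIN_SAMPLES) -> Optional[float]:
    """Return F = 1 - H_obs / H_exp, or None when it cannot be computed."""
    n = n_hom_ref + n_het + n_hom_alt
    if n < min_samples:
        return None  # too few called samples to annotate
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternate allele frequency
    expected_het = 2 * p * q * n           # Hardy-Weinberg expectation
    if expected_het == 0:
        return None  # monomorphic site, F is undefined
    return 1.0 - n_het / expected_het
```

In this sketch the value is withheld whenever fewer than `min_samples` genotypes are called, which is an absolute count and not a fraction of the cohort. Whether the tool's actual cutoff is an absolute count or a percentage is exactly the open question in the post.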

Created 2012-10-10 16:18:42 | Updated 2013-01-07 20:29:25 | Tags: vqsr inbreedingcoeff annotation exome community

Comments (1)

I'm curious about the experience of the community at large with VQSR, and specifically with which sets of annotations people have found to work well. The GATK team's recommendations are valuable, but my impression is that they have fairly homogeneous data types - I'd like to know if anyone has found it useful to deviate from their recommendations.

For instance, I no longer include InbreedingCoeff with my exome runs. This was spurred by a case where previously validated variants were getting discarded by VQSR. It turned out that these particular variants were homozygous alternate in the diseased samples and homozygous reference in the controls, yielding an InbreedingCoeff very close to 1. We decided that the all-homozygous case was far more likely to be genuinely interesting than a sequencing/variant calling artifact, so we removed the annotation from VQSR. In order to catch the all-heterozygous case (which is more likely to be an error), we add a VariantFiltration pass for 'InbreedingCoeff < -0.8' following ApplyRecalibration.
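The two extremes described here can be worked through with the textbook hard-call form of the inbreeding coefficient. This is a simplification for illustration only (the GATK annotation itself may be computed from genotype likelihoods), and the symbols N, p, q, H_obs and H_exp are introduced just for this sketch.

```latex
% N samples, allele frequencies p and q = 1 - p,
% H_obs = observed number of heterozygotes,
% H_exp = number expected under Hardy-Weinberg equilibrium.
F = 1 - \frac{H_{\mathrm{obs}}}{H_{\mathrm{exp}}},
\qquad H_{\mathrm{exp}} = 2pqN

% Case 1: cases all homozygous alternate, controls all homozygous reference.
% Both alleles are present (0 < p < 1), so H_exp > 0, but H_obs = 0:
F = 1 - \frac{0}{2pqN} = 1

% Case 2: every sample heterozygous, so p = q = 1/2 and H_obs = N:
F = 1 - \frac{N}{2 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot N} = 1 - 2 = -1
```

Under this simplified form, a threshold of -0.8 sits close to the all-heterozygous limit of -1, while the case/control pattern lands at the opposite end of the range.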

In my case, I think InbreedingCoefficient isn't as useful because my UG/VQSR cohorts tend to be smaller and less diverse than what the GATK team typically runs (and to be honest, I'm still not sure we're doing the best thing). Has anyone else found it useful to modify these annotations? It would be helpful if we could build a more complete picture of these metrics in a diverse set of experiments.