Tagged with #pedigree
1 documentation article | 0 announcements | 8 forum discussions

Comments (27)


To call variants with the GATK using pedigree information, you should base your workflow on the Best Practices recommendations -- the principles detailed there all apply to pedigree analysis.

But there is one crucial addition: you should make sure to pass a pedigree file (PED file) to all GATK walkers that you use in your workflow. Some will deliver better results if they see the pedigree data.

At the moment there are two of the standard annotations affected by pedigree:

  • Allele Frequency (computed on founders only)
  • Inbreeding coefficient (computed on founders only)

Note that you will need at least 10 founders to compute the inbreeding coefficient.

Trio Analysis

In the specific case of trios, an additional GATK walker, PhaseByTransmission, should be used to obtain trio-aware genotypes as well as phase by descent.

Important note

The annotations mentioned above have been adapted for PED files starting with GATK v.1.6. If you already have VCF files generated by an older version of the GATK or have not passed a PED file while running the UnifiedGenotyper or VariantAnnotator, you should do the following:

  • Run the latest version of the VariantAnnotator to re-annotate your variants.
  • Re-annotate all the standard annotations by passing the argument -G StandardAnnotation to VariantAnnotator. Make sure you pass your PED file to the VariantAnnotator as well!
  • If you are using Variant Quality Score Recalibration (VQSR) with the InbreedingCoefficient as an annotation in your model, you should re-run VQSR once the InbreedingCoefficient is updated.

PED files

The PED files used as input for these tools are based on PLINK pedigree files. The general description can be found here.

For these tools, the PED files must contain only the first 6 columns from the PLINK format PED file, and no alleles, like a FAM file in PLINK.

No posts found with the requested search criteria.
Comments (2)

Hi all, I am trying to run PhaseByTransmission in a trio using the merged vcf file with father, mother and child. The vcf only contains PASS variants and no triallelic/multiallelic variants. I am also providing the ped file and it looks like that one is correct. However, during the run I encounter this error and really cannot figure out why is that? Is this because of being an indel variant? I have other indels in the file prior to this one though...

ERROR MESSAGE: Error parsing line: 1 181752783 rs798209 TG TTG 151.00 PASS 1KG_AF=.;AC1=1;AC=3;AF1=0.5;AN=6;CQ=intron_variant;DP4=102,16,89,13;DP=384;ENST=ENST00000357570;ESP_AF=.;FQ=127;GN=CACNA1E;HWE=1.000000;ICF=-1.00000;INDEL;MAF=.;MQ=48;PV4=1,1,0.11,0.14;SF=0,1,2;TYPE=ins;Cohort_AF=. GT:GQ:DP:SP:PL 0/1:99:82:0:178,0,162 0/1:99:65:3:195,0,133 0/1:99:73:3:.,.,158,

Any idea? Thanks a lot!

Comments (1)


Are there any GATK tools that can be used to check that samples in a pedigree are not mislabelled by looking for Mendelian inconsistencies in the exome sequencing data?



Comments (3)

Hi all, Just to give some context: I have filtered my trio data with some scripting to only heterozygous (hets) variants that may constitute compound hets (i.e., if phase could be accurately inferred). This is essentially phasing the child data by transmission - for all the het variants seen in the child I looked at the father and mother vcfs and filtered relevant sites as follows: - each het variant in child has to be in only and exactly one of the parents, so this excludes 1) hets present in both parents (these cannot be resolved) and 2) hets not present in any parent (not interested on those as I only want to analyse compound hets); - selected genes with at least two of the above vars; - selected genes with at least one het transmitted from the paternal side and one het from the maternal side.

My question is: can I use this filtered child vcf as my input for ReadBackedPhasing? For each of my genes that feature in the child vcf after the above filtering, I want to determine whether the variants seen within the gene are in the same haplotype or not. I am just not sure if I can do the phasing at this stage - is this alright? If I had to do the phasing early on with the raw vcf, I am not sure how would I maintain the correct phasing information when applying this filtering downstream to the phased vcf (i.e., as the phasing of a het variant is relevant to the previous PASS-ing het variant in the vcf?).

Help would be appreciated! Thanks a lot, Eva

Comments (18)


I am trying to use PED files when I run UnifiedGeotyper, variantannotator and VariantRecalibrator (The Genome Analysis Toolkit (GATK) v2.3-9-ge5ebf34, Compiled 2013/01/11 22:43:14)

if I don't add the PED file , I have some InbreedingCoeff

annotation=[SnpEff, AlleleBalance, BaseCounts, GCContent, HardyWeinberg, IndelType, AlleleBalanceBySample, MappingQualityZeroBySample]

1       14653   .       C       T       22.54   LowQual ABHet=0.842;ABHom=0.803;AC=4;AF=0.182;AN=22;BaseCounts=1,1362,0,261;BaseQRankSum=-4.292;DP=1075;Dels=0.00;FS=4.692;GC=58.42;HW=2.5;HaplotypeScore=0.0000;InbreedingCoeff=0.0530;MLEAC=3;MLEAF=0.136;MQ=6.41;MQ0=967;MQRankSum=-2.246;OND=0.164;QD=0.08;ReadPosRankSum=-0.008;SNPEFF_EFFECT=DOWNSTREAM;SNPEFF_FUNCTIONAL_CLASS=NONE;SNPEFF_GENE_BIOTYPE=processed_transcript;SNPEFF_GENE_NAME=DDX11L1;SNPEFF_IMPACT=MODIFIER;SNPEFF_TRANSCRIPT_ID=ENST00000456328;set=FilteredInAll      GT:AB:AD:DP:GQ:MQ0:PL

but when I add the PED file, I have nothing

annotation=[AlleleBalance, BaseCounts, GCContent, HardyWeinberg, IndelType, AlleleBalanceBySample, MappingQualityZeroBySample, InbreedingCoeff] pedigree=[/scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped] pedigreeString=[] pedigreeValidationType=SILENT

1       14653   .       C       T       10.43   LowQual ABHom=0.792;AC=2;AF=0.091;AN=22;BaseCounts=1,1362,0,261;BaseQRankSum=-3.828;DP=1075;Dels=0.00;FS=2.373;GC=58.42;HW=10.2;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.045;MQ=6.20;MQ0=976;MQRankSum=-2.768;OND=0.190;QD=0.10;

and my PED file seems is bad format because I have this message

INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: [] 
INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: [] 
INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: [] 
INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: [] 
INFO  17:02:17,770 PedReader - Phenotype is other? false 
INFO  17:02:17,770 PedReader - Phenotype is other? false 
INFO  17:02:17,770 PedReader - Phenotype is other? false 
INFO  17:02:17,775 PedReader - Phenotype is other? false 

But I have 6 columns and the name and tabulation between each column

2 MN932002 0 0 2 1
2 PNB32015 0 MN932002 1 1
4 CS934001 0 0 2 2
4 RMAO8004 0 CS934001 1 2
6 KH948045 0 0 2 2
6 AHAO8001 0 0 2 2
6 MHAO8003 0 0 2 2
A LSB32012 0 0 2 2
A SBB32010 0 LSB32012 2 2
B CMSB32011 0 0 2 2
C MKB32014 0 0 2 2
Z M88BBTPL 0 0 other -9

for the phenotype :

-9 missing 1 unaffected 2 affected

Consequently, I cannot run " VariantRecalibrator with -an InbreedingCoeff". Could you help me ?



Comments (3)

Hi to all

I have just started using GATK and I have few question about some tools and about the general workflow.

I have 3 exome-seq data from a trio and I have to detect rare or private variants that segregate with the disease.

From the 3 aligned bam file I procedeed with the GATK pipeline (ADDgroupInfo, MarkDup, Realign, BQSR, Unified Genotyper and variant filtration) and I generated 3 VCF file.

As now I have to use the PhaseByTrasmission tool, should I merge the 3 VCF file?

Or it was better to merge the BAM file after adding the group info and proceed with the other analysis?

And should I create my .ped file,(I visited http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped, but I couln't understand how ped file is generated) based on the read group that I have assigned?


Comments (11)


while using the walker PhaseByTransmission I always get this error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.1-12-ga99c19d): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: File associated with name java.io.FileReader@5cf7c5b5 is malformed: Bad PED line 1: wrong number of fields
##### ERROR ------------------------------------------------------------------------------------------

my conmmand is :

java -jar GenomeAnalysisTK-2.1-12-ga99c19d/GenomeAnalysisTK.jar -T  PhaseByTransmission -R GRCh37.fasta -V trios_457.chr22.vcf -ped trios_457.chr22.ped -pedValidationType SILENT -o o1.vcf

and my ped file is like this:

fam1    s_4     0       0       1       1       C       C       C       C       G       G
fam1    s_5     0       0       2       2       T       T       T       T       G       G
fam1    s_7     s_4     s_5     2       2       C       T       C       T       G       G

I do counted my vcf ped and map files and the result is:

-bash-4.1$ head -1 trios_457.chr22.ped |wc -w
1892         #( 6 columns for info + 943*2 columns for alleles )
-bash-4.1$ wc -l trios_457.chr22.map 
-bash-4.1$ grep -v "#" trios_457.chr22.vcf | wc -l

My question is what's wrong with my my PED line?

Comments (5)

Before there is webpage for how to convert plink ped format to vcf format. But it seems that this link disappeared.


Thank you very much in advance.

Comments (4)

Is it possible to use PhaseByTransmission with families that are larger than a single trio? I have a family with four siblings. If I include all of the siblings in the PED I get:

PhaseByTransmission - Caution: Family BMD has 6 members; At the moment Phase By Transmission only supports trios and parent/child pairs. Family skipped.
ERROR MESSAGE: Bad input: No PED file passed or no trios found in PED file. Aborted.

And if I just include the one key trio with the proband, I get the following:

ERROR MESSAGE: Sample BMD006_R found in data sources but not in pedigree files with STRICT pedigree validation

There does not seem to be an accessible argument for relaxing the pedigree validation. Is there a way to use PhaseByTransmission with my larger family?