Tagged with #trio
0 documentation articles | 0 announcements | 7 forum discussions

No articles to display.

No articles to display.

Created 2016-01-29 08:37:49 | Updated 2016-01-29 08:38:15 | Tags: phasebytransmission trio phasing read-backed-phasing

Comments (4)

I would expect the trio phasing to try and match whole reads from the parents into the child to infer whether a variant came from mum or dad (so to work on the BAM file or at least the output of HaplotypeCaller). But it seem that the input is a VCF file?

Created 2015-12-22 14:29:56 | Updated | Tags: unifiedgenotyper haplotypecaller trio pedigree

Comments (10)


I've been exploring de novo mutation identification in the context of a pedigree of trios. I've run the UnfiedGenotyper (UG) given all the bam files for ~25 sets of trios and it appears to identify a set of de novo mutations. When I run the HaplotypeCaller (HC) pipeline, first generating gVCF files for each individual, and then using the merged gVCF files along with the pedigree for genotype refinement and de novo mutation calling, it also finds a number of de novo mutations annotated as hi-confidence de novo mutations. When I compare the UG de novo mutations to the high confidence HC list, there's very little overlap. Many of the UG hi-confidence de novo variants are called by HC, but listed as low-confidence de novo variants, and from looking at a few examples, it would appear that the HC calls have assigned lower genotype confidence levels for the parental (non-mutated, reference) genotypes. Could it be that because the gVCF files aren't storing position-specific information for the reference (non-mutated) positions in the genome, the pedigree-type de novo mutation calling is not as accurate as it could be? Should I be generating gVCFs that include position-specific information?

Many thanks for any insights. If it would help, I could post some examples.

Created 2015-08-19 00:13:46 | Updated | Tags: trio denovo

Comments (9)


I have followed the GATK guidelines for trio calling, recalculating posteriors, and annotating possible de novo. However, I am confused about some of the 'hiConfDeNovo' calls.

Here is an example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  child   father  mother
chr1    144674391   rs61810930  C   T   1177.92 PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=2.82;ClippingRankSum=0.306;DB;DP=71;FS=2.137;MLEAC=3;MLEAF=0.500;MQ=64.80;MQRankSum=-1.675e+00;PG=0,0,0;QD=16.59;ReadPosRankSum=-1.813e+00;SOR=1.034;hiConfDeNovo=child GT:AD:DP:GQ:JL:JP:PGT:PID:PL:PP 0/1:16,17:33:99:127:127:0|1:144674371_T_C:645,0,612:645,0,612   0/1:11,5:16:99:127:127:0|1:144674371_T_C:151,0,447:151,0,447    0/1:11,11:22:99:127:127:0|1:144674371_T_C:412,0,692:412,0,692
chr1    145037984   rs2489136   T   G   1401.92 PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=-4.595e+00;ClippingRankSum=0.717;DB;DP=107;FS=43.850;MLEAC=3;MLEAF=0.500;MQ=69.55;MQRankSum=-9.500e-02;PG=0,0,0;QD=13.35;ReadPosRankSum=0.032;SOR=0.296;hiConfDeNovo=child  GT:AD:DP:GQ:JL:JP:PGT:PID:PL:PP 0/1:20,21:41:99:127:127:.:.:510,0,640:510,0,640 0/1:10,18:28:99:127:127:0|1:145037818_T_C:454,0,293:454,0,293   0/1:17,19:36:99:127:127:.:.:468,0,504:468,0,504
chr1    145053717   rs6670785   T   C   74.13   PASS    AC=1;AF=0.167;AN=6;BaseQRankSum=1.91;ClippingRankSum=-1.220e-01;DB;DP=114;FS=5.902;MLEAC=1;MLEAF=0.167;MQ=68.67;MQRankSum=0.203;PG=0,0,0;QD=3.22;ReadPosRankSum=1.01;SOR=2.712;hiConfDeNovo=child   GT:AD:DP:GQ:JL:JP:PGT:PID:PL:PP 0/0:48,0:48:99:96:96:.:.:0,102,1530:0,102,1590  0/1:19,4:23:99:96:96:0|1:145053711_C_T:105,0,826:105,0,886  0/0:42,0:42:99:96:96:.:.:0,99,1485:0,99,1545
chr1    145121901   rs10752808  A   G   3364.92 PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=4.39;ClippingRankSum=0.353;DB;DP=173;FS=3.597;MLEAC=3;MLEAF=0.500;MQ=63.47;MQRankSum=-3.750e-01;PG=0,0,0;QD=19.45;ReadPosRankSum=0.164;SOR=0.550;hiConfDeNovo=child GT:AD:DP:GQ:JL:JP:PL:PP 0/1:20,37:57:99:127:127:1187,0,627:1187,0,627   0/1:21,24:45:99:127:127:717,0,736:717,0,736 0/1:24,47:71:99:127:127:1491,0,706:1491,0,706
chr2    91766984    rs55874359  A   T   6057.92 PASS    AC=3;AF=0.500;AN=6;BaseQRankSum=0.467;ClippingRankSum=-5.510e-01;DB;DP=404;FS=23.592;MLEAC=3;MLEAF=0.500;MQ=67.76;MQRankSum=-5.800e-02;PG=0,0,0;QD=15.78;ReadPosRankSum=0.318;SOR=0.878;hiConfDeNovo=child  GT:AD:DP:GQ:JL:JP:PGT:PID:PL:PP 0/1:69,62:131:99:127:127:.:.:1543,0,2330:1543,0,2330    0/1:44,40:84:99:127:127:0|1:91766961_T_C:1493,0,1698:1493,0,1698    0/1:87,82:169:99:127:127:0|1:91766961_T_C:3052,0,3450:3052,0,3450

If the all samples are called as 0/1 why is it called a deNovo event? Perhaps I am misunderstanding the definition used? This was done using gatk v3.4.

Created 2013-09-30 00:26:16 | Updated 2013-09-30 00:57:39 | Tags: phasebytransmission trio

Comments (3)

(EDIT: solution found and explained below, mostly an error on my end, sorry)

I have what I know is a de novo variant (validated) and GATK PhaseByTransmission refuses to see it. Here is what I am starting with in my VCF file: 7 151092903 . G A 338.83 PASS . GT:AD:DP:GQ:PL 0/0:12,0:12:36:0,36,414 0/0:20,0:20:60:0,60,669 0/1:6,15:20:99:389,0,108


  • the father is 12 ref, 0 alt
  • the mother is 20 ref, 0 alt
  • the offspring is 6 ref, 15 alt

When I run java -Xmx2g -jar GenomeAnalysisTK-2.7-2-g6bda569/GenomeAnalysisTK.jar -R fasta/human_g1k_v37.fasta -T PhaseByTransmission --DeNovoPrior 0.00001 -V trio1_1553_1554_1555_small.recode.vcf -ped trio1_1553_1554_1555.ped -o trio1_1553_1554_1555.vcf --MendelianViolationsFile trio1_1553_1554_1555_noMendel.tab

I get the following output VCF line: 7 151092903 . G A 338.83 PASS . GT:AD:DP:GQ:PL:TP 1|0:12,0:12:0:0,36,414:13 0|0:20,0:20:60:0,60,669:13 1|0:6,15:20:99:389,0,108:13

So the father is eventually called a het.This happens even when I set the prior to a low value of 10^-5. That does not seem like the right behavior to me, a more appropriate call would be to call both parents ref homs. The genotype likelihood certainly suggest that for a 10^-5 prior of de novo event, this would make sense.

EDIT: OK, I wish I could remove this post. I don't think I can but I can edit the answer at least. I was just misreading the genotype likelihood. The evidence in favour of a homozygous call in the father is in fact weaker than I thought. A prior of de novo calls of 5x10^-4 fixes things, and with that threshold I am getting a proper de novo call at this location. I apologize for the pointless post!

Created 2013-07-23 16:20:21 | Updated | Tags: phasebytransmission readbackedphasing trio genotype

Comments (4)

I am doing a WGS project on a family with seven siblings. We have data on the mother but the father passed many years ago. I tried splitting variant recalibrated vcf file and ped file into "trios" with just the mother and a sibling (seven times) then running PhaseByTransmission on the combined vcf. The job was successfully completed but nothing appears phased (all "/," and no "|") in the output vcf. I also tried the variant recalibrated vcf file separately with ReadBackedPhasing. The job was successfully completed as well but again nothing appears phased (all "/" and no "|" or assigned "PQ" scores). The ProduceBeagleInput walker (to use Beagle for genotype refinement) appears to only support unrelated individuals and my set involves related individuals. Do you have any other suggestions for phasing incomplete "trios?" Thanks in advance!

Created 2013-04-22 14:51:02 | Updated | Tags: unifiedgenotyper trio exome variant-calling

Comments (2)


I have exome data for a few sets of parent-offspring trios, in which offspring have phenotypically related but probably genetically different diseases. Their parents are unaffected so I am particularly interested in identifying de novo mutations. My plan was to analyse each individual separately up to the variant calling phase and then to input three (mother, father, child) analysis-ready BAMs into the UnifiedGenotyper along with a ped file. My questions are:

1) Can you tell me whether the UnifiedGenotyper uses pedigree information in the ped file to call genotypes more accurately? In other words, is this better than calling variants jointly without supplying the ped file? If not, does GATK recommend any external tools for doing this step?

2) It is better to call variants jointly using all the trios (even though they are not related and probably don't share the same disease-causing mutations)?

Best wishes,


Created 2012-07-23 17:25:42 | Updated 2012-12-17 14:51:49 | Tags: phasebytransmission trio pedigree

Comments (4)

Is it possible to use PhaseByTransmission with families that are larger than a single trio? I have a family with four siblings. If I include all of the siblings in the PED I get:

PhaseByTransmission - Caution: Family BMD has 6 members; At the moment Phase By Transmission only supports trios and parent/child pairs. Family skipped.
ERROR MESSAGE: Bad input: No PED file passed or no trios found in PED file. Aborted.

And if I just include the one key trio with the proband, I get the following:

ERROR MESSAGE: Sample BMD006_R found in data sources but not in pedigree files with STRICT pedigree validation

There does not seem to be an accessible argument for relaxing the pedigree validation. Is there a way to use PhaseByTransmission with my larger family?