I was just wondering if anyone uses GATK's SelectVariants walker to call de novo mutations (Mendelian violations) and, if so, what -mvq cut-off do they use? My data is exome sequencing with a large range of read depths - from mean target coverage of 14X to >50X.
I'm currently trying to extract de novo mutations from my multi-sample vcf files (trios). I've already read the VCF file specification documentation but wanted to check if I got this right. So I would call a de novo mutation candidate in the following cases:
1.Child has the genotype 0|1 , 1|0 or 1|1 and both parents have 0|0
2.Child has the genotype 1|0 or 1|1 and the mother 0|0
3.Child: 0|1 or 1|1 and the father 0|0
Is this correct ? And are there any other cases which indicate a de novo mutation which I missed so far ?
(EDIT: solution found and explained below, mostly an error on my end, sorry)
I have what I know is a de novo variant (validated) and GATK PhaseByTransmission refuses to see it. Here is what I am starting with in my VCF file: 7 151092903 . G A 338.83 PASS . GT:AD:DP:GQ:PL 0/0:12,0:12:36:0,36,414 0/0:20,0:20:60:0,60,669 0/1:6,15:20:99:389,0,108
So: - the father is 12 ref, 0 alt - the mother is 20 ref, 0 alt - the offspring is 6 ref, 15 alt
When I run java -Xmx2g -jar GenomeAnalysisTK-2.7-2-g6bda569/GenomeAnalysisTK.jar -R fasta/human_g1k_v37.fasta -T PhaseByTransmission --DeNovoPrior 0.00001 -V trio1_1553_1554_1555_small.recode.vcf -ped trio1_1553_1554_1555.ped -o trio1_1553_1554_1555.vcf --MendelianViolationsFile trio1_1553_1554_1555_noMendel.tab
I get the following output VCF line: 7 151092903 . G A 338.83 PASS . GT:AD:DP:GQ:PL:TP 1|0:12,0:12:0:0,36,414:13 0|0:20,0:20:60:0,60,669:13 1|0:6,15:20:99:389,0,108:13
So the father is eventually called a het.This happens even when I set the prior to a low value of 10^-5. That does not seem like the right behavior to me, a more appropriate call would be to call both parents ref homs. The genotype likelihood certainly suggest that for a 10^-5 prior of de novo event, this would make sense.
EDIT: OK, I wish I could remove this post. I don't think I can but I can edit the answer at least. I was just misreading the genotype likelihood. The evidence in favour of a homozygous call in the father is in fact weaker than I thought. A prior of de novo calls of 5x10^-4 fixes things, and with that threshold I am getting a proper de novo call at this location. I apologize for the pointless post!
Can you use SelectVariants with a combined vcf to produce a new vcf containing only variants present in a particular sample eg. you can select out de novo mutations from a combined family vcf?