Tagged with #likelihoods
0 documentation articles | 0 announcements | 4 forum discussions


No posts found with the requested search criteria.
No posts found with the requested search criteria.
Comments (7)

I'm analyzing seven trio exomes right now with the latest GATK (version 2.7-4-g6f46d11), and was surprised to find a large number of mendelian violations reported by PhaseByTransmission, even after eliminating low/no coverage events. Tracking down the problem, it seems that CombineVariants occasionally propagates the PL field to the new vcf file incorrectly, sometimes in a way which causes GT not to correspond to the lowest PL.

Here's an example, showing just the GT, AD, and PL columns for a few positions in one trio. For each position, the first line contains the genotypes from the original vcf file, and the second shows the genotypes from the merged file.

#CHROM  POS       ID           REF  ALT  100403001-1           100403001-1A           100403001-1B        
                                                                                                          
1       5933530   rs905469     A    G    0/0:37,0:0,99,1192    0/0:35,0:0,90,1101     0/0:44,0:0,117,1412 
1       5933530   rs905469     A    G    0/0:37,0:189,15,1192  0/0:35,0:0,90,1101     0/0:44,0:0,117,1412 
                                                                                                          
1       10412636  rs4846215    A    T    0/0:119,0:0,358,4297  0/0:113,0:0,337,4060   0/0:102,0:0,304,3622
1       10412636  rs4846215    A    T    0/0:119,0:110,9,0     0/0:113,0:0,337,4060   0/0:102,0:0,304,3622
                                                                                                          
1       11729035  rs79974326   G    C    0/0:50,0:0,141,1709   0/0:53,0:0,150,1788    0/0:71,0:0,187,2246 
1       11729035  rs79974326   G    C    0/0:50,0:1930,0,3851  0/0:53,0:0,150,1788    0/0:71,0:0,187,2246 
                                                                                                          
1       16735764  rs182873855  G    A    0/0:54,0:0,138,1691   0/0:57,0:0,153,1841    0/0:47,0:0,120,1441 
1       16735764  rs182873855  G    A    0/0:54,0:174,0,1691   0/0:57,0:0,153,1841    0/0:47,0:0,120,1441 
                                                                                                          
1       17316577  rs77880760   G    T    0/0:42,0:0,123,1470   0/0:38,0:0,111,1317    0/0:53,0:0,153,1817 
1       17316577  rs77880760   G    T    0/0:42,0:233,17,1470  0/0:38,0:0,111,1317    0/0:225,25:0,153,181
                                                                                                          
1       28116000  rs2294229    A    G    0/0:37,0:0,105,1291   0/0:37,0:0,111,1379    0/0:30,0:0,87,1066  
1       28116000  rs2294229    A    G    0/0:37,0:0,105,1291   0/0:37,0:0,111,1379    0/0:30,0:1844,159,0 
                                                                                                          
1       31740706  rs3753373    A    G    0/0:123,0:0,349,4173  0/0:110,0:0,319,3793   0/0:111,0:0,328,3885
1       31740706  rs3753373    A    G    0/0:123,0:117,6,0     0/0:110,0:0,319,3793   0/0:111,0:0,328,3885

Most genotypes are propagated correctly, and in fact, which a propagated incorrectly changes from run to run.

In my case, I'm merging files from disjoint regions, so I can work around the problem, but it would be nice if this were fixed.

Thanks, Kevin

Comments (4)

Greetings,

I am trying to incorporate genotype likelihoods into a downstream analysis. I have two questions:

1) Why is the most likely genotype scaled to a Phred score of zero?

2) Is there a way to undo the scaling? I have seen downstream tools undo the scaling, but I don't know how they do it. Is there an equation that will return an estimated genotype likelihood from the scaled genotype likelihoods?

Thank you for your time.

Zev Kronenberg

Comments (1)

Dear GATK team and community members,

I used ProduceBeagleInput to create a genotype likelihoods file, and ran beagle.jar according to the example in http://gatkforums.broadinstitute.org/discussion/43/interface-with-beagle-software. Beagle gave a warning that it is better to use a reference panel for imputing genotypes and phasing. So I downloaded the recommended reference panel (http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes.phase1_release_v3/), but Beagle requires that the alleles be in the same order on both reference and sample files. The tool to do this is check_strands.py (http://faculty.washington.edu/sguy/beagle/strand_switching/README), but it requires both sample and reference files be in .bgl format. This is a little disappointing since not being able to use the reference panel means Beagle's calculations won't be as accurate, although I'm not sure by how much.

I understand that this might be out of the scope of responsibility for the GATK team, but I will greatly appreciate if someone can provide suggestions to allow GATK's input to Beagle be phased using a reference panel. Or hopefully, the GATK team will write a tool to produce .bgl files?

Regards, Jamie

Comments (5)

The printed values missed the PL value, for examples, the format is:

GT:AD:DP:GQ:PL
['0/2', '1,0,10', '11', '8.12']
['0/2', '211,39,0', '250', '99']
['0/1', '10,1', '11', '14.38']
['0/1', '4,2', '4', '24.38']    
['0/0', '27,0', '27', '78.26']
['1/1', '164,2', '183', '99']
['0/1', '242,1', '249', '99']
['0/1', '225,0', '233', '99']
['0/0', '84,5', '82', '81.18']

For every case, the PL value is missing. It happens most often when there are more than one alternative alleles.

Thank you,

Jim