Dear GATK team and community members,
I used ProduceBeagleInput to create a genotype likelihoods file, and ran beagle.jar according to the example in http://gatkforums.broadinstitute.org/discussion/43/interface-with-beagle-software. Beagle gave a warning that it is better to use a reference panel for imputing genotypes and phasing. So I downloaded the recommended reference panel (http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes.phase1_release_v3/), but Beagle requires that the alleles be in the same order in both the reference and sample files. The tool for checking this is check_strands.py (http://faculty.washington.edu/sguy/beagle/strand_switching/README), but it requires both the sample and reference files to be in .bgl format, and my sample file is a genotype likelihoods file. This is a little disappointing, since not being able to use the reference panel means Beagle's calculations won't be as accurate, although I'm not sure by how much.
I understand that this might be outside the GATK team's scope of responsibility, but I would greatly appreciate any suggestions for phasing GATK's Beagle input with a reference panel. Or, hopefully, the GATK team will write a tool to produce .bgl files?
Regards, Jamie
The printed values are missing the PL field. For example, with this format:
GT:AD:DP:GQ:PL
['0/2', '1,0,10', '11', '8.12']
['0/2', '211,39,0', '250', '99']
['0/1', '10,1', '11', '14.38']
['0/1', '4,2', '4', '24.38']
['0/0', '27,0', '27', '78.26']
['1/1', '164,2', '183', '99']
['0/1', '242,1', '249', '99']
['0/1', '225,0', '233', '99']
['0/0', '84,5', '82', '81.18']
In every case the PL value is missing. This happens most often when there is more than one alternate allele.
Thank you,
Jim
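To make the missing field explicit, each sample's values can be paired with the FORMAT keys so that any dropped trailing field shows up by name. This is only a minimal sketch: it assumes the lists above came from splitting the sample column on ":", and the function names pair_format_with_values and missing_fields are hypothetical, not part of GATK or any VCF library. It does not explain why PL is absent. One thing worth checking is that the VCF specification allows trailing FORMAT fields to be dropped from a sample column, so a reader has to tolerate shorter value lists.

```python
def pair_format_with_values(format_keys, values):
    """Pair FORMAT keys with sample values; dropped trailing fields become None."""
    keys = format_keys.split(":")
    # Pad with None when the sample column has fewer fields than FORMAT declares.
    padded = list(values) + [None] * (len(keys) - len(values))
    return dict(zip(keys, padded))


def missing_fields(format_keys, values):
    """Return the FORMAT keys whose value is absent or the VCF missing marker '.'."""
    record = pair_format_with_values(format_keys, values)
    return [key for key, value in record.items() if value is None or value == "."]


fmt = "GT:AD:DP:GQ:PL"
samples = [
    ["0/2", "1,0,10", "11", "8.12"],   # values copied from the post above
    ["0/1", "10,1", "11", "14.38"],
]
for values in samples:
    print(values[0], missing_fields(fmt, values))
```

For the two rows shown, this prints `0/2 ['PL']` and `0/1 ['PL']`, which matches the observation that only the last field is absent.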