I'm trying to phase GATK genotypes and impute some SNP calls. Before I can do that, I must convert the GATK results to a format that BEAGLE accepts as input. What's the difference between VariantsToBeagleUnphased and ProduceBeagleInput? I know the latter outputs a file with genotype likelihoods. Incidentally, using that file didn't work in BEAGLE and produced the following log and error files. Can anyone give any pointers? Thanks in advance!

[stechen@node24 ~]$ more beagle_run_410.o720239
Beagle version 3.3.2 (31 Oct 2011)
Enter "java -jar beagle.jar" for summary of command line arguments.
Start time: 11:59 AM EDT on 07 Aug 2013

Command line: java -Xmx7281m -jar beagle.jar like=beagle_input_410_impute phased=~/stechen/phase_ref/ALL.chr1.phase1_release_v3.20101123.filt.bgl markers=~stechen/phase_ref/ALL.chr1.phase1_release_v3.20101123.filt.markers missing=? out=beagle_output_410_chr1

[stechen@node24 ~]$ more beagle_run_410.e720239
bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `module'
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=~stechen/tmp -Xms256m -Xmx8G
Exception in thread "main" java.lang.NullPointerException
    at phaser.y.a(Unknown Source)
    at phaser.y.<init>(Unknown Source)
    at phaser.H.a(Unknown Source)
    at phaser.v.<init>(Unknown Source)
    at phaser.PhaseMain.<init>(Unknown Source)
    at phaser.PhaseMain.main(Unknown Source)
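
Before digging into the NullPointerException itself, it may be worth confirming that every file named on the command line resolves to a real path. The log shows phased=~/stechen/... but markers=~stechen/..., and those two tilde forms normally expand to different directories. The Python sketch below is only a diagnostic aid, not a confirmed explanation of the exception. The list of file-valued keys (like, phased, markers) and the helper names are assumptions made for illustration.

```python
import os
import shlex

# Arguments treated as input files here. This list is an assumption made
# for the sketch, not taken from the Beagle documentation.
FILE_KEYS = ("like", "phased", "markers")


def beagle_file_args(command_line):
    """Return (key, expanded_path) pairs for file-valued key=value arguments."""
    pairs = []
    for token in shlex.split(command_line):
        key, sep, value = token.partition("=")
        if sep and key in FILE_KEYS:
            # expanduser handles both "~/dir" (current user's home)
            # and "~name/dir" (home of user "name").
            pairs.append((key, os.path.expanduser(value)))
    return pairs


def missing_files(command_line):
    """Return the (key, path) pairs whose path does not exist on this machine."""
    return [(k, p) for k, p in beagle_file_args(command_line)
            if not os.path.exists(p)]
```

Calling missing_files() with the command line copied from the log lists the arguments whose files cannot be found on the machine where the check is run.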

Dear GATK team and community members,

I used ProduceBeagleInput to create a genotype likelihoods file and ran beagle.jar following the example at http://gatkforums.broadinstitute.org/discussion/43/interface-with-beagle-software. Beagle gave a warning that it is better to use a reference panel for imputing genotypes and phasing. So I downloaded the recommended reference panel (http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes.phase1_release_v3/), but Beagle requires the alleles to be in the same order in both the reference and sample files. The tool for checking this is check_strands.py (http://faculty.washington.edu/sguy/beagle/strand_switching/README), but it requires both the sample and reference files to be in .bgl format. This is a little disappointing, since not being able to use the reference panel means Beagle's results won't be as accurate, although I'm not sure by how much.
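
To make the allele-order requirement concrete, here is a minimal Python sketch that compares the allele columns of a likelihood file with those of a markers file. It is not a replacement for check_strands.py and does no strand flipping. The column layouts are assumptions: a likelihood file with one header line followed by rows of "marker alleleA alleleB likelihoods...", and a markers file with rows of "marker position alleleA alleleB". Matching markers by identifier is also an assumption, and the data in the usage note is made up.

```python
def read_alleles(lines, allele_cols, skip_header=False):
    """Map marker ID -> (alleleA, alleleB) from whitespace-separated rows."""
    rows = [line.split() for line in lines if line.strip()]
    if skip_header:
        rows = rows[1:]  # assumed: the first non-blank line is a header
    first, second = allele_cols
    return {row[0]: (row[first], row[second]) for row in rows}


def compare_allele_order(sample, reference):
    """Classify each sample marker by how its alleles relate to the reference."""
    report = {}
    for marker, pair in sample.items():
        ref_pair = reference.get(marker)
        if ref_pair is None:
            report[marker] = "absent from reference"
        elif pair == ref_pair:
            report[marker] = "same order"
        elif pair == ref_pair[::-1]:
            report[marker] = "swapped"
        else:
            # Could be a strand difference or a different variant;
            # this sketch does not try to tell them apart.
            report[marker] = "different alleles"
    return report
```

For example, with a sample row "m2 C T 0 1 0" and a reference row "m2 200 T C", read_alleles(sample_lines, (1, 2), skip_header=True) and read_alleles(reference_lines, (2, 3)) give pairs that compare_allele_order reports as "swapped".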

I understand that this might be outside the GATK team's scope of responsibility, but I would greatly appreciate any suggestions for phasing GATK's Beagle input against a reference panel. Or, hopefully, the GATK team will write a tool to produce .bgl files?

Regards, Jamie

I used Beagle to phase my data, but I have a problem with some indels.

Example:

Input VCF:

2       68599872        .       ATG     A       14.40   PASS    AC=1;AC1=1;AF=0.028

Input for Beagle created by ProduceBeagleInput:

2:68599872 TG - 1.0000 0.0000 0.0000 ......

Output VCF created by BeagleOutputToVCF:

2       68599872        .       ATG     .       14.40   BGL_RM_WAS_-    AC1=1;AF1=0.02965.....

Error message from CombineVariants:

MESSAGE: Badly formed variant context at location 68599872 in contig 2. Reference length must be at most one base shorter than location size

Can you help me?
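
To illustrate what seems to happen to the alleles in this example, here is a minimal Python sketch of the mapping between the VCF record (REF=ATG, ALT=A) and the Beagle input row (TG and -). It is inferred from this single example and is not GATK's actual implementation; the function names are hypothetical. In the output record above, ALT is "." and the filter is BGL_RM_WAS_-, which suggests (as an assumption) that the "-" allele was not translated back into a VCF allele.

```python
def vcf_to_trimmed(ref, alt):
    """Drop the leading base shared by REF and ALT; write an empty allele as '-'."""
    if len(ref) == 1 and len(alt) == 1:
        return ref, alt  # SNP: nothing to trim
    if ref[0] != alt[0]:
        raise ValueError("REF and ALT do not share a leading base")
    return ref[1:] or "-", alt[1:] or "-"


def trimmed_to_vcf(anchor_base, allele_a, allele_b):
    """Inverse mapping: put the shared leading base back in front of each allele."""
    def restore(allele):
        return anchor_base + ("" if allele == "-" else allele)
    return restore(allele_a), restore(allele_b)


# The record from this post: REF=ATG, ALT=A
trimmed = vcf_to_trimmed("ATG", "A")      # ("TG", "-")
restored = trimmed_to_vcf("A", *trimmed)  # ("ATG", "A")
```

If the round trip is applied, the deletion comes back as REF=ATG, ALT=A, which is the form the input VCF used.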