Hi, I am trying to identify the shared genomic regions among several pairs of haploid ant brothers. I have run the GATK variant discovery pipeline, and I now have a VCF file that contains only homozygous SNPs, as expected. Can anyone tell me whether I can use Beagle to get the shared/common haplotype information for each pair of brothers? Also, please correct me if I am doing something wrong, as I am new to this field.
PS: the VCF is currently unphased.
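Because haploid males carry a single copy of each chromosome, each homozygous call already is the haplotype, so it may be possible to compare brothers directly without phasing them in Beagle first. The following is a minimal sketch of that idea, not an established method. It assumes a VCF whose GT fields look like "0/0", "1/1" or a haploid "1"; the sample names bro1/bro2 and the min_sites threshold are hypothetical, and genotyping errors are not modelled at all.

```python
"""Sketch: find runs of sites where two haploid brothers carry the same allele.

Assumption: GT fields are homozygous diploid-style calls ("0/0", "1/1") or
haploid calls ("1"). Sample names passed in are hypothetical.
"""


def haploid_allele(gt):
    """Return the single allele index from a homozygous GT like '1/1', else None."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles or len(set(alleles)) != 1:
        return None  # missing or heterozygous call: treated as uninformative
    return int(alleles[0])


def shared_segments(vcf_lines, sample_a, sample_b, min_sites=3):
    """Return (chrom, start, end, n_sites) runs where both samples share an allele.

    A run is broken by a site where the alleles differ or by a change of
    chromosome; uninformative sites (missing/het) are skipped.
    """
    run = None  # [chrom, start, end, n_sites] of the current run
    segments = []

    def close_run():
        # Keep the current run only if it spans enough informative sites.
        if run is not None and run[3] >= min_sites:
            segments.append(tuple(run))

    for line in vcf_lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            ia, ib = fields.index(sample_a), fields.index(sample_b)
            continue
        chrom, pos = fields[0], int(fields[1])
        gt_idx = fields[8].split(":").index("GT")
        a = haploid_allele(fields[ia].split(":")[gt_idx])
        b = haploid_allele(fields[ib].split(":")[gt_idx])
        if a is None or b is None:
            continue
        if a == b and run is not None and run[0] == chrom:
            run[2] = pos  # extend the current shared run
            run[3] += 1
        elif a == b:
            close_run()  # new chromosome or first match after a difference
            run = [chrom, pos, pos, 1]
        else:
            close_run()  # alleles differ: the shared run ends here
            run = None
    close_run()
    return segments
```

The run boundaries are only as fine as the marker density, so the start/end positions are the first and last shared informative sites rather than exact breakpoints.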
I have been running HaplotypeCaller 3.4 on five hundred cattle genomes. I am wondering how to pass the physical phasing information, now generated by HaplotypeCaller in N+1 mode, through GenotypeGVCFs (applying the pedigree) and then on to Beagle 4.0 as a vcf.gz for imputation. My goal is to build a phased reference that is as accurate as possible for use as an imputation resource, so I would like to exploit the physical phasing information.
Do you have an example workflow? It seems that the recommendations for read-backed phasing have changed since HaplotypeCaller 3.3 introduced the N+1 workflow.
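As far as I know, HaplotypeCaller writes its physical phasing into the per-sample FORMAT tags PGT (physically phased genotype) and PID (physical phasing ID). A first step before deciding how to hand this to Beagle 4.0 could be to check whether those tags are still present after GenotypeGVCFs. The sketch below assumes that tag layout; the sample name and the PID values in any test data are illustrative only, and whether Beagle 4.0 reads PGT at all is not something I can confirm.

```python
"""Sketch: group one sample's calls by HaplotypeCaller's physical-phasing tags.

Assumption: the FORMAT column may carry PGT and PID as written by
HaplotypeCaller; whether they survive GenotypeGVCFs should be checked per
GATK version.
"""
from collections import defaultdict


def phase_sets(vcf_lines, sample):
    """Return {PID: [(chrom, pos, PGT), ...]} for one sample; sites without PID are skipped."""
    groups = defaultdict(list)
    col = None
    for line in vcf_lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(sample)  # column of the requested sample
            continue
        keys = fields[8].split(":")
        # zip() tolerates dropped trailing sample fields, which VCF allows.
        record = dict(zip(keys, fields[col].split(":")))
        pid, pgt = record.get("PID"), record.get("PGT")
        if pid and pid != "." and pgt and pgt != ".":
            groups[pid].append((fields[0], int(fields[1]), pgt))
    return dict(groups)
```

Sites that share a PID are phased relative to each other, so counting how many calls fall into each group gives a rough idea of how much physical phasing information is actually available per animal.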
I'm trying to phase GATK genotypes and impute some SNP calls. Before I can do that, I must convert the GATK results into an input format that BEAGLE accepts. What's the difference between VariantsToBeagleUnphased and ProduceBeagleInput? I know the latter outputs a file with genotype likelihoods. However, using that file in BEAGLE didn't work; it produced the following log and error files. Can anyone give any pointers? Thanks in advance!
[stechen@node24 ~]$ more beagle_run_410.o720239
Beagle version 3.3.2 (31 Oct 2011)
Enter "java -jar beagle.jar" for summary of command line arguments.
Start time: 11:59 AM EDT on 07 Aug 2013
Command line: java -Xmx7281m -jar beagle.jar like=beagle_input_410_impute phased=~/stechen/phase_ref/ALL.chr1.phase1_release_v3.20101123.filt.bgl markers=~stechen/phase_ref/ALL.chr1.phase1_release_v3.20101123.filt.markers missing=? out=beagle_output_410_chr1
[stechen@node24 ~]$ more beagle_run_410.e720239
bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `module'
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=~stechen/tmp -Xms256m -Xmx8G
Exception in thread "main" java.lang.NullPointerException
at phaser.y.a(Unknown Source)
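I can't tell from the log what triggers the NullPointerException, but a malformed likelihood file is one thing worth ruling out. The sketch below shows the Beagle 3 genotype-likelihood layout as I understand it (a header "marker alleleA alleleB" followed by each sample ID three times, then three likelihoods per sample for AA, AB and BB) and a column-count check. It is not GATK's ProduceBeagleInput code; it only assumes biallelic diploid records with a PL field, converts PL to normalized likelihoods via 10^(-PL/10), and uses "chrom:pos" marker IDs as in ProduceBeagleInput's output.

```python
"""Sketch: VCF PL values -> Beagle 3 genotype-likelihood rows, plus a sanity check.

Assumptions: biallelic, diploid records with a PL FORMAT field; marker IDs
written as 'chrom:pos'. This mirrors the file layout, not GATK's implementation.
"""


def pl_to_likelihoods(pl_field):
    """Convert a VCF PL string like '0,30,300' to normalized (AA, AB, BB) likelihoods."""
    if pl_field in (".", ""):
        return (1 / 3, 1 / 3, 1 / 3)  # missing data: uninformative triple
    raw = [10 ** (-int(p) / 10) for p in pl_field.split(",")[:3]]
    total = sum(raw)
    return tuple(r / total for r in raw)


def to_beagle_like(vcf_lines):
    """Return Beagle likelihood-file lines for the biallelic records in vcf_lines."""
    out = []
    for line in vcf_lines:
        if line.startswith("##"):
            continue
        f = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            header = ["marker", "alleleA", "alleleB"]
            for s in f[9:]:
                header += [s, s, s]  # each sample ID repeated three times
            out.append(" ".join(header))
            continue
        ref, alt = f[3], f[4]
        if "," in alt:
            continue  # multiallelic sites are skipped in this sketch
        keys = f[8].split(":")
        row = [f"{f[0]}:{f[1]}", ref, alt]
        for sample_field in f[9:]:
            values = dict(zip(keys, sample_field.split(":")))
            row += [f"{x:.4f}" for x in pl_to_likelihoods(values.get("PL", "."))]
        out.append(" ".join(row))
    return out


def check_like_file(lines):
    """Return a list of rows whose column count does not match the header."""
    n_cols = len(lines[0].split())
    problems = []
    for i, line in enumerate(lines[1:], start=2):
        n = len(line.split())
        if n != n_cols:
            problems.append(f"line {i}: {n} columns, expected {n_cols}")
    return problems
```

Running check_like_file over the real beagle_input_410_impute file (read with open(...).read().splitlines()) would at least show whether any row is short or long before looking elsewhere, for example at whether the marker IDs match those in the .markers file.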
Dear GATK team and community members,
I used ProduceBeagleInput to create a genotype likelihoods file and ran beagle.jar according to the example in http://gatkforums.broadinstitute.org/discussion/43/interface-with-beagle-software. Beagle warned that it is better to use a reference panel for imputing genotypes and phasing, so I downloaded the recommended reference panel (http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes.phase1_release_v3/). However, Beagle requires the alleles to be in the same order in both the reference and sample files. The tool for this is check_strands.py (http://faculty.washington.edu/sguy/beagle/strand_switching/README), but it requires both the sample and reference files to be in .bgl format. This is a little disappointing, since not being able to use the reference panel means Beagle's calculations won't be as accurate, although I'm not sure by how much.
I understand that this might be outside the GATK team's scope, but I would greatly appreciate it if someone could suggest a way to phase GATK's Beagle input using a reference panel. Or, hopefully, the GATK team will write a tool to produce .bgl files?
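Until a .bgl converter exists, one possible workaround is to reorder the alleles in the likelihood file itself instead of going through check_strands.py. If a row lists the alleles in the opposite order from the reference panel, swapping the allele labels and reversing each (AA, AB, BB) triple to (BB, AB, AA) describes the same data. The sketch below assumes the reference alleles can be read as "id position allele1 allele2" lines from the panel's markers file and that marker IDs match between the two files, which may not hold (the panel may use rsIDs while ProduceBeagleInput writes "chrom:pos"). Strand flips are only flagged, not fixed, and A/T and C/G SNPs remain ambiguous.

```python
"""Sketch: align a Beagle likelihood file's allele order to a reference panel.

Assumptions: reference alleles come from 'id position allele1 allele2' lines
and marker IDs are identical in both files. Strand-flipped markers are only
reported; A/T and C/G SNPs cannot be resolved this way.
"""

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parse_markers(lines):
    """Parse 'id position allele1 allele2' lines into {id: (allele1, allele2)}."""
    ref = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 4:
            ref[parts[0]] = (parts[2], parts[3])
    return ref


def align_row(row, ref):
    """Return (status, row) with allele order and likelihood triples matched to ref.

    status is one of 'same', 'swapped', 'strand', 'mismatch' or 'absent'.
    Apply to data rows only, not to the header line.
    """
    parts = row.split()
    marker, a, b, liks = parts[0], parts[1], parts[2], parts[3:]
    if marker not in ref:
        return "absent", row
    ra, rb = ref[marker]
    if (a, b) == (ra, rb):
        return "same", row
    if (a, b) == (rb, ra):
        # Swap allele labels and reverse each (AA, AB, BB) triple to (BB, AB, AA).
        swapped = []
        for i in range(0, len(liks), 3):
            swapped += [liks[i + 2], liks[i + 1], liks[i]]
        return "swapped", " ".join([marker, ra, rb] + swapped)
    flip = (COMPLEMENT.get(a), COMPLEMENT.get(b))
    if flip in ((ra, rb), (rb, ra)):
        return "strand", row  # left unchanged; needs a strand-aware tool
    return "mismatch", row
```

Rows reported as "absent" or "mismatch" would probably have to be dropped before running Beagle with the panel, since Beagle expects the same markers and allele order in both inputs.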
I used Beagle to phase my data, but I have a problem with some indels:
Input VCF:
2 68599872 . ATG A 14.40 PASS AC=1;AC1=1;AF=0.028
Input for Beagle created by ProduceBeagleInput:
2:68599872 TG - 1.0000 0.0000 0.0000 ......
Output VCF created by BeagleOutputToVCF:
2 68599872 . ATG . 14.40 BGL_RM_WAS_- AC1=1;AF1=0.02965.....
Error message from CombineVariants:
MESSAGE: Badly formed variant context at location 68599872 in contig 2. Reference length must be at most one base shorter than location size
Can you help me?
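Comparing the three records, the deletion does not seem to survive the round trip: the input is ATG/A, ProduceBeagleInput writes it without the shared padding base as TG/-, and BeagleOutputToVCF returns REF ATG with no ALT and a BGL_RM_WAS_- filter, which CombineVariants then rejects. I can't say exactly where it breaks, but one workaround, assuming SNP phasing is what matters here, is to keep indels out of the Beagle run. The sketch below shows the padding conversion inferred from the example above and a simple SNP/non-SNP split of the VCF lines; the function names are illustrative, and this does not fix BeagleOutputToVCF itself.

```python
"""Sketch: padded VCF deletion <-> unpadded '-' style, and a SNP-only filter.

The '-' convention is inferred from the example above (ATG/A -> TG/-).
Splitting off non-SNP records is one possible workaround, not a fix.
"""


def unpad(ref, alt):
    """Drop the shared leading padding base: ('ATG', 'A') -> ('TG', '-')."""
    if len(ref) != len(alt) and ref[0] == alt[0]:
        return ref[1:] or "-", alt[1:] or "-"
    return ref, alt


def repad(pad_base, ref, alt):
    """Restore the padding base: ('A', 'TG', '-') -> ('ATG', 'A')."""
    def strip_dash(s):
        return "" if s == "-" else s
    return pad_base + strip_dash(ref), pad_base + strip_dash(alt)


def is_biallelic_snp(vcf_line):
    """True for data lines whose REF and single ALT are both one base."""
    f = vcf_line.rstrip("\n").split("\t")
    return len(f) > 4 and len(f[3]) == 1 and len(f[4]) == 1 and f[4] not in (".", "*")


def split_snps(vcf_lines):
    """Return (snp_lines, other_lines); header lines are copied into both lists."""
    snps, others = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            snps.append(line)
            others.append(line)
        elif is_biallelic_snp(line):
            snps.append(line)
        else:
            others.append(line)
    return snps, others
```

The SNP-only lines would go through ProduceBeagleInput and Beagle as before, while the indel records could be kept unphased and merged back afterwards.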