I am having a problem with picard's LiftoverVcf.
I am trying to Liftover hapmap files (downloaded plink files from hapmap and converted to vcf using plink) from ncbi36 to hg38. I was able to do this with GATK LiftoverVariants. My problem came when I had to merge the hapmap.hg38 with some genotype files (that I liftover from hg19 to hg38 using GATK LiftoverVariants). I am merging them so that I can run population stratification using plink. I used vcf-merge but it complained that a SNP has different reference allele in both files: rs3094315, should be reference allele G (which was correct in the genotype.hg38 files but in the hapmap.hg38 files it was wrong). I also first tried to lift hapmap.ncbi36 to hg19 then to hg38 but the offending allele was still there. So I decided to try and lift the hapmap.ncbi36 using LiftoverVCF from picard.
Here is the run: [Thu Aug 13 00:43:40 CEST 2015] picard.vcf.LiftoverVcf INPUT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.vcf OUTPUT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.vcf CHAIN=......\tools\liftover\chain_files\hg18ToHg38.over.chain REJECT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.reject.vcf REFERENCE_SEQUENCE=......\data\assemblies\hg19\assemble\hg38.fa VERBOSITY=ERROR QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
Here is the error:
Exception in thread "main" java.lang.IllegalStateException: Allele in genotype A* not in the variant context [T*, C]
When I tried to call SNP/Indel through the V3.3 GATK, I found a problem, how can I get the following datasets?
True sites training resource: HapMap True sites training resource: Omni Non-true sites training resource: 1000G Known sites resource, not used in training: dbSNP Known and true sites training resource: Mills
Does GATK provide the corresponding vcf files such as "hapmap.vcf","omni.vcf","1000G.vcf""dbsnp.vcf""mills.vcf" ?
I’d like to convert a hapmap file to vcf. The hapmap file is from http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/latest/forward/non-redundant/genotypes_chr1_ASW_r27_nr.b36_fwd.txt.gz
A few questions about the following command at http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_VariantsToVCF.html#--dbsnp
java -Xmx2g -jar GenomeAnalysisTK.jar \ -R ref.fasta \ -T VariantsToVCF \ -o output.vcf \ --variant:RawHapMap input.hapmap \ --dbsnp dbsnp.vcf
We are sequencing some of the HapMap samples (NA19240 for instance) and we compared the calls we get for the HapMap samples with the calls that are reported for these samples in the file "hapmap_3.3.b37.vcf" that was part of the GATK bundle 1.5.
We are surprised to find a lot of discordant calls, but when we verified the calls, we could see no evidence in the HapMap sequence data for many of them.
We see the discordance at many sites and the vast majority of those show multiple alleles for the positions (like this one:
13 32914977 rs11571660 A C,T . PASS AC=1,2785;AF=0.00036,0.99964;AN=2786;set=Intersection GT 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 ... )
We call many of these sites as Homozygous REF...
When I inspect the header I see this in the header:
But the file seem to suggest it was done against b37 which implies hg19... Was this file created with liftover?
Also...Could it be possible that HG19 was updated to reflect the most common allele in the population (as you can see of the MAF of 0.00036 for this example) when it went from HG18 to HG19, but that the VCF file was not updated for the HapMap 3.3 samples to reflect this?
So the bottom line is, that the HapMap3.3 file could not be used to rely on the actual calls for the HapMap samples, since about 10% of the sites show variant calls, while the HG19 reference shows no variance...
I know that HapMap is used for different purpose in GATK (Variant Score Calibration), but we may want to warn the public that you cannot use the file as a source of variation for the HapMap samples it reports on...