Tagged with #liftovervariants
0 documentation articles | 0 announcements | 3 forum discussions

Created 2015-08-12 23:44:49 | Updated 2015-08-12 23:53:26 | Tags: liftovervariants vcf hapmap picard liftovervcf

Comments (1)

I am having a problem with picard's LiftoverVcf.

I am trying to lift over hapmap files (plink files downloaded from hapmap and converted to vcf using plink) from ncbi36 to hg38. I was able to do this with GATK LiftoverVariants. My problem came when I had to merge the resulting hapmap.hg38 file with some genotype files (which I lifted over from hg19 to hg38, also using GATK LiftoverVariants). I am merging them so that I can run population stratification using plink. I used vcf-merge, but it complained that one SNP, rs3094315, has a different reference allele in the two files. The reference allele should be G, which is what the genotype.hg38 files have; in the hapmap.hg38 file it is wrong. I also tried lifting hapmap.ncbi36 to hg19 first and then to hg38, but the offending allele was still there. So I decided to try lifting hapmap.ncbi36 with LiftoverVcf from picard instead.

  1. I downloaded the newest picard build (20 hours old) picard-tools-1.138.
  2. Used the command: java -jar -Xmx6000m ../../../tools/picard-tools-1.138/picard.jar LiftoverVcf I=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.vcf O=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.vcf C=../../../tools/liftover/chain_files/hg18ToHg38.over.chain REJECT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.reject.vcf R=../../../data/assemblies/hg38/hg38.fa VERBOSITY=ERROR

Here is the run: [Thu Aug 13 00:43:40 CEST 2015] picard.vcf.LiftoverVcf INPUT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.vcf OUTPUT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.vcf CHAIN=......\tools\liftover\chain_files\hg18ToHg38.over.chain REJECT=all_samples_hapmap3_r3_b36_fwd.qc.poly.tar.picard.hg38.reject.vcf REFERENCE_SEQUENCE=......\data\assemblies\hg19\assemble\hg38.fa VERBOSITY=ERROR QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json

Here is the error:
Exception in thread "main" java.lang.IllegalStateException: Allele in genotype A not in the variant context [T*, C]
    at htsjdk.variant.variantcontext.VariantContext.validateGenotypes(VariantContext.java:1357)
    at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1295)
    at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:410)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:496)
    at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:490)
    at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:200)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

  1. I have no idea which SNP is the problem.
  2. I do not know what T* means (it does not seem to exist in the file).
  3. I am new to picard, so I thought VERBOSITY=ERROR would give me more detail, but nothing more appeared.
  4. Given that lifting hapmap.ncbi36 to hg19 and then to hg38 produced the same erroneous reference allele, I suppose lifting will not fix this and I will have to work with dbsnp to correct my file. Do you know how I can change the reference allele in a vcf? Is there a tool for this? Is there a liftover tool for dbsnp?
  5. As a side note, I want to make picard work because I read that you will be deprecating the GATK liftover and supporting the picard liftover instead (at some point in the future), so help with this tool would be appreciated.
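For the first question (which SNP is the problem), one way to narrow it down is to compare each record's REF allele against the target assembly before merging. The following is only a minimal sketch, not a picard or GATK feature: `find_ref_mismatches` is a hypothetical helper, the reference is assumed to be already loaded as a dict of contig name to sequence string, and the toy records and IDs are invented for illustration.

```python
def find_ref_mismatches(vcf_lines, reference):
    """Yield (chrom, pos, id, vcf_ref, genome_ref) for every record whose
    REF allele disagrees with the reference sequence at that position.

    vcf_lines: iterable of tab-delimited VCF lines (header lines are skipped).
    reference: dict mapping contig name -> sequence string (assumed to be
               loaded elsewhere, e.g. from the hg38 FASTA).
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, variant_id, ref = line.rstrip("\n").split("\t")[:4]
        sequence = reference.get(chrom)
        if sequence is None:
            continue  # contig not in the loaded reference; nothing to compare
        start = int(pos) - 1  # VCF positions are 1-based
        genome_ref = sequence[start:start + len(ref)].upper()
        if genome_ref != ref.upper():
            yield chrom, int(pos), variant_id, ref, genome_ref


# Toy data, invented for illustration only.
toy_reference = {"chr1": "ACGTG"}
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t5\trs_toy_ok\tG\tA\t.\tPASS\t.",
    "chr1\t2\trs_toy_bad\tT\tC\t.\tPASS\t.",
]
mismatches = list(find_ref_mismatches(toy_vcf, toy_reference))
```

On the toy input only `rs_toy_bad` is reported, because the reference has C at position 2 while the record claims T. Records flagged this way (as rs3094315 presumably would be in the hapmap.hg38 file) are the candidates for a strand flip or a REF/ALT swap.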

Created 2013-05-15 20:22:17 | Updated 2013-05-15 20:26:59 | Tags: liftovervariants

Comments (1)

When running the LiftoverVariants command, the resulting VCF header keeps the old contig names (e.g., going from b37 to hg19, chromosome one's name is still "1" rather than "chr1"). This only affects the header; the variant records themselves are changed successfully. I am running "2013.1-2.4.9-3-g512dc3e". The command I used is shown below: <gatk> -R <human_g1k_v37>.fasta -T LiftoverVariants -V <in>.vcf -dict <hg19>.dict -o <out>.vcf -chain <b37tohg19>.chain

Thanks for your help!
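Until the header is fixed by the tool itself, the stale contig lines could be rewritten as a post-processing step. This is a rough sketch under stated assumptions: `rename_contig_headers` and the name map are hypothetical, and the sketch only rewrites the `ID=` value of `##contig` lines. It leaves the length and assembly attributes untouched, which may not be correct for every contig.

```python
import re

# Matches the start of a ##contig header line up to and including the ID value.
CONTIG_ID = re.compile(r"^(##contig=<.*?\bID=)([^,>]+)")


def rename_contig_headers(header_lines, name_map):
    """Return a copy of header_lines with ##contig IDs renamed via name_map.

    Lines that are not ##contig lines, or whose ID is not in name_map,
    are returned unchanged.
    """
    renamed = []
    for line in header_lines:
        match = CONTIG_ID.match(line)
        if match and match.group(2) in name_map:
            line = match.group(1) + name_map[match.group(2)] + line[match.end():]
        renamed.append(line)
    return renamed


# Illustrative header and mapping (only chromosome 1 is shown).
old_header = [
    "##fileformat=VCFv4.1",
    "##contig=<ID=1,length=249250621,assembly=b37>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
new_header = rename_contig_headers(old_header, {"1": "chr1"})
```

A full mapping would need one entry per contig, taken from the same source as the chain file.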

Created 2012-10-18 23:10:15 | Updated 2012-10-19 17:56:13 | Tags: bundle liftovervariants hapmap

Comments (10)


We are sequencing some of the HapMap samples (NA19240 for instance) and we compared the calls we get for the HapMap samples with the calls that are reported for these samples in the file "hapmap_3.3.b37.vcf" that was part of the GATK bundle 1.5.

We are surprised to find a lot of discordant calls, but when we verified the calls, we could see no evidence in the HapMap sequence data for many of them.

We see the discordance at many sites and the vast majority of those show multiple alleles for the positions (like this one:

13 32914977 rs11571660 A C,T . PASS AC=1,2785;AF=0.00036,0.99964;AN=2786;set=Intersection GT 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 2/2 ... )

We call many of these sites as Homozygous REF...
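To make the pattern in the example record explicit, the allele frequencies can be recomputed from its AC and AN fields. The snippet below is a small sketch; `alt_allele_frequencies` is a hypothetical helper, and the record is copied from the post with the genotype columns truncated to one sample.

```python
def alt_allele_frequencies(record):
    """Return {alt_allele: AC / AN} for one VCF record.

    The record may be tab- or space-delimited; str.split() handles both.
    """
    fields = record.split()
    alts = fields[4].split(",")
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    allele_counts = [int(value) for value in info["AC"].split(",")]
    allele_number = int(info["AN"])
    return {alt: count / allele_number for alt, count in zip(alts, allele_counts)}


record = ("13 32914977 rs11571660 A C,T . PASS "
          "AC=1,2785;AF=0.00036,0.99964;AN=2786;set=Intersection GT 2/2")
frequencies = alt_allele_frequencies(record)
# C: 1 / 2786 (about 0.00036), T: 2785 / 2786 (about 0.99964)
```

With AN=2786, the second ALT allele (T) accounts for 2785 of the called alleles and C for the remaining one, so the REF allele A is not observed at all in this file. A sample called homozygous REF at this site therefore disagrees with every genotype shown in the record.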

When I inspect the header, I see this:


But the file seems to suggest it was done against b37, which implies hg19... Was this file created with liftover?

Also... could it be that HG19 was updated to reflect the most common allele in the population when it went from HG18 to HG19 (as you can see from the MAF of 0.00036 for this example), but that the VCF file was not updated for the HapMap 3.3 samples to reflect this?

So the bottom line is that the HapMap3.3 file cannot be relied on for the actual calls for the HapMap samples, since about 10% of the sites show variant calls while the HG19 reference shows no variance...

I know that HapMap is used for different purpose in GATK (Variant Score Calibration), but we may want to warn the public that you cannot use the file as a source of variation for the HapMap samples it reports on...