Errors about VCFs not matching the reference
Posted in Common Problems on 2012-07-23 18:15:52 | Last updated on 2015-05-15 08:07:03



This is a classic problem: you get some VCF files from collaborators, you try to use them with your own data, and GATK fails with a big fat error saying that the references don't match.

Solution

So what do you do? If you can, you find a version of the VCF file that is derived from the right reference. If you're working with human data and the VCF in question is just a common resource like dbSNP, you're in luck -- we provide versions of dbSNP and similar resources derived from the major human reference builds in our resource bundle, which is available by FTP with the login below (see the FAQs for full access details).

location: ftp.broadinstitute.org
username: gsapubftp-anonymous

If that's not an option, then you'll have to lift over the mismatching VCF to the reference you need to work with. The procedure for doing so is described below.

Liftover procedure

This procedure involves three steps:

  1. Run GATK LiftoverVariants on your VCF file
  2. Run a script to sort the lifted-over file
  3. Filter out records whose REF field does not match the new reference
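To make steps 2 and 3 concrete, here is a minimal Python sketch of what sorting and REF-filtering lifted-over records involve. It is an illustration, not the code liftOverVCF.pl actually runs: it assumes the new reference has already been loaded into a dict mapping contig names to sequences (the `new_ref` argument, hypothetical), that contigs should be ordered as they appear in that dict, and that records are plain tab-separated VCF data lines with the header already removed.

```python
from typing import Dict, List


def sort_records(records: List[str], contig_order: List[str]) -> List[str]:
    """Step 2: order VCF data lines by contig (reference order), then by POS."""
    rank = {name: i for i, name in enumerate(contig_order)}

    def key(line: str):
        chrom, pos = line.split("\t")[:2]
        # Contigs missing from the new reference sort last; step 3 drops them.
        return (rank.get(chrom, len(rank)), int(pos))

    return sorted(records, key=key)


def ref_matches(line: str, new_ref: Dict[str, str]) -> bool:
    """Step 3: True if the record's REF allele equals the new reference at POS."""
    fields = line.split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    seq = new_ref.get(chrom)
    if seq is None:
        return False  # contig does not exist in the new reference
    start = pos - 1  # VCF POS is 1-based; Python slicing is 0-based
    return seq[start:start + len(ref)].upper() == ref.upper()


def sort_and_filter(records: List[str], new_ref: Dict[str, str]) -> List[str]:
    """Sort lifted-over records, then keep only those whose REF still matches."""
    ordered = sort_records(records, list(new_ref))
    return [r for r in ordered if ref_matches(r, new_ref)]
```

A record that lands on a position where the new reference carries a different base fails `ref_matches` and is discarded, which is exactly the situation step 3 exists to catch.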

We provide a script that performs all three steps for you, liftOverVCF.pl, which is available in our public source repository under the 'perl' directory. Instructions for pulling down our source are available in our documentation.

The example below shows how you would run the script:

# -vcf: input VCF; -chain: chain file; -out: output VCF
# -gatk: path to the GATK source code
# -newRef / -oldRef: path to the new / old reference, without the file extension
# -tmp: temp file location (optional; defaults to /tmp)
./liftOverVCF.pl \
    -vcf calls.b36.vcf \
    -chain b36ToHg19.broad.over.chain \
    -out calls.hg19.vcf \
    -gatk gatk_source \
    -newRef Homo_sapiens_assembly19 \
    -oldRef human_b36_both \
    -tmp /broad/shptmp
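If you drive the script from a pipeline, it can help to build the argument list programmatically rather than pasting a shell line. The minimal Python sketch below assembles the same command as the example above, using only the flags shown there. The helper names and the list of FASTA extensions it strips are hypothetical; the stripping exists because -newRef and -oldRef expect the reference path without its extension.

```python
import shlex
import subprocess
from typing import List, Optional

# Hypothetical list of extensions to strip from reference paths.
FASTA_EXTENSIONS = (".fasta", ".fa")


def strip_fasta_ext(path: str) -> str:
    """Remove a trailing FASTA extension, if present."""
    for ext in FASTA_EXTENSIONS:
        if path.endswith(ext):
            return path[: -len(ext)]
    return path


def build_liftover_cmd(vcf: str, chain: str, out: str, gatk: str,
                       new_ref: str, old_ref: str,
                       tmp: Optional[str] = None) -> List[str]:
    """Assemble a liftOverVCF.pl argument list using the flags from the example."""
    cmd = ["./liftOverVCF.pl",
           "-vcf", vcf,
           "-chain", chain,
           "-out", out,
           "-gatk", gatk,
           "-newRef", strip_fasta_ext(new_ref),
           "-oldRef", strip_fasta_ext(old_ref)]
    if tmp is not None:  # leave -tmp off to use the script's default (/tmp)
        cmd += ["-tmp", tmp]
    return cmd


def run_liftover(cmd: List[str]) -> None:
    """Print the command in shell-quoted form, then run it; raises on failure."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run_liftover(build_liftover_cmd("calls.b36.vcf", "b36ToHg19.broad.over.chain", "calls.hg19.vcf", "gatk_source", "Homo_sapiens_assembly19.fasta", "human_b36_both.fasta", tmp="/broad/shptmp"))` would run the example command. Passing a list to `subprocess.run` sidesteps shell quoting and line-continuation mistakes entirely.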

We provide several chain files for lifting over between the major human reference builds, in the Liftover_Chain_Files directory of our resource bundle (mentioned above). If you are working with non-human organisms, we can't help you -- but others may have chain files, so ask around in your field.

Note that if you're at the Broad, you can access chain files for lifting over from b36/hg18 to hg19 on the humgen server:

/humgen/gsa-hpprojects/GATK/data/Liftover_Chain_Files/
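Chain files use UCSC's chain format: a header line naming the old ("target") and new ("query") sequences, followed by lines giving the size of each ungapped aligned block and the gaps that follow it on each assembly. As a rough illustration of why a chain file is all you need to move coordinates, here is a minimal Python sketch that maps one 0-based position through a single chain. It is not what LiftoverVariants does internally: real chain files contain many chains, including minus-strand ones, which real liftover tools handle and this sketch does not. The chain used in the usage note is made up.

```python
from typing import List, Optional, Tuple


def parse_chain(text: str):
    """Parse one chain: header fields plus its alignment lines."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = rows[0]
    if header[0] != "chain":
        raise ValueError("expected a 'chain' header line")
    # Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    t_name, t_start = header[2], int(header[5])
    q_name, q_strand, q_start = header[7], header[9], int(header[10])
    # Each line is "size dt dq", except the last, which is just "size".
    blocks: List[List[int]] = [[int(x) for x in row] for row in rows[1:]]
    return t_name, t_start, q_name, q_strand, q_start, blocks


def lift_position(chain_text: str, chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based position on the old assembly (target) to the new one (query).

    Returns None if the position is on another contig or falls in a gap.
    """
    t_name, t_pos, q_name, q_strand, q_pos, blocks = parse_chain(chain_text)
    if chrom != t_name or q_strand != "+":
        return None  # this sketch handles only a single +-strand chain
    for block in blocks:
        size = block[0]
        if t_pos <= pos < t_pos + size:  # inside an ungapped aligned block
            return q_name, q_pos + (pos - t_pos)
        if len(block) == 3:  # skip this block plus the gaps that follow it
            t_pos += size + block[1]
            q_pos += size + block[2]
    return None
```

With the made-up chain `"chain 1000 chr1 100 + 10 40 chr1 120 + 20 55 1\n10 5 10\n15\n"`, position 12 maps to 22, position 30 maps to 45, and position 22 falls in a gap and cannot be lifted. Since VCF POS is 1-based, you would pass `POS - 1` and add 1 back to the result.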
