Input files have incompatible contigs
Posted in Common Problems on 2012-07-23 18:15:52 | Last updated on 2015-11-21 03:41:00

These errors occur when the contig names, sizes, or order do not match between input files. The most common case is a VCF that does not match the reference.
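
To see which of the three properties (name, size, order) is off before running GATK, you can compare the `##contig` lines in the VCF header against the reference's `.fai` index. The following is a minimal sketch, not part of GATK: the function names are hypothetical, the contig names and lengths are toy values, and the header parser assumes simple `key=value` fields with no quoted commas.

```python
import io

def contigs_from_fai(handle):
    """Return [(name, length), ...] in file order from a FASTA .fai index."""
    contigs = []
    for line in handle:
        if line.strip():
            name, length = line.rstrip("\n").split("\t")[:2]
            contigs.append((name, int(length)))
    return contigs

def contigs_from_vcf_header(handle):
    """Return [(name, length or None), ...] from ##contig header lines."""
    contigs = []
    for line in handle:
        if not line.startswith("##"):
            break  # meta-information lines are over
        if line.startswith("##contig=<"):
            body = line.strip()[len("##contig=<"):-1]
            # Simplification: assumes no commas inside quoted values.
            fields = dict(item.split("=", 1) for item in body.split(",") if "=" in item)
            length = fields.get("length")
            contigs.append((fields["ID"], int(length) if length else None))
    return contigs

def compare_contigs(ref, vcf):
    """List mismatches in contig name, size, or relative order."""
    problems = []
    ref_len = dict(ref)
    for name, length in vcf:
        if name not in ref_len:
            problems.append(f"{name}: not in reference")
        elif length is not None and length != ref_len[name]:
            problems.append(f"{name}: length {length} != reference {ref_len[name]}")
    # Order check: contigs present in both must appear in the same relative order.
    ref_rank = {name: i for i, (name, _) in enumerate(ref)}
    shared = [ref_rank[name] for name, _ in vcf if name in ref_rank]
    if shared != sorted(shared):
        problems.append("shared contigs are in a different order")
    return problems

# Toy inputs standing in for reference.fasta.fai and a VCF header.
fai = io.StringIO("1\t1000\t52\t60\t61\n2\t800\t1100\t60\t61\n")
vcf = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "##contig=<ID=chr1,length=1000>\n"
    "##contig=<ID=2,length=750>\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
)
for problem in compare_contigs(contigs_from_fai(fai), contigs_from_vcf_header(vcf)):
    print(problem)
```

With real files, pass open file handles instead of the `io.StringIO` objects. VCFs whose headers carry no `##contig` lines will produce an empty list here, so an empty result is not proof of compatibility.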

VCF not matching the reference

This is a classic problem: you get some VCF files from collaborators, you try to use them with your own data, and GATK fails with a big fat error saying that the references don't match.


So what do you do? If you can, you find a version of the VCF file that is derived from the right reference. If you're working with human data and the VCF in question is just a common resource like dbsnp, you're in luck -- we provide versions of dbsnp and similar resources derived from the major human reference builds in our resource bundle (see FAQs for access details).

Resource bundle username: gsapubftp-anonymous

If that's not an option, then you'll have to "liftover" -- specifically, lift over the mismatching VCF to the reference you need to work with. The best tool for liftover is Picard's LiftoverVcf.
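
As a rough illustration of what a Picard LiftoverVcf run looks like, the sketch below assembles the command line using Picard's `KEY=value` argument style (`I`, `O`, `CHAIN`, `REJECT`, `R`). The helper names, the `picard.jar` path, the `.fasta` extension, and the reject file name are assumptions for illustration; check the arguments against the documentation for your Picard version.

```python
import subprocess

def liftover_vcf_command(picard_jar, in_vcf, out_vcf, chain, reject_vcf, new_ref_fasta):
    """Build (but do not run) a Picard LiftoverVcf command as an argument list."""
    return [
        "java", "-jar", picard_jar, "LiftoverVcf",
        f"I={in_vcf}",            # VCF on the old reference
        f"O={out_vcf}",           # lifted-over VCF
        f"CHAIN={chain}",         # chain file mapping old -> new coordinates
        f"REJECT={reject_vcf}",   # records that could not be lifted over
        f"R={new_ref_fasta}",     # FASTA of the NEW reference
    ]

def run_liftover(cmd):
    """Run the command; requires Java and a Picard jar to be installed."""
    subprocess.run(cmd, check=True)

cmd = liftover_vcf_command(
    picard_jar="picard.jar",                       # assumed location
    in_vcf="calls.b36.vcf",
    out_vcf="calls.hg19.vcf",
    chain="b36ToHg19.broad.over.chain",
    reject_vcf="calls.rejected.vcf",               # hypothetical file name
    new_ref_fasta="Homo_sapiens_assembly19.fasta", # assumed extension
)
print(" ".join(cmd))
# run_liftover(cmd)  # uncomment once the paths above point at real files
```

Keeping the reject file is worth the disk space: it tells you which records were dropped and is the first place to look if the lifted-over VCF is smaller than expected.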

Liftover procedure with older versions of GATK

This procedure involves three steps:

  1. Run GATK LiftoverVariants on your VCF file
  2. Run a script to sort the lifted-over file
  3. Filter out records whose REF field does not match the new reference
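
To make steps 2 and 3 concrete, here is a minimal sketch of what sorting and REF filtering amount to. It is not the script described in this article: the function names are hypothetical, the genome is a toy in-memory dictionary (a real reference would need indexed FASTA access), and records are plain tab-separated VCF data lines.

```python
def sort_records(records, contig_order):
    """Step 2: sort VCF data lines by reference contig order, then position."""
    rank = {name: i for i, name in enumerate(contig_order)}

    def key(line):
        chrom, pos = line.split("\t", 2)[:2]
        return rank[chrom], int(pos)  # KeyError if the contig is not in the reference

    return sorted(records, key=key)

def ref_matches(line, genome):
    """Step 3: True if the record's REF allele equals the new reference bases."""
    chrom, pos, _id, ref = line.split("\t", 4)[:4]
    start = int(pos) - 1  # VCF POS is 1-based
    return genome[chrom][start:start + len(ref)].upper() == ref.upper()

# Toy "new reference" and lifted-over records (CHROM, POS, ID, REF, ALT).
genome = {"1": "ACGTACGT", "2": "TTGACC"}
lifted = [
    "2\t3\t.\tG\tA",
    "1\t5\t.\tA\tC",
    "1\t2\t.\tT\tG",  # REF is T, but the new reference has C here
]

ordered = sort_records(lifted, contig_order=["1", "2"])
kept = [rec for rec in ordered if ref_matches(rec, genome)]
for rec in kept:
    print(rec)
```

Sorting has to follow the contig order of the new reference, not alphabetical order, which is why the sort key is built from the reference's own contig list.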

We provide a script that performs those three steps for you. It is available in our public source repository under the 'perl' directory. Instructions for pulling down our source are available here.

The example below shows how you would run the script:

./ \
    -vcf calls.b36.vcf \                    # input vcf
    -chain b36ToHg19.broad.over.chain \     # chain file
    -out calls.hg19.vcf \                   # output vcf
    -gatk gatk_source \                     # path to source code
    -newRef Homo_sapiens_assembly19 \       # path to new reference base name (without extension)
    -oldRef human_b36_both \                # path to old reference prefix (without extension)
    -tmp /broad/shptmp                      # temp file location (defaults to /tmp)

We provide several chain files to liftover between the major human reference builds, also in our resource bundle (mentioned above) in the Liftover_Chain_Files directory. If you are working with non-human organisms, we can't help you -- but others may have chain files, so ask around in your field.

Note that if you're at the Broad, you can access chain files to liftover from b36/hg18 to hg19 on the humgen server.

