This is a classic problem: you get some VCF files from collaborators, you try to use them with your own data, and GATK fails with a big fat error saying that the references don't match.
So what do you do? If you can, you find a version of the VCF file that is derived from the right reference. If you're working with human data and the VCF in question is just a common resource like dbsnp, you're in luck -- we provide versions of dbsnp and similar resources derived from the major human reference builds in our resource bundle (see FAQs for access details).
location: ftp.broadinstitute.org username: gsapubftp-anonymous
If that's not an option, then you'll have to "liftover" -- specifically, liftover the mismatching VCF to the reference you need to work with. The procedure for doing so is described below.
This procedure involves three steps:
We provide a script that performs those three steps for you, called
liftOverVCF.pl, which is available in our public source repository under the 'perl' directory. Instructions for pulling down our source are available here.
The example below shows how you would run the script:
./liftOverVCF.pl \ -vcf calls.b36.vcf \ # input vcf -chain b36ToHg19.broad.over.chain \ # chain file -out calls.hg19.vcf \ # output vcf -gatk gatk_source \ # path to source code -newRef Homo_sapiens_assembly19 \ # path to new reference base name (without extension) -oldRef human_b36_both \ # path to old reference prefix (without extension) -tmp /broad/shptmp [defaults to /tmp] # temp file location (defaults to /tmp)
We provide several chain files to liftover between the major human reference builds, also in our resource bundle (mentioned above) in the
Liftover_Chain_Files directory. If you are working with non-human organisms, we can't help you -- but others may have chain files, so ask around in your field.
Note that if you're at the Broad, you can access chain files to liftover from b36/hg18 to hg19 on the
The GATK expects specific information in the header of BAM files (as detailed in the input requirements FAQs), and will fail with an error if it does not find that information.
So what do you do? You use a Picard tool called AddOrReplaceReadGroups to add the missing information to your BAM file.
Here's an example:
# throws an error java -jar GenomeAnalysisTK.jar \ -T HaplotypeCaller \ -R reference.fasta \ -I reads_without_RG.bam \ -o output.vcf # fix the read groups java -jar picard.jar AddOrReplaceReadGroups \ I= reads_without_RG.bam \ O= reads_with_RG.bam \ SORT_ORDER=coordinate \ RGID=foo \ RGLB=bar \ RGPL=illumina \ RGSM=Sample1 \ CREATE_INDEX=True # runs without error java -jar GenomeAnalysisTK.jar \ -T HaplotypeCaller \ -R reference.fasta \ -I reads_with_RG.bam \ -o output.vcf
Note that if you don't know what information to put in the read groups, you should ask whoever performed the sequencing or provided the BAM to give you the metadata you need.
This tool is part of the Picard package.
This error occurs when for example, a collaborator gives you a BAM that's derived from what was originally the same reference as you are using, but for whatever reason the contigs are not sorted in the same order .The GATK can be particular about the ordering of a BAM file so it will fail with an error in this case.
So what do you do? You use a Picard tool called ReorderSam to, well, reorder your BAM file.
Here's an example usage where we reorder a BAM file that was sorted lexicographically so that the output will be another BAM, but this time sorted karyotypically :
java -jar picard.jar ReorderSam \ I= lexicographic.bam \ O= kayrotypic.bam \ REFERENCE= Homo_sapiens_assembly18.kayrotypic.fasta
This tool requires you have a correctly sorted version of the reference sequence you used to align your reads. Be aware that this tool will drop reads that don't have equivalent contigs in the new reference (potentially bad, but maybe not). If contigs have the same name in the bam and the new reference, this tool assumes that the alignment of the read in the new BAM is the same. This is not a liftover tool!
This tool is part of the Picard package.