This is not as common as the "wrong reference build" problem, but it still pops up every now and then: a collaborator gives you a BAM or VCF file that's derived from the correct reference, but for whatever reason the contigs are not sorted in the same order as in the reference. The GATK can be particular about the ordering of contigs in BAM and VCF files, so it will fail with an error in this case.
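Before reordering anything, it can help to confirm that contig order really is the problem. The following is a minimal POSIX shell sketch, assuming a plain-text reference.dict and an uncompressed VCF whose ##contig lines list ID as the first field. The function names dict_contigs, vcf_contigs and same_contig_order are hypothetical helpers, not part of GATK or Picard. The sketch reads contig names from the @SQ lines of the dictionary and from the ##contig lines of the VCF header, and reports whether the two lists match in the same order.

```shell
# Print contig names, in order, from the @SQ lines of a sequence dictionary
# (.dict) or any SAM-style header. The SN: field holds the contig name.
dict_contigs() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SN:/) { print substr($i, 4); break }
  }' "$1"
}

# Print contig names, in order, from the ##contig=<ID=...> lines of an
# uncompressed VCF header (assumes ID is the first field inside <...>).
vcf_contigs() {
  sed -n 's/^##contig=<ID=\([^,>]*\).*/\1/p' "$1"
}

# Exit with status 0 if the dictionary and the VCF header list exactly the
# same contigs in exactly the same order, non-zero otherwise.
same_contig_order() {
  [ "$(dict_contigs "$1")" = "$(vcf_contigs "$2")" ]
}

# Example:
#   same_contig_order reference.dict original.vcf && echo "same order" || echo "order differs"
```

For a BAM file, the header can first be dumped with samtools view -H original.bam > original_header.sam (assuming samtools is installed); dict_contigs then works on that file too, because BAM headers use the same @SQ lines as the dictionary.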
So what do you do?
If it's a BAM file, you run Picard's ReorderSam tool on it, using the reference genome dictionary as a template, like this:
java -jar picard.jar ReorderSam \
    I=original.bam \
    O=reordered.bam \
    R=reference.fasta \
    CREATE_INDEX=TRUE
reference.fasta is your genome reference, which must be accompanied by a valid *.dict dictionary file. The CREATE_INDEX argument is optional but useful if you plan to use the resulting file directly with GATK (otherwise you'll need to run another tool to create an index).
Be aware that this tool will drop reads that don't have equivalent contigs in the new reference, which may or may not be a problem depending on what you want. If contigs have the same name in the BAM and the new reference, this tool assumes that the alignment of each read in the new BAM is the same as in the original. This is not a liftover tool!
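To see in advance which reads ReorderSam would drop, you can list the contigs that appear in the BAM header but not in the reference dictionary. This is a minimal POSIX shell sketch, assuming the BAM header has been dumped to a text file (for example with samtools view -H original.bam > original_header.sam, if samtools is available) and that contigs are matched by name only, which is the same assumption the tool makes as described above. The helper name contigs_missing_from_dict is hypothetical.

```shell
# Print the contigs listed in the first SAM-style header (e.g. a dumped BAM
# header) that are absent from the second (e.g. reference.dict), in the order
# they appear in the first file. Contigs are compared by SN: name only.
contigs_missing_from_dict() {
  awk -F'\t' '
    $1 == "@SQ" {
      name = ""
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) { name = substr($i, 4); break }
      if (FILENAME == ARGV[1]) wanted[++n] = name   # contigs from the BAM header
      else in_dict[name] = 1                        # contigs known to the dictionary
    }
    END {
      for (i = 1; i <= n; i++)
        if (!(wanted[i] in in_dict)) print wanted[i]
    }
  ' "$1" "$2"
}

# Example:
#   contigs_missing_from_dict original_header.sam reference.dict
```

An empty result means every contig in the BAM has a same-named counterpart in the dictionary; it says nothing about whether those contigs actually have the same sequence.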
If it's a VCF file, you run Picard's SortVcf tool on it, using the reference genome dictionary as a template, like this:
java -jar picard.jar SortVcf \
    I=original.vcf \
    O=sorted.vcf \
    SEQUENCE_DICTIONARY=reference.dict
reference.dict is the sequence dictionary of your genome reference.
Note that you may need to delete the index file that the Picard tool creates automatically for your new VCF; GATK will then regenerate an index file for it automatically.
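To make "using the reference genome dictionary as a template" concrete, here is an illustrative POSIX shell sketch that sorts the records of an uncompressed VCF by the order of the @SQ lines in a .dict file, then by position. It is not how SortVcf is implemented and is not a substitute for it: it leaves the header (including the ##contig lines) untouched, does not handle compressed files, and does not create an index. The function name sort_vcf_by_dict is hypothetical.

```shell
# Sort the records of an uncompressed VCF ($2) so that contigs follow the
# order of @SQ lines in a sequence dictionary ($1), then by POS.
# Header lines are printed first, unchanged.
sort_vcf_by_dict() {
  grep '^#' "$2"
  # Prefix each record with its contig's rank in the dictionary, sort on
  # rank (field 1) then POS (field 3, after the prefix), and strip the prefix.
  # Records on contigs missing from the dictionary get a large rank and sort last.
  awk -F'\t' -v OFS='\t' '
    FILENAME == ARGV[1] {
      if ($1 == "@SQ")
        for (i = 2; i <= NF; i++)
          if ($i ~ /^SN:/) { rank[substr($i, 4)] = ++n; break }
      next
    }
    /^#/ { next }
    { r = ($1 in rank) ? rank[$1] : 999999999; print r, $0 }
  ' "$1" "$2" | sort -t "$(printf '\t')" -k1,1n -k3,3n | cut -f2-
}

# Example:
#   sort_vcf_by_dict reference.dict original.vcf > sorted_sketch.vcf
```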
In version 3.5, we added some beefed-up VCF sequence dictionary validation. Unfortunately, as a side effect of the additional checks, some users have experienced an error that starts with "ERROR MESSAGE: Lexicographically sorted human genome sequence detected in variant." This error is caused by the unintentional activation of a check that is not necessary. It will be fixed in the next release; in the meantime, -U ALLOW_SEQ_DICT_INCOMPATIBILITY can be used (with caution) to override the check.
Respected Sir / Ma'am,
Our team works on NGS and we use Oncotator frequently. Thank you for the great tool.
I am working on automating our NGS process, so I need to submit jobs from the command line and download the results as well. Is there any script or command that can automate submitting a job to the online server and retrieving the result?
CRAVAT provides web services (http://www.cravat.us/help.jsp?chapter=report_help&article=top). We have written a curl script to submit jobs online and download result.zip, and I was able to do that.
So is this type of automation available for Oncotator? Waiting for a positive reply.