Tagged with #reordersam
1 documentation article | 0 announcements | 2 forum discussions

Created 2012-08-11 06:46:51 | Updated 2016-02-17 06:53:06 | Tags: vcf bam script reordersam sorting contig-order

This is not as common as the "wrong reference build" problem, but it still pops up every now and then: a collaborator gives you a BAM or VCF file that's derived from the correct reference, but for whatever reason the contigs are not sorted in the same order. The GATK can be particular about the ordering of contigs in BAM and VCF files, so it will fail with an error in this case.

So what do you do?

For BAM files

You run Picard's ReorderSam tool on your BAM file, using the reference genome dictionary as a template, like this:

java -jar picard.jar ReorderSam \
    I=original.bam \
    O=reordered.bam \
    R=reference.fasta \
    CREATE_INDEX=TRUE

Where reference.fasta is your genome reference, which must be accompanied by a valid *.dict dictionary file. The CREATE_INDEX argument is optional but useful if you plan to use the resulting file directly with GATK (otherwise you'll need to run another tool to create an index).
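Before reordering, it can help to confirm that contig order really is the problem. The following is only a minimal sketch and not part of Picard or GATK: the helper names contig_names and same_contig_order are made up for illustration. It assumes you have dumped the BAM header to a text file (for example with samtools view -H) and that reference.dict is the usual SAM-header-style text file with one @SQ line per contig.

```shell
# Print contig names, in order, from SAM-style header text (@SQ lines).
# Works on a dumped BAM header or on a .dict file, since both use @SQ/SN: fields.
contig_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1"
}

# Succeed only if both files list the same contigs in the same order.
same_contig_order() {
    [ "$(contig_names "$1")" = "$(contig_names "$2")" ]
}

# Hypothetical usage (file names are placeholders):
#   samtools view -H original.bam > original.header.txt
#   same_contig_order original.header.txt reference.dict || echo "contig order differs"
```

If the check fails, comparing the two contig_names listings side by side should show whether the contigs are merely in a different order or whether some are missing altogether.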

Be aware that this tool will drop reads that don't have equivalent contigs in the new reference (potentially bad or not, depending on what you want). If contigs have the same name in the BAM and the new reference, this tool assumes that the alignment of the read in the new BAM is the same. This is not a liftover tool!
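To see in advance which contigs might be affected, one option is to compare the @SQ lines of the BAM header with those of the reference dictionary. This is a rough sketch under the same assumptions as any header comparison: contig_diff is a hypothetical helper, not a Picard feature, and it expects the BAM header dumped to a text file plus the reference .dict file. It reports contigs that are absent from the reference, and contigs whose name matches but whose length (LN) does not, which would suggest the file was aligned to a different sequence.

```shell
# contig_diff BAM_HEADER_TXT REFERENCE_DICT
# Prints "MISSING <name>" for contigs in the BAM header that the reference lacks,
# and "LENGTH <name> <bam_len> <ref_len>" when the names match but lengths differ.
contig_diff() {
    awk -F'\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4)
            }
            # The first file given to awk is the reference dictionary.
            if (FILENAME == ARGV[1]) { ref_len[name] = len; next }
            if (!(name in ref_len)) print "MISSING", name
            else if (ref_len[name] != len) print "LENGTH", name, len, ref_len[name]
        }' "$2" "$1"
}

# Hypothetical usage (file names are placeholders):
#   contig_diff original.header.txt reference.dict
```

An empty result means every contig in the BAM header has a same-named, same-length counterpart in the reference; it does not prove the underlying sequences are identical.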

For VCF files

You run Picard's SortVcf tool on your VCF file, using the reference genome dictionary as a template, like this:

java -jar picard.jar SortVcf \
    I=original.vcf \
    O=sorted.vcf \
    SEQUENCE_DICTIONARY=reference.dict

Where reference.dict is the sequence dictionary of your genome reference.

Note that you may need to delete the index file that gets created automatically for your new VCF by the Picard tool. GATK will automatically regenerate an index file for your VCF.
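As a quick sanity check on the result, you could compare the order in which contigs first appear in the VCF records with the order in the sequence dictionary. The sketch below is an illustration only: the function names are hypothetical, it works on an uncompressed VCF, and it checks contig order alone, not whether positions within each contig are sorted.

```shell
# Contigs in the order they first appear in the VCF records (column 1).
vcf_record_contigs() {
    awk -F'\t' '!/^#/ && NF && !seen[$1]++ { print $1 }' "$1"
}

# Contigs in dictionary order, limited to those that occur in the VCF records.
# Arguments: VCF file, then the .dict file.
dict_contigs_in_vcf() {
    awk -F'\t' '
        FILENAME == ARGV[1] { if ($0 !~ /^#/ && NF) present[$1] = 1; next }
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/ && (substr($i, 4) in present)) print substr($i, 4)
        }' "$1" "$2"
}

# Succeed only if the VCF contigs all exist in the dictionary and appear in its order.
vcf_follows_dict() {
    [ "$(vcf_record_contigs "$1")" = "$(dict_contigs_in_vcf "$1" "$2")" ]
}

# Hypothetical usage (file names are placeholders):
#   vcf_follows_dict sorted.vcf reference.dict || echo "VCF contig order differs"
```

A failure here means either that the record order does not match the dictionary or that the VCF uses a contig name the dictionary does not contain.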

Version-specific alert for GATK 3.5

In version 3.5, we added some beefed-up VCF sequence dictionary validation. Unfortunately, as a side effect of the additional checks, some users have experienced an error that starts with "ERROR MESSAGE: Lexicographically sorted human genome sequence detected in variant." that is due to unintentional activation of a check that is not necessary. This will be fixed in the next release; in the meantime -U ALLOW_SEQ_DICT_INCOMPATIBILITY can be used (with caution) to override the check.

Created 2015-12-09 22:34:45 | Updated 2015-12-09 22:36:50 | Tags: picard reordersam

Origin of the problem: GATK detected that the contig order in the BAM file differs from that in the reference file, as follows:

ERROR MESSAGE: Input files reads and reference have incompatible contigs: The contig order in reads and referenceis not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..
ERROR reads contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, X, Y, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, MT, NT_113887, ...]
ERROR reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, NT_113887, ...]

I then followed the link, https://www.broadinstitute.org/gatk/guide/article?id=1328, and decided to use the Picard ReorderSam tool, which led me to the issue reported here.

The Problem: Picard ReorderSam terminates with error.

Command:

java -Xmx110g -Djava.io.tmpdir=$workDir/merged-bams/tmp -jar ./picard/1.115/ReorderSam.jar \
    ALLOW_INCOMPLETE_DICT_CONCORDANCE=true \
    TMP_DIR=$workDir/merged-bams/tmp \
    I=$workDir/merged-bams/$sample.sorted.cleaned.bam \
    R=$refGenome \
    O=$workDir/merged-bams/$sample.sorted.reordered.bam

Error:

INFO 2015-12-09 12:24:39 ReorderSam Writing reads...
INFO 2015-12-09 12:24:39 ReorderSam Processing All reads
[Wed Dec 09 13:20:08 CST 2015] picard.sam.ReorderSam done. Elapsed time: 55.49 minutes.
Runtime.totalMemory()=15967387648
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Read name HWUSI-EAS1612_61FV6:6:91:1510:1207#0, Read CIGAR M operator maps off end of reference
    at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:452)
    at htsjdk.samtools.BAMRecord.getCigar(BAMRecord.java:247)
    at htsjdk.samtools.SAMRecord.getAlignmentEnd(SAMRecord.java:460)
    at htsjdk.samtools.SAMRecord.computeIndexingBin(SAMRecord.java:1235)
    at htsjdk.samtools.SAMRecord.isValid(SAMRecord.java:1609)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:642)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:514)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:488)
    at picard.sam.ReorderSam.writeReads(ReorderSam.java:165)
    at picard.sam.ReorderSam.doWork(ReorderSam.java:127)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
    at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
    at picard.sam.ReorderSam.main(ReorderSam.java:85)
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.
[bam_index_build2] fail to index the BAM file.

Attempts to fix:

  1. I ran Picard CleanSam to try to resolve this error (output attached): java -Xmx56g -jar ./picard/1.115/CleanSam.jar I=$workDir/merged-bams/$sample.sorted.bam O=$workDir/merged-bams/$sample.sorted.cleaned.bam
  2. Reordering the cleaned BAM then throws the same error as above.
  3. Next I ran Picard ValidateSamFile, and the log lists the read names with the error "Read CIGAR M operator maps off end of reference".

Can you please help me get around this issue? All I really want is to proceed with GATK with the BAM and reference contigs in the same order. I have searched through several GATK discussions, but none addresses this issue directly or has helped me find a solution.

Created 2014-04-17 19:41:42 | Updated 2014-04-17 19:42:38 | Tags: indelrealigner indel-vcf-gatk reordersam

Hi. I'm unable to use the IndelRealigner java jar. My previous steps were:

1) Convert Bowtie2 paired-end Illumina Reads .sam to .bam

2) Use bedtools to extract pairs that fall within the Hg19 exome.

3) Convert the new .bam to .sam

4) Sort the new .sam via SortSam.jar

5) Mark duplicates via MarkDuplicates.jar

6) Use AddOrReplaceReadGroups.jar

7) Use ReorderSam.jar

8) Use RealignerTargetCreator

Up to this point everything went well. Now I'm trying the following command: java -jar GenomeAnalysisTK.jar -T IndelRealigner -R [.fasta] -l [ReorderedSam.bam] -targetIntervals [aligner.intervals] -o output.bam (I get the same error when I also supply -known with a .vcf file):

ERROR MESSAGE: Unable to match: GATK_7-PicardReorderSam.bam to a logging level, make sure it's a valid level (DEBUG, INFO, WARN, ERROR, FATAL, OFF)

I hope you can help me, because I can't find anything related on Google.