Hi there,
I get an error when I try to run GATK with the following command:
java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa -I merged_bam_files_indexed_markduplicate.bam -o reads.intervals
However I get this error:
SAM/BAM file SAMFileReader{/merged_bam_files_indexed_markduplicate.bam} is malformed: Read HWI-ST303_0093:5:5:13416:34802#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK. Please use http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem
It suggest that it a header issue however my bam file has a header:
samtools view -h merged_bam_files_indexed_markduplicate.bam | grep ^@RG
@RG ID:test1 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan
@RG ID:test2 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan
when I grep the read within the error:
HWI-ST303_0093:5:5:13416:34802#0 99 1 1090 29 23S60M17S = 1150 160 TGTTTGGGTTGAAGATTGATACTGGAAGAAGATTAGAATTGTAGAAAGGGGAAAACGATGTTAGAAAGTTAATACGGCTTACTCCAGATCCTTGGATCTC GGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGFGGGGGGGGGDGFGFGGGGGFEDFGEGGGDGEG?FGGDDGFFDGGEDDFFFFEDG?E MD:Z:60 PG:Z:MarkDuplicates RG:Z:test1 XG:i:0 AM:i:29 NM:i:0 SM:i:29 XM:i:0 XO:i:0 XT:A:M
Following Picard solution:
java -XX:MaxDirectMemorySize=4G -jar picard-tools-1.85/AddOrReplaceReadGroups.jar I= test.bam O= test.header.bam SORT_ORDER=coordinate RGID=test RGLB=test RGPL=Illumina RGSM=test/ RGPU=HWI-ST303 RGCN=japan CREATE_INDEX=True
I get this error after 2 min.:
Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 12247781, Read name HWI-ST303_0093:5:26:10129:50409#0, MAPQ should be 0 for unmapped read.`
Any recommendation on how to solve this issue ?
My plan is to do the following to resolve the issue:
picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=LENIANT
samtools index test_markduplicate.bam
I see a lot of messages like below but the command still running:
Ignoring SAM validation error: ERROR: Record (number), Read name HWI-ST303_0093:5:5:13416:34802#0, RG ID on SAMRecord not found in header: test1
while running the command
then try the GATK RealignerTargetCreator
I already tried to do the following
java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa -I merged_bam_files_indexed_markduplicate.bam -o reads.intervals --validation_strictness LENIENT
But I still got the same error
N.B: the same command run with no issue with GATK version (1.2)
My pipeline in short: mapping the paired end reads with
bwa aln -q 20 ref.fa read > files.sai
bwa sampe ref.fa file1.sai file2.sai read1 read2 > test1.sam
samtools view -bS test1.sam | samtools sort - test
samtools index test1.bam
samtools merge -rh RG.txt test test1.bam test2.bam
RG.txt
@RG ID:test1 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan
@RG ID:test2 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan
samtools index test.bam
picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=SILENT
samtools index test_markduplicate.bam
Dear team, I am new to GATK and I am having a hard time simply trying to merge vcf files. I have tried to solve the problem by referring to the guide and to previous posts, but nothing woked. Actually I found several discussions about the very same error message I receive, but it seems that no clear answere was provided. Here is this message:
I have tried three different MS Dos commands to do the task (see belbow), but the message didn't change:
1. java -jar GenomeAnalysisTK.jar -T CombineVariants -R E:\RessourcesGATK\ucsc.hg19.fasta -V:sample1 E:\TestGATK\sample1.vcf -V:sample2 E:\TestGATK\sample2.vcf -o combined.vcf
2. java -jar GenomeAnalysisTK.jar -R E:\RessourcesGATK\ucsc.hg19.fasta -T CombineVariants --variant E:\TestGATK\sample1.vcf --variant E:\TestGATK\sample2.vcf -o output.vcf -genotypeMergeOptions UNIQUIFY
3.java -jar GenomeAnalysisTK.jar -R E:\RessourcesGATK\ucsc.hg19.fasta -T CombineVariants --variant E:\TestGATK\sample1.vcf --variant E:\TestGATK\sample2.vcf -o output.vcf -genotypeMergeOptions PRIORITIZE -priority foo,bar
I have also tried to use the reference human_g1k_v37.fasta as a resource, but it was the same. I have suppressed the # before CHROM in the header line, tested vcf generated by Samtools or by GATK, but it did not change anything. Is this a problem of environment? I haven't read anything mentioning that GATK could not work with MS Dos.
Thank you very much for your help. S.
Hello,
i have spend many hours trying to run GATK depthofcoverage but it never works.
last try: -T DepthOfCoverage -R /home/remi/Analyse/CNV/ERR125905/bam/Chloroplastgenomebarley.fa -I /home/remi/Analyse/CNV/ERR125905/bam/ERfiltre.bam -L /home/remi/Analyse/CNV/ERR125905/bam/ERfiltre.bed -o /home/remi/Analyse/CNV/FishingCNV_1.5.3/out/ERfiltre.bam.coverage --minMappingQuality 15 --minBaseQuality 10 --omitDepthOutputAtEachBase --logging_level INFO --summaryCoverageThreshold 5 --summaryCoverageThreshold 7 --summaryCoverageThreshold 10 --summaryCoverageThreshold 15 --summaryCoverageThreshold 20 --summaryCoverageThreshold 30 --summaryCoverageThreshold 50
My BAM header seem to be malformed.
ERROR MESSAGE: SAM/BAM file /home/remi/Analyse/CNV/ERR125905/bam/ERfiltre.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups
here is the 1rst line of the header:
@SQ SN:Chloroplastgenomebarley LN:136462 @PG ID:bwa PN:bwa VN:0.5.9-r16 ERR125905.35 99 Chloroplastgenomebarley 69543 29 101M = 69854 412 TTTGATCCCTCTGATCCTGTTCTGGATCCAATGTGGAGACAAGGTATGTTCGTAATTCCCTTCATGACTCGTTTAGGAATAACGGATCCTTGGGGTGGTTG D-:D?BDDDDCC-?ADCBBBDDDDD:BDD= :6 C-4<9@62@@<:?=B??B=DC28=B&?:AA:4 ERR125905.35 147 Chloroplastgenomebarley 69854 29 101M = 69543 -412 GGCTTTCTGTCGCTTGTGGGCTTTTCCTATAACGGCTTTTTATGTTCCTGGGATATGGGTATCCGATCCTTATGGACTAACTGGAAAAGTACAAGCTGTAA #################################################A-B49= @@2>+:CCC:@@ 66DD@-@DDD?B::@-CA:5?:ADD?ADBB??
I Have search in the forum and doc about it. I have try to reorder my header with picard:
@HD VN:1.4 SO:unsorted @SQ SN:Chloroplastgenomebarley LN:136462 UR:file:/home/remi/Analyse/REFGEN/Chloroplastgenomebarley.fa M5:7a7b36ef01cc1a2af1c8451ca3800f93 @PG ID:bwa PN:bwa VN:0.5.9-r16 ERR125905.35 99 Chloroplastgenomebarley 69543 29 101M = 69854 412 TTTGATCCCTCTGATCCTGTTCTGGATCCAATGTGGAGACAAGGTATGTTCGTAATTCCCTTCATGACTCGTTTAGGAATAACGGATCCTTGGGGTGGTTG D-:D?BDDDDCC-?ADCBBBDDDDD:BDD= :6 C-4<9@62@@<:?=B??B=DC28=B&?:AA:4 ERR125905.35 147 Chloroplastgenomebarley 69854 29 101M = 69543 -412 GGCTTTCTGTCGCTTGTGGGCTTTTCCTATAACGGCTTTTTATGTTCCTGGGATATGGGTATCCGATCCTTATGGACTAACTGGAAAAGTACAAGCTGTAA #################################################A-B49= @@2>+:CCC:@@ 66DD@-@DDD?B::@-CA:5?:ADD?ADBB??but no more change.
Someone can help me please ?
Regards, Remi
`