Hi GATK team,

I was calling variants individually to generate a gVCF file for each sample. Then I thought I should also try using multiple samples as input from the RealignerTargetCreator step onward, to generate a joint BAM file containing alignments from the different samples (I used a different SM tag for each sample). In the HaplotypeCaller step I then got the message 'emitRefConfidence has a bad value', because my BAM file mixes reads with different SM tags. I suppose I shouldn't change the SM tags to be the same, or I won't be able to tell which variants come from which individual.

I think I misunderstood the meaning of cohort calling from the beginning, so I want to clarify two points to see if I understand correctly now.

1. HaplotypeCaller uses only the SM tag to identify different individuals. Other tags like ID, PL, and LB are not considered. If samples from different individuals all have the same SM tag, HC will treat them as one individual. Is that correct?

2. My goal is to find variants between individuals, rather than in all individuals. So I should call variants one sample at a time and then run GenotypeGVCFs to join all samples together before hard filtering. Am I still misunderstanding something?

Thanks. The following is the command line I used.
java -Xmx32g -jar $GATK_JARS/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R ucsc.hg19.fasta \
    -I aligned_TKSAHB.dedup.sorted.bam \
    -I aligned_TKSAHV.dedup.sorted.bam \
    -I aligned_TKSASA.dedup.sorted.bam \
    -known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
    -known 1000G_phase1.indels.hg19.sites.vcf \
    -o target_interval_TKSA.list \
&& java -Xmx32g -jar $GATK_JARS/GenomeAnalysisTK.jar \
    -T IndelRealigner \
    -R ucsc.hg19.fasta \
    -I aligned_TKSAHB.dedup.sorted.bam \
    -I aligned_TKSAHV.dedup.sorted.bam \
    -I aligned_TKSASA.dedup.sorted.bam \
    -targetIntervals target_interval_TKSA.list \
    -known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
    -known 1000G_phase1.indels.hg19.sites.vcf \
    -o realigned_TKSA.dedup.sorted.bam

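For reference, the per-sample workflow described in point 2 might be sketched roughly as follows. This is only a dry run that prints GATK 3.x command lines for review. The per-sample BAM names, the output names, and the helper function `print_gvcf_workflow` are hypothetical, and depending on the GATK 3 release additional index arguments may be needed for the gVCF output.

```shell
# Dry run: print GATK 3.x commands for per-sample gVCF calling followed by
# joint genotyping. Nothing is executed; input/output file names are hypothetical.
REF=ucsc.hg19.fasta
# Single quotes keep $GATK_JARS literal so it expands when the printed command is run.
GATK='java -Xmx32g -jar $GATK_JARS/GenomeAnalysisTK.jar'

print_gvcf_workflow() {
  variants=""
  for sample in "$@"; do
    # One HaplotypeCaller run per sample; each input BAM carries a single SM tag.
    printf '%s\n' "$GATK -T HaplotypeCaller -R $REF -I realigned_${sample}.bam --emitRefConfidence GVCF -o ${sample}.g.vcf"
    variants="$variants --variant ${sample}.g.vcf"
  done
  # Joint genotyping across all per-sample gVCFs.
  printf '%s\n' "$GATK -T GenotypeGVCFs -R $REF$variants -o joint_TKSA.vcf"
}

print_gvcf_workflow TKSAHB TKSAHV TKSASA
```

Printing the commands first makes it easy to confirm that each HaplotypeCaller call sees exactly one sample before anything is submitted.
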
I have been using HaplotypeCaller in gVCF mode on a cohort of 830 samples spread over 2450 BAMs. The number of BAMs per sample varies from 1 to 4. For samples with 3 or fewer BAMs, the routine works perfectly, but for samples with 4 BAMs the jobs always crash and I receive the error:
ERROR MESSAGE: Invalid command line: Argument emitRefConfidence has a bad value: Can only be used in single sample mode currently
Is this a bug? Are there any options I can use to avoid this error? I suppose it is possible that there is an issue with my BAMs, but it seems odd that the error occurs systematically for samples with 4 BAMs and never for samples with 3 or fewer.

Thanks for any help!
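
Since the error message refers to single sample mode, one thing that may be worth checking is whether all BAMs passed for a given sample carry the same SM value in their @RG header lines. Below is a minimal sketch, assuming samtools is available for dumping headers. The helper `distinct_sm_tags` and the BAM file names in the comment are hypothetical, and the demo at the end uses synthetic header lines rather than real data.

```shell
# Print the distinct SM values found in @RG header lines read from stdin.
distinct_sm_tags() {
  awk -F'\t' '/^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4) }' | sort -u
}

# Intended use with real files (hypothetical names), one header dump per BAM:
#   for bam in sampleX_1.bam sampleX_2.bam sampleX_3.bam sampleX_4.bam; do
#     samtools view -H "$bam"
#   done | distinct_sm_tags

# Demo on synthetic header lines: two read groups share one SM value, one differs.
printf '@HD\tVN:1.4\n@RG\tID:a\tSM:S1\n@RG\tID:b\tSM:S1\n@RG\tID:c\tSM:S2\n' | distinct_sm_tags
```

If more than one value is printed for the BAMs of what should be a single sample, the read groups disagree on the sample name.
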