I've tried my best to find the problem but without luck, so my first post...
First the background:
I have used HaplotypeCaller to make GVCFs per individual for 2 cohorts of 270 and 300 individuals each.
I then used CombineGVCFs to merged all the individuals in each cohort across 10Mb chunks across the genome.
These steps appear to have worked OK! Then I have been attempting to call genotypes from pairs GVCFs (one from each cohort) per region using GenotypeGVCFs with a command such as:
java -jar -Xmx6G /gpfs0/apps/well/gatk/3.2-2/GenomeAnalysisTK.jar \ -T GenotypeGVCFs \ -R ../hs37d5/hs37d5.fa \ --variant /well/hill/kate/exomes//haploc/cap270/cap270_1:230000000-240000000_merged_gvcf.vcf \ --variant /well/hill/kate/exomes//haploc/sepsis300/sepsis300_1:230000000-240000000_merged_gvcf.vcf \ -o /well/hill/kate/exomes//haploc/cap270-sepsis300/cap270-sepsis300_1:230000000-240000000_raw.vcf
This seems to start fine (and works perfectly to completion for some regions), but then throws a error for some of the files. For this example the error is:
So to try and solve this I have grepped "./.:29:75:25:0,63,945 " from the VCF file (expecting to find a truncated line), but could find nothing wrong with the lines containing this expression. I looked carefully at the first line containing this after the last position that was written in the output - nothing out of the ordinary that I can see.
Any help/advice gratefully received! Kate
In addition to the standard mutect output, I'm interested in vcf output, and was happy to find
a previous related question showing how to output vcf. However, I seem to be having some
trouble with what I think is misformed output. Specifically, the genotype field is "0" for
normal and "0/1" for tumor on every line
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT normal tumor 7 55230840 rs7781264 A G . REJECT DB GT:AD:BQ:DP:FA 0:0,1:.:1:1.00 0/1:0,27:29:27:1.00 7 55233109 rs150899403 G A . PASS DB;SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0:134,0:.:132:0.00:0 0/1:380,296:24:688:0.438:2 7 55233265 . A C . REJECT . GT:AD:BQ:DP:FA 0:6,0:.:6:0.00 0/1:251,24:12:275:0.087
The corresponding lines in the mutect output file are
## muTector v1.0.47986 contig position context ref_allele alt_allele tumor_name normal_name score dbsnp_site covered power tumor_power normal_power total_pairs improper_pairs map_Q0_reads t_lod_fstar tumor_f contaminant_fraction contaminant_lod t_ref_count t_alt_count t_ref_sum t_alt_sum t_ref_max_mapq t_alt_max_mapq t_ins_count t_del_count normal_best_gt init_n_lod n_ref_count n_alt_count n_ref_sum n_alt_sum judgement 7 55230840 ACTxTGC A G tumor normal 0 DBSNP UNCOVERED 0 0.612407 0 30 1 0 93.647278 1 0.02 -0.236654 0 27 0 808 0 70 0 0 GG -3.882263 0 1 0 30 REJECT 7 55233109 TGTxCCA G A tumor normal 0 DBSNP+COSMIC COVERED 1 1 1 1140 3 7 665.64967 0.43787 0.02 28.941878 380 296 10901 7263 70 70 0 0 GG 40.298595 134 0 4097 0 KEEP 7 55233265 CCCxCAG A C tumor normal 0 NOVEL UNCOVERED 0 1 0 305 5 0 8.112745 0.087273 0.02 2.430961 251 24 6289 302 70 70 0 0 AA 1.803681 6 0 154 0 REJECT
If it matters, this was with openjdk 1.6:
$ /usr/lib/jvm/java-1.6.0-openjdk-184.108.40.206.x86_64/jre/bin/java -version java version "1.6.0_24" OpenJDK Runtime Environment (IcedTea6 1.11.5) (rhel-220.127.116.11.5.el6_3-x86_64) OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
Any idea what might be causing this, and is there anything you or I can do to fix it?
I am having trouble calling variants using Haplotype Caller on simulated exome reads. I have been able to call reasonable-looking variants on the exome (simulated with dwgsim) with HaplotypeCaller before running it through the Best Practices Pre-Processing pipeline. The pre-processed data worked fine with UnifiedGenotyper but with HaplotypeCaller, though it runs without errors and seems to walk across the genome, only outputs a VCF header. I have tried calling variants with and without using -L to provide the exome regions (as recommended in this forum post: http://gatkforums.broadinstitute.org/discussion/1681/expected-file-size-haplotype-caller) but this hasn't made a difference - when we run the command with the pre-processed BAMs, we only get a VCF header. Everything has been tested with both 2.4-7 and 2.4-9.
Any help or guidance would be greatly appreciated!
Command Used for HaplotypeCaller:
java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -I exome.realigned.dedup.recal.bam -o exome.raw.vcf -D dbsnp_137.hg19.vcf -stand_emit_conf 10 -rf BadCigar -L Illumin_TruSeq.bed --logging_level DEBUG
Commands Used for pre-processing (run in sequence using a Perl script):
java -Xmx16g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -nt 8 -R ucsc.hg19.fasta -I exome.bam -o exome.intervals -known dbsnp_137.hg19.vcf
java -Xmx4g -jar GenomeAnalysisTK.jar -T IndelRealigner -R ucsc.hg19.fasta -I exome.bam -o exome.realigned.bam -targetIntervals intervals.bam -known dbsnp_137.hg19.vcf
java -Xmx16g -jar MarkDuplicates.jar I=exome.realigned.bam METRICS_FILE=exome.dups O=exome.realigned.dedup.bam
samtools index exome.realigned.dedup
java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -nct 8 -R ucsc.hg19.fasta -I exome.realigned.dedup.bam -o exome.recal_data.grp -knownSites dbsnp_137.hg19.vcf -cov ReadGroupCovariate -cov ContextCovariate -cov CycleCovariate -cov QualityScoreCovariate
java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -nct 8 -R ucsc.hg19.fasta -I exome.realigned.dedup.bam -BQSR exome.recal_data.grp -baq CALCULATE_AS_NECESSARY -o exome.realigned.dedup.recal.bam