I've noticed on some occasions that the .vcf.idx file that is created alongside the VCF is older than the VCF itself (not by much, a second or so). I've seen this happen in small (highly scattered) jobs where there are many (~13K) samples. Perhaps the last VCF line is so long that it takes longer to flush the VCF buffer than it takes to write the index... I don't know. At any rate, the result is a stale VCF index, which slows down subsequent operations (as the index needs to be rebuilt), and since further operations may be performed by a less-powerful user, the GATK may be forced to use an in-memory index.
An example (for Broadies) can be seen here: /seq/dax/t2d_genes/v3/scatter/temp_0001_of_2000/t2d_genes.unfiltered.vcf* (not for long!)
Perhaps a small delay or a test could be introduced to verify that the VCF file is really closed before the index file is closed.
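Until something like that exists in the GATK itself, the condition described above can at least be detected from outside before downstream jobs start. The following is only a minimal sketch, not the GATK's own logic: the function name `index_is_stale` is hypothetical, and it assumes the index sits next to the VCF and that "stale" simply means missing, empty, or with an older modification time than the VCF.

```python
import os


def index_is_stale(vcf_path, index_path):
    """Return True if the index is missing, empty, or older than the VCF.

    Assumption: comparing file modification times is a good enough proxy
    for "the index was written before the VCF was fully flushed".
    """
    # A missing or zero-byte index cannot be used at all.
    if not os.path.exists(index_path) or os.path.getsize(index_path) == 0:
        return True
    # An index last modified before the VCF is treated as stale.
    return os.path.getmtime(index_path) < os.path.getmtime(vcf_path)
```

A wrapper script could call this on each scattered output, for example `index_is_stale("out.vcf", "out.vcf.idx")`, and delete or regenerate the index when it returns True.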
Hello, I am a first-time user of GATK and have spent some time trying to get the input BAM files into the appropriate format. To run IndelRealigner, I have added read groups, reordered, and indexed my BAM file with the respective Picard tools.
My command-line is the following:
java -Djava.io.tmpdir='pwd'/tmp -jar GenomeAnalysisTK.jar -I ./add_read_groups_reorder_index.bam -R ./genome.fa -T IndelRealigner -targetIntervals ./gatk.intervals -o ./*.bam -known ./Mills-1000G-indels.vcf --consensusDeterminationModel KNOWNS_ONLY -LOD 0.4
I get the following message:
SAM/BAM file /home/gp53/tophat2-merge-ctl-1st-2nd-readgroups-reorder-index.bam is malformed: SAM file doesn't have any read groups defined in the header.
My reads are paired-end and were aligned with TopHat2. I would appreciate your help on this. Thanks, G.
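One way to check what the error message is complaining about is to look at the header of the exact file being passed to -I and see whether it contains any @RG lines. The snippet below is a rough sketch under stated assumptions: the header text is assumed to have been dumped separately (for example with `samtools view -H`), the function name `read_groups` is hypothetical, and the header shown in the usage note is made up for illustration.

```python
def read_groups(header_text):
    """Return one dict per @RG line found in a SAM/BAM text header.

    Each dict maps the two-letter tag (ID, SM, PL, ...) to its value.
    An empty list means no read groups are defined in the header.
    """
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(
            field.split(":", 1)
            for field in line.split("\t")[1:]
            if ":" in field
        )
        groups.append(fields)
    return groups
```

For an illustrative header such as `"@HD\tVN:1.0\n@RG\tID:grp1\tSM:sample1\n"`, the function returns a single dict with the ID and SM tags; an empty result would match the situation the error describes.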
Hi, I'm getting the error "An index is required, but none found" for files from the GATK resource bundle. In this case, hapmap_3.3.b37.vcf.gz from your bundle has hapmap_3.3.b37.vcf.idx.gz in the same directory. I suspect this is because my input VCF file is tabix-indexed, and something in GATK has decided that everything else must be too. Is there any way to override this? I'm guessing it might have something to do with Tribble, but I can't seem to find the relevant bit of documentation that explains this. Thanks, martin
I've been using GATK for a while now, and suddenly, when I run the UnifiedGenotyper (GATK version 1.5.21), my .vcf.idx files are empty. So the index file is produced, but without any content.
This is how I run the command:
java -Xmx8g -jar $GATK_HOME/GenomeAnalysisTK.jar \
  -T UnifiedGenotyper \
  -I $SAMPLE\.sort-dup-realign-fixed-gatkrecal.bam \
  -R $REFERENCE \
  -o $SAMPLE\.raw.vcf \
  --annotation QualByDepth \
  --annotation HaplotypeScore \
  --annotation MappingQualityRankSumTest \
  --annotation ReadPosRankSumTest \
  --annotation FisherStrand \
  --annotation RMSMappingQuality \
  --annotation DepthOfCoverage \
  --genotype_likelihoods_model BOTH \
  -baq CALCULATE_AS_NECESSARY \
  --standard_min_confidence_threshold_for_calling 50.0 \
  --standard_min_confidence_threshold_for_emitting 10.0 \
  --min_base_quality_score 20 \
  -l INFO \
  --dbsnp $DBSNP \
  -metrics $SAMPLE\.metrics \
  -L $TARGET_INTERVALS