Getting a Very Large VCF File Output from Variant Calling
Posted in Ask the team | Last updated on 2012-11-08 18:05:56



Hi,

I am using the following commands with GATK 2.1.13 to generate a VCF file:

# Step 1: find the intervals that need local realignment around indels
java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -I B2_with_ReadGroup.ddup.sorted.bam -R human_g1k_v37.fasta -T RealignerTargetCreator -o my.intervals -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key
echo "Target creation done at `date`"
echo "Starting IndelRealigner at `date`"

# Step 2: realign reads over those intervals
java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -I B2_with_ReadGroup.ddup.sorted.bam -R human_g1k_v37.fasta -T IndelRealigner -targetIntervals my.intervals -o myrealignedBam.bam -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key
echo "Realignment done at `date`"
echo "Starting UnifiedGenotyper at `date`"

# Step 3: call variants on the realigned BAM
java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -l INFO -R human_g1k_v37.fasta -T UnifiedGenotyper -I myrealignedBam.bam -o mygatk_vcf.vcf --output_mode EMIT_ALL_SITES -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key
echo "Genotyping complete at `date`"
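The start/finish messages around each step can also be folded into one small wrapper that prints timestamps and stops if a step fails, so a failed realignment is not silently fed into UnifiedGenotyper. This is only a minimal sketch; `run_step` is a hypothetical helper name and is not part of GATK.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): run one pipeline step,
# print start/end timestamps, and return non-zero if the step fails.
run_step() {
  local name="$1"
  shift                      # remaining arguments are the command to run
  echo "Starting ${name} at $(date)"
  if ! "$@"; then
    echo "${name} failed at $(date)" >&2
    return 1
  fi
  echo "${name} done at $(date)"
}
```

For example, `run_step UnifiedGenotyper java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -T UnifiedGenotyper <other arguments as above> || exit 1` would run the genotyping step and end the script if Java exits with an error.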

When I run 'mpileup' on B2_with_ReadGroup.ddup.sorted.bam, I get a reasonable 10 MB VCF file. But at the last step of the pipeline above, mygatk_vcf.vcf grows to 81 GB!
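One way to see what the 81 GB file actually contains is to count how many records carry an alternate allele and how many are reference-only (ALT column "."). A large share of reference-only records would be consistent with `--output_mode EMIT_ALL_SITES`, which, per the GATK 2.x UnifiedGenotyper documentation, asks for output at every callable site rather than only at variant sites (the default is `EMIT_VARIANTS_ONLY`). The sketch below assumes an uncompressed, tab-separated VCF; `count_vcf_sites` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count VCF data records and split them into
# records with an ALT allele and reference-only records (ALT == ".").
# Assumes a plain-text VCF; for a .vcf.gz, pipe it through zcat instead.
count_vcf_sites() {
  awk -F '\t' '
    /^#/ { next }              # skip meta-information and header lines
    { total++ }                # every remaining line is one data record
    $5 == "." { ref_only++ }   # column 5 is ALT; "." means no alternate allele
    END {
      printf "total=%d variant=%d ref_only=%d\n", total, total - ref_only, ref_only
    }
  ' "$1"
}
```

Running `count_vcf_sites mygatk_vcf.vcf` reads the whole file once, so on an 81 GB VCF it will take a while.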

Do you know what is wrong?

