Regarding intersecting VCF files
Posted in Ask the GATK team | Last updated on 2013-03-04 13:58:54



Hi all,

I would appreciate your thoughts on the following pipeline:
I'm currently working on whole-genome sequencing (WGS) data from a number of non-human vertebrates. To maximize the sensitivity of my variant calls, I run two callers (GATK's UnifiedGenotyper and samtools mpileup) on each chromosome, ignoring all filters. Next, I would like to merge (not intersect) the GATK and samtools VCF files for each chromosome, and then merge (not intersect) the VCF files for all chromosomes to obtain a final VCF dataset for each individual:

To merge the GATK and samtools calls for each chromosome:

$ java -Xmx10g -jar GenomeAnalysisTK.jar -T CombineVariants -R ref.fasta \
--variant:GATK chr#.GATK.vcf --variant:samtools chr#.samtools.vcf \
-o chr#.GATK_samtools.union.vcf \
-genotypeMergeOptions PRIORITIZE -priority GATK,samtools --filteredrecordsmergetype KEEP_UNCONDITIONAL
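
As a rough sanity check on this step (a sketch, not part of GATK), the distinct sites in the union can be counted directly from the two input VCFs and compared with the CombineVariants output. The functions `site_keys` and `union_site_count` are hypothetical helpers, and the sketch assumes plain-text (not bgzipped) VCFs. Because samtools and GATK can represent the same indel differently, and CombineVariants may fold records that start at the same position into one multi-allelic record, the two counts should only be expected to be close, not identical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking a CombineVariants union.
# Assumes uncompressed VCF input.

# Print one CHROM:POS:REF:ALT key per data line, skipping header lines.
site_keys() {
  awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1"
}

# Count the distinct keys found in either of two VCFs (a set union).
union_site_count() {
  { site_keys "$1"; site_keys "$2"; } | sort -u | awk 'END { print NR }'
}

# Example usage (file names from the command above, "#" = chromosome number):
# union_site_count chr1.GATK.vcf chr1.samtools.vcf
# site_keys chr1.GATK_samtools.union.vcf | sort -u | awk 'END { print NR }'
```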

To merge all chromosomes for each individual:

$ java -Xmx10g -jar GenomeAnalysisTK.jar -T CombineVariants -R ref.fasta \
--variant:chr1 chr1.GATK_samtools.union.vcf --variant:chr2 chr2.GATK_samtools.union.vcf --variant:chr3 chr3.GATK_samtools.union.vcf \
-o Individual1.union.vcf \
-genotypeMergeOptions PRIORITIZE -priority chr1,chr2,chr3 --filteredrecordsmergetype KEEP_UNCONDITIONAL
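
Because each per-chromosome union covers a different chromosome, no site can appear in more than one input to this step. Assuming I understand CombineVariants correctly, the priority list then has no conflicts to resolve, and the merged file should hold exactly the sum of the inputs' records. A minimal sketch of that check follows; `count_records` and `check_no_records_lost` are hypothetical helper names, and plain-text VCFs are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes uncompressed VCF input.

# Count data (non-header) records in a VCF.
count_records() {
  awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Succeed only if the merged file has as many records as all inputs combined,
# which should hold when the inputs cover disjoint chromosomes.
check_no_records_lost() {
  local merged=$1 expected=0 actual f
  shift
  for f in "$@"; do
    expected=$(( expected + $(count_records "$f") ))
  done
  actual=$(count_records "$merged")
  echo "inputs: $expected records, merged: $actual records"
  [ "$expected" -eq "$actual" ]
}

# Example usage:
# check_no_records_lost Individual1.union.vcf \
#   chr1.GATK_samtools.union.vcf chr2.GATK_samtools.union.vcf chr3.GATK_samtools.union.vcf
```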

Finally, I would like to intersect the two individuals' call sets and keep only the variants common to both:

Merging the two individuals:

$ java -Xmx10g -jar GenomeAnalysisTK.jar -T CombineVariants -R ref.fasta \
--variant:Individual1 Individual1.union.vcf --variant:Individual2 Individual2.union.vcf -o Individual1_2.union.vcf \
-genotypeMergeOptions PRIORITIZE -priority Individual1,Individual2 --filteredrecordsmergetype KEEP_UNCONDITIONAL
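
Before selecting, it may help to see how the records in `Individual1_2.union.vcf` are distributed across the `set` INFO values that CombineVariants writes. As far as I understand its behaviour, `set=Intersection` is only assigned when a record is present and unfiltered in every input; a site present in both individuals but filtered in one is labelled with something like `Individual1-filterInIndividual2` instead, and would therefore not pass a `set == "Intersection"` selection even with KEEP_UNCONDITIONAL. The sketch below (`tally_set_values` is a hypothetical helper; it assumes the default `set` key and an uncompressed VCF) simply counts each value so that such records become visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes an uncompressed VCF and the default "set" key.

# Print how many records carry each value of the "set" INFO field.
tally_set_values() {
  awk -F'\t' '!/^#/ {
    value = "(no set key)"
    n = split($8, fields, ";")
    for (i = 1; i <= n; i++) {
      if (fields[i] ~ /^set=/) value = substr(fields[i], 5)
    }
    print value
  }' "$1" | sort | uniq -c
}

# Example usage:
# tally_set_values Individual1_2.union.vcf
```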

Intersecting the two individuals to keep only the common variants:

$ java -Xmx10g -jar GenomeAnalysisTK.jar -T SelectVariants -R ref.fasta \
--variant Individual1_2.union.vcf -select 'set == "Intersection";' \
-o Intersected.vcf
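
As an independent cross-check of this final step, the sites shared by the two per-individual files can be listed directly from their CHROM, POS, REF and ALT columns, ignoring the FILTER column entirely, and the count compared with the number of records in `Intersected.vcf`. This is a sketch under assumptions: `shared_sites` is a hypothetical helper, it needs bash (for process substitution) and uncompressed VCFs, and it treats a multi-allelic ALT such as `G,T` as a single key. A higher count here than in `Intersected.vcf` would point to shared sites that were not labelled `Intersection`, for example because they were filtered in one individual.

```shell
#!/usr/bin/env bash
# Hypothetical helper; requires bash (process substitution) and uncompressed VCFs.

# List CHROM:POS:REF:ALT keys present in both VCFs, ignoring FILTER.
# comm needs both inputs sorted with the same collation, hence LC_ALL=C.
shared_sites() {
  LC_ALL=C comm -12 \
    <(awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1" | LC_ALL=C sort -u) \
    <(awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$2" | LC_ALL=C sort -u)
}

# Example usage: compare the shared-site count with the SelectVariants output.
# shared_sites Individual1.union.vcf Individual2.union.vcf | awk 'END { print NR }'
# grep -vc '^#' Intersected.vcf
```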

Am I doing this right? I'm worried that I may be losing variants, or making some other mistake, somewhere along this pipeline. Remember that I want to keep only the variants common to both individuals while ignoring the filters, to maximize sensitivity.

Thanks!

Sagi

