VCF

VCF stands for Variant Call Format. It is a text format used by the 1000 Genomes project to encode genetic variants, including SNPs, insertions and deletions, and structural variants.

Required Extensions: .vcf, .vcf.gz

If the file is gzipped (ends with .vcf.gz), it must have an accompanying Tabix index (see below).

VCF Specification

The version 4.0 spec: http://www.1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcf4.0

Example VCF v4.0 file:

##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20 1234567 microsat1 GTCT G,GTACT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3

 

This example shows in order:

  • A good, simple SNP
  • A possible SNP that has been filtered out because its quality is below 10
  • A site at which two alternate alleles are called, with one of them (T) being ancestral (possibly a reference sequencing error)
  • A site that is called monomorphic reference (i.e., with no alternate alleles)
  • A microsatellite with two alternative alleles, one a deletion of 3 bases (TCT), and the other an insertion of one base (A).
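
The variant types listed above can be roughly recovered from the REF and ALT columns alone. The following is a minimal sketch, not part of IGV or the VCF specification: the function names classify_alt and classify_record are hypothetical, and the classification relies only on allele lengths, which is an assumption that holds for the simple records in this example but not for every VCF record (for instance, symbolic alleles are not handled).

```python
from typing import List


def classify_alt(ref: str, alt: str) -> str:
    """Roughly classify one ALT allele against REF by comparing lengths.

    Hypothetical helper for illustration only.
    """
    if alt == ".":
        # A "." in the ALT column means no alternate allele was called.
        return "monomorphic reference"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    diff = len(alt) - len(ref)
    if diff < 0:
        return f"deletion of {-diff} base(s)"
    if diff > 0:
        return f"insertion of {diff} base(s)"
    return "multi-base substitution"


def classify_record(ref: str, alt_field: str) -> List[str]:
    """Classify every comma-separated allele in the ALT column."""
    return [classify_alt(ref, alt) for alt in alt_field.split(",")]


# REF and ALT values copied from the five example records.
examples = [("G", "A"), ("T", "A"), ("A", "G,T"), ("T", "."), ("GTCT", "G,GTACT")]
for ref, alt in examples:
    print(ref, alt, classify_record(ref, alt))
```

Whether a record passed filtering (the second bullet) is not visible in REF and ALT; that information comes from the FILTER column.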

Genotype data are given for three samples. The first two samples are phased and the third is unphased. In addition to the genotype, each sample has a per-sample genotype quality and depth, and the phased samples also have haplotype qualities. The microsatellite calls are unphased for all three samples.
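
To make the per-sample columns concrete, here is a small sketch that pairs the FORMAT keys with one sample's values and checks whether the genotype is phased. The helper parse_sample is hypothetical and not part of any IGV API. It assumes that "|" separates phased alleles and "/" separates unphased alleles, as in the example file, and that trailing sub-fields may be omitted (as in "0/0:41:3", which has no HQ value).

```python
from typing import Dict, List


def parse_sample(format_field: str, sample_field: str) -> Dict[str, object]:
    """Pair FORMAT keys (e.g. GT:GQ:DP:HQ) with one sample's values.

    Hypothetical helper for illustration only.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    # zip() stops at the shorter list, so omitted trailing sub-fields
    # simply do not appear in the result.
    call: Dict[str, object] = dict(zip(keys, values))
    gt = str(call.get("GT", "."))
    call["phased"] = "|" in gt
    call["alleles"] = gt.replace("|", "/").split("/")
    return call


# First data line of the example. Real VCF files separate columns with
# tabs; split() with no argument accepts tabs and spaces alike.
line = ("20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 "
        "GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.")
fields: List[str] = line.split()
fmt, samples = fields[8], fields[9:]
calls = [parse_sample(fmt, s) for s in samples]

for name, call in zip(["NA00001", "NA00002", "NA00003"], calls):
    print(name, call["alleles"], "phased" if call["phased"] else "unphased")
```

All values are kept as strings here; converting GQ and DP to integers, or handling missing values written as ".", is left out to keep the sketch short.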

VCF Requirements

IGV supports VCF Version 4.

VCF data files must be indexed for viewing in IGV, either by using igvtools or by using Tabix. 

  • igvtools can be run from the command line or from within IGV (Tools>Run igvtools...). After launching, choose the Index command and browse to your .vcf file. The index file (.idx) will be created in the same directory as the .vcf file.
  • Tabix creates a .tbi index file. Tabix, including its documentation, is available from the SAMtools website.
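
Before loading a file, it can be useful to check that the expected index is present. The sketch below is one way to do that. The function names expected_index and is_ready_for_igv are hypothetical, and the code assumes that the index is named by appending .tbi or .idx to the full data file name (for example calls.vcf.gz.tbi or calls.vcf.idx) and is stored next to the data file.

```python
from pathlib import Path
from typing import Optional


def expected_index(vcf_path: str) -> Optional[Path]:
    """Return the index path expected next to a VCF file, or None.

    Hypothetical helper; assumes the index name is the data file name
    plus a suffix.
    """
    if vcf_path.endswith(".vcf.gz"):
        # Gzipped VCF files need a Tabix index.
        return Path(vcf_path + ".tbi")
    if vcf_path.endswith(".vcf"):
        # Plain VCF files indexed with igvtools get an .idx file.
        return Path(vcf_path + ".idx")
    # Not one of the required extensions.
    return None


def is_ready_for_igv(vcf_path: str) -> bool:
    """True if both the data file and its expected index exist."""
    index = expected_index(vcf_path)
    return index is not None and Path(vcf_path).is_file() and index.is_file()


for candidate in ["calls.vcf", "calls.vcf.gz", "calls.txt"]:
    print(candidate, "->", expected_index(candidate))
```

This only checks that the files exist. It does not verify that the index is up to date with the data file.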