VCF

VCF stands for Variant Call Format. It is used by the 1000 Genomes Project to encode genetic variants, including SNPs, small insertions and deletions (indels), and structural variants.

Required Extensions: .vcf, .vcf.gz

If the file is compressed (ends with .vcf.gz), it must be compressed with bgzip rather than plain gzip, and it must have an accompanying Tabix index (see below).
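A plain gzip file and a bgzip (BGZF) file can both end in .gz, so the extension alone does not show whether Tabix can index the file. The following is a minimal sketch, assuming only the BGZF header layout described in the SAM/BAM specification (gzip magic bytes, the FEXTRA flag, and a "BC" extra subfield). It inspects the first bytes of a file and looks for a .tbi file next to it. The function names are hypothetical and are not part of IGV or igvtools.

```python
import os


def is_bgzf(header: bytes) -> bool:
    """Return True if `header` (the first bytes of a file) starts a BGZF block.

    A BGZF block is a gzip member with the FEXTRA flag set and an extra
    subfield whose identifier is 'BC' and whose payload is 2 bytes long.
    """
    if len(header) < 18 or header[:2] != b"\x1f\x8b" or header[2] != 8:
        return False
    if not header[3] & 0x04:  # FEXTRA flag not set: plain gzip
        return False
    xlen = int.from_bytes(header[10:12], "little")
    extra = header[12:12 + xlen]
    i = 0
    # Each subfield is SI1, SI2, a 2-byte little-endian SLEN, then SLEN bytes.
    while i + 4 <= len(extra):
        slen = int.from_bytes(extra[i + 2:i + 4], "little")
        if extra[i:i + 2] == b"BC" and slen == 2:
            return True
        i += 4 + slen
    return False


def check_compressed_vcf(path: str) -> list[str]:
    """List ways a .vcf.gz file misses the requirements above (hypothetical helper)."""
    problems = []
    with open(path, "rb") as fh:
        if not is_bgzf(fh.read(512)):
            problems.append("not bgzip-compressed; plain gzip cannot be Tabix-indexed")
    if not os.path.exists(path + ".tbi"):
        problems.append("missing Tabix index: " + path + ".tbi")
    return problems
```

An empty list means the file at least looks like bgzip output and has a .tbi file beside it. The check does not validate the VCF content itself.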

VCF Specification

The version 4.0 spec: http://www.1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcf4.0

Example VCF 4.0 File:

This example shows in order:

  • A good, simple SNP
  • A possible SNP that has been filtered out because its quality is below 10
  • A site at which two alternate alleles are called, with one of them (T) being ancestral (possibly a reference sequencing error)
  • A site that is called monomorphic reference (i.e., with no alternate alleles)
  • A microsatellite with two alternate alleles: a deletion of 3 bases (TCT) and an insertion of one base (A)
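The kinds of sites listed above are distinguished mainly by the REF and ALT columns. As a rough illustration, assuming the VCF 4.0 conventions that ALT is a comma-separated list, that "." means no alternate allele, and that indel alleles include the preceding reference base, a hypothetical classifier can compare allele lengths. The REF/ALT values in the usage lines are made up to match the descriptions above. Length comparison is a simplification and does not handle symbolic alleles such as <DEL> or complex substitutions.

```python
def _bases(n: int) -> str:
    """Format a base count with the right plural."""
    return f"{n} base" if n == 1 else f"{n} bases"


def classify_alt(ref: str, alt: str) -> str:
    """Rough type of one REF/ALT pair, judged only by allele lengths."""
    if len(ref) == len(alt) == 1:
        return "SNP"
    if len(alt) < len(ref):
        return "deletion of " + _bases(len(ref) - len(alt))
    if len(alt) > len(ref):
        return "insertion of " + _bases(len(alt) - len(ref))
    return "other (same length, more than one base)"


def classify_site(ref: str, alt_column: str) -> list[str]:
    """Classify each comma-separated ALT allele; '.' means no alternate allele."""
    if alt_column == ".":
        return ["monomorphic reference"]
    return [classify_alt(ref, alt) for alt in alt_column.split(",")]


print(classify_site("G", "A"))           # ['SNP']
print(classify_site("A", "G,T"))         # ['SNP', 'SNP']
print(classify_site("T", "."))           # ['monomorphic reference']
print(classify_site("GTCT", "G,GTACT"))  # ['deletion of 3 bases', 'insertion of 1 base']
```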

Genotype data are given for three samples: two phased and one unphased. For each sample, the genotype is accompanied by genotype quality and read depth, plus haplotype qualities for the phased samples only. The microsatellite calls are unphased.
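Per-sample values are stored as a colon-separated list of keys in the FORMAT column (for example GT, GQ, DP, and HQ), followed by one colon-separated column per sample. In the GT field, "|" separates phased alleles and "/" separates unphased ones. The following is a minimal Python sketch, assuming a single tab-delimited data line. The record is made up for illustration, and the helper names are hypothetical.

```python
def parse_genotypes(format_col: str, sample_cols: list[str]) -> list[dict[str, str]]:
    """Map each sample's colon-separated values onto the keys in FORMAT."""
    keys = format_col.split(":")
    # zip() stops at the shorter list, so a sample whose trailing fields were
    # dropped (e.g. no HQ for an unphased sample) simply lacks those keys.
    return [dict(zip(keys, col.split(":"))) for col in sample_cols]


def is_phased(gt: str) -> bool:
    """In GT, '|' separates phased alleles and '/' separates unphased ones."""
    return "|" in gt


# Made-up record: FORMAT is the 9th column and the sample columns follow it.
line = "20\t100\t.\tG\tA\t29\tPASS\tDP=13\tGT:GQ:DP:HQ\t0|1:48:8:51,51\t1/1:43:5"
fields = line.split("\t")
samples = parse_genotypes(fields[8], fields[9:])
# samples[0] == {"GT": "0|1", "GQ": "48", "DP": "8", "HQ": "51,51"}
# samples[1] == {"GT": "1/1", "GQ": "43", "DP": "5"}
print([is_phased(s["GT"]) for s in samples])  # [True, False]
```

Values are kept as strings. A real reader would convert them using the Type declared in the file's ##FORMAT header lines.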

VCF Requirements

IGV supports VCF Version 4.
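Every VCF file declares its version in a mandatory first header line of the form ##fileformat=VCFv4.0. A quick pre-flight check might look like the following sketch. It assumes that .gz files can be read with Python's gzip module, which also handles bgzip output because BGZF is valid multi-member gzip. The helper names are hypothetical.

```python
import gzip
from typing import Optional

PREFIX = "##fileformat=VCFv"


def vcf_version(path: str) -> Optional[str]:
    """Return the version string from the ##fileformat line, or None if absent."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        first_line = fh.readline().rstrip("\r\n")
    if first_line.startswith(PREFIX):
        return first_line[len(PREFIX):]
    return None


def is_vcf4(path: str) -> bool:
    """True if the file declares a 4.x version (e.g. VCFv4.0, VCFv4.1)."""
    version = vcf_version(path)
    return version is not None and version.split(".")[0] == "4"
```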

VCF data files must be indexed for viewing in IGV, using either igvtools or Tabix.

  • igvtools can be run from the command line or from within IGV (File > Run igvtools...). After launching it, choose the Index command and browse to your .vcf file. The index file (.idx) is created in the same directory as the .vcf file.
  • Tabix indexes bgzip-compressed .vcf.gz files and creates a .tbi file. Tabix, including documentation, is available from the SAMtools website.
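Both indexing routes can be scripted. The following sketch only assembles the commands for the tools named above and reports the index file each route should produce. For a plain .vcf it uses igvtools index. For the Tabix route it uses bgzip followed by tabix -p vcf, and Tabix also expects the file to be sorted by chromosome and position. The sketch assumes the tools are on the PATH, and it runs nothing unless run=True. The function names and the use_tabix and run flags are hypothetical.

```python
import shutil
import subprocess


def index_commands(vcf_path: str, use_tabix: bool = False) -> tuple[list[list[str]], str]:
    """Return (commands to run, expected index path) for a VCF file."""
    if vcf_path.endswith(".vcf.gz"):
        # Already compressed: Tabix writes <file>.vcf.gz.tbi
        return [["tabix", "-p", "vcf", vcf_path]], vcf_path + ".tbi"
    if not vcf_path.endswith(".vcf"):
        raise ValueError("expected a .vcf or .vcf.gz file: " + vcf_path)
    if use_tabix:
        gz_path = vcf_path + ".gz"
        # bgzip replaces <file>.vcf with <file>.vcf.gz (like gzip); then index that.
        return [["bgzip", vcf_path], ["tabix", "-p", "vcf", gz_path]], gz_path + ".tbi"
    # Plain text: igvtools writes <file>.vcf.idx next to the input.
    return [["igvtools", "index", vcf_path]], vcf_path + ".idx"


def build_index(vcf_path: str, use_tabix: bool = False, run: bool = False) -> str:
    """Optionally run the commands; always return the expected index path."""
    commands, index_path = index_commands(vcf_path, use_tabix)
    if run:
        for cmd in commands:
            if shutil.which(cmd[0]) is None:
                raise FileNotFoundError(cmd[0] + " not found on PATH")
            subprocess.run(cmd, check=True)
    return index_path
```

Keeping command construction separate from execution makes it easy to print or log the commands before running them.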