Running igvtools from the Command Line

Downloading igvtools

The igvtools utilities can be downloaded from the Downloads page on the IGV Web site.

          igvtools_<version #>.zip includes the jar file and shell scripts for running igvtools, as well as the genome files.
          igvtools_nogenomes_<version #>.zip includes the jar file and shell scripts for running igvtools, but not the genome files.

Starting with shell scripts

The igvtools utilities can be invoked, with or without the graphical user interface (GUI), from one of the following scripts:

   igvtools (command-line version for Linux and Mac OS 10.x)
   igvtools_gui (GUI version for Linux and Mac OS 10.x)

   igvtools.bat (command-line version for Windows)
   igvtools_gui.bat (GUI version for Windows)

The general form of the command-line version is:

   igvtools [command] [options] [arguments]
or
   igvtools.bat [command] [options] [arguments]

Recognized commands, options, arguments, and file types are described below.

Starting with java

igvtools can also be started directly with Java.  This gives more control over Java parameters, such as the maximum memory to allocate.  In this example, igvtools is started without the GUI and with a maximum of 1500 MB of memory:

   java -Xmx1500m -Djava.awt.headless=true -jar igvtools.jar [command] [options] [arguments]

To start with the GUI, the command is:

   java -Xmx1500m -jar igvtools.jar -g

Memory settings

The scripts above request a fixed amount of memory.  If that amount is not available on your platform, you will get an error along the lines of "Could not start the Virtual Machine".  In that case, either edit the scripts to reduce the amount of memory requested, or start igvtools directly with Java as described above.  The amount is set with the "-Xmx" parameter.  For example, -Xmx1500m requests 1500 MB and -Xmx1g requests 1 GB.
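As a rough illustration of the Java startup form, the following Python sketch assembles the argument list with a configurable memory limit. The function name and its parameters are hypothetical helpers introduced here; only the java flags themselves (-Xmx, -Djava.awt.headless=true, -jar) come from the text above.

```python
def igvtools_java_command(command, args, jar="igvtools.jar",
                          max_memory="1500m", headless=True):
    """Build an argv list for starting igvtools directly with Java.

    max_memory is passed to -Xmx unchanged, e.g. "1500m" or "1g".
    This is an illustrative helper, not part of igvtools.
    """
    argv = ["java", f"-Xmx{max_memory}"]
    if headless:
        # Run without the GUI, as in the command-line example above.
        argv.append("-Djava.awt.headless=true")
    argv += ["-jar", jar, command, *args]
    return argv
```

The returned list can be passed to subprocess.run(argv, check=True). If Java reports that it could not start the virtual machine, retry with a smaller max_memory value.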

Genome

The genome argument of the tile and count commands can be either an id or a full path to an IGV .genome file.  The ids for IGV-supplied genomes are listed below.  The genome definitions corresponding to these ids are in the genomes subdirectory of the igvtools installation.  Each id is derived by removing the ".genome" extension from the file name.

Current genome list: hg18, 1kg_ref, hg19, hg17, hg16, mm9, mm8, mm7, mm6, mm5, canFam2, btaurus_3.0, galGal3, cavPor3, Plasmodium_3D7_v2.1, Plasmodium_3D7_v5.5, Plasmodium_6.1, sacCer1, spombe_709, spombe_1.55, zebrafish, ce6, ce4, dm3, dm2, dmel_5.9, Aplysia, tcas_2.0, tcas_3.0, ncrassa_v3, Glamblia_2.0, me49, tair8, tair9, O_Sativa_r6, ppatens_1.2.

Note:  Other genomes might be available; check the genomes directory in the igvtools installation folder.  The id of the genome can be inferred by removing ".genome" from the file name.
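The id rule above can be sketched in a few lines of Python. The function name is hypothetical, and the directory passed in is whatever path your installation uses for its genomes folder.

```python
from pathlib import Path


def genome_ids(genomes_dir):
    """Return the genome ids found in a genomes directory.

    Each id is the file name with the trailing ".genome" removed,
    so "Plasmodium_3D7_v2.1.genome" yields "Plasmodium_3D7_v2.1".
    """
    return sorted(p.stem for p in Path(genomes_dir).glob("*.genome"))
```

Only the final ".genome" suffix is stripped, so ids that themselves contain dots are preserved.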

Commands

Tile

The tile command converts a sorted data input file to a binary tiled data (.tdf) file. Use this command to pre-process large datasets for improved IGV performance. 

Supported input file formats are: .wig, .cn, .snp, .igv, and .gct.

Usage:

          igvtools tile [options]  [inputFile] [outputFile] [genome]

Required arguments:

          inputFile    The input file (see supported formats above).

          outputFile   Binary output file.  Must end in ".tdf".

          genome       A genome id or filename. See the Genome section above. Default is hg18.

Options:

  -z num  Specifies the maximum zoom level to precompute. The default
               value is 7 and is sufficient for most files. To reduce file
               size at the expense of IGV performance this value can be
               reduced.

  -f list  A comma-delimited list specifying window functions to use
               when reducing the data to precomputed tiles.  Possible
               values are min, max, and mean.  By default only the mean
               is calculated.

  -p file  Specifies a "bed" file to be used to map probe identifiers
               to locations.  This option is useful when preprocessing .gct
               files.  The bed file should contain 4 columns:
                           chr start end name
               where name is the probe name in the .gct file.

Example:

          igvtools tile -z 5  copyNumberFile.cn copyNumberFile.tdf hg18

Notes:

Input files, with the exception of .gct files, must be sorted by start position.  Files can be sorted with the sort command described below.  Attempting to preprocess an unsorted file results in an error.
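A quick pre-check can catch an unsorted file before running tile. The Python sketch below is illustrative only. It assumes a whitespace-delimited file with the chromosome in the first column and the start position in the second (as in .bed files), skips lines beginning with "#", "track", or "browser", and additionally assumes that all records for a chromosome must be contiguous. The exact rules igvtools applies are not stated in this document.

```python
def is_sorted_by_start(lines, chrom_col=0, start_col=1):
    """Return True if start positions never decrease within a chromosome.

    Assumption (not from the igvtools docs): each chromosome's records
    form one contiguous block, so a chromosome that reappears later
    counts as unsorted.
    """
    seen = set()
    current = None
    last_start = None
    for line in lines:
        # Skip blank lines and assumed header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start = fields[chrom_col], int(fields[start_col])
        if chrom != current:
            if chrom in seen:
                return False
            seen.add(chrom)
            current, last_start = chrom, start
        elif start < last_start:
            return False
        else:
            last_start = start
    return True
```

For a file on disk, pass the open file object: is_sorted_by_start(open("features.bed")). If the check fails, run the sort command first.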

Count

The count command computes average feature density over a specified window size across the genome. Common uses include computing coverage for alignment files and counting hits in ChIP-seq experiments. By default, the resulting file will be displayed as a bar chart when loaded into IGV.

Supported input file formats are: .sam, .bam, .aligned, .psl, .pslx, and .bed.

Usage:

          igvtools count [options] [inputFile] [outputFile] [genome]

Required arguments:

          inputFile    The input file (see supported formats above).

          outputFile   Output file.  Must end in ".tdf" or ".wig".  To output both a .tdf and a
                             .wig file, list both output filenames as a single string, separated
                             by a comma with no other delimiters.  To display feature intensity
                             in IGV, the density must be computed with this command, and the
                             resulting file must be named <feature track filename>.tdf.

          genome       A genome id or filename. See the Genome section above. Default is hg18.

Options:

 
  -z num  Specifies the maximum zoom level to precompute. The default
               value is 7 and is sufficient for most files. To reduce file
               size at the expense of IGV performance this value can be
               reduced.

  -w num  The window size over which coverage is averaged. Defaults to 25 bp.
              

  -e num  The read or feature is extended by the specified distance
               in bp prior to counting. This option is useful for ChIP-seq
               and RNA-seq applications. The value is generally set to the
               average fragment length of the library.

  -f list  A comma-delimited list specifying window functions to use
               when reducing the data to precomputed tiles.  Possible
               values are min, max, and mean.  By default only the mean
               is calculated.

Notes:

The input file must be sorted by start position. See the sort command below.

Example:
          igvtools count -z 5 -w 25 -e 250 alignments.bam  alignments.cov.tdf  hg18
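
To make the effect of the -w and -e options concrete, here is a minimal Python sketch of windowed coverage for a single chromosome. It is not the igvtools implementation. It assumes 0-based, half-open (start, end) intervals, extends every feature at its end regardless of strand, and defines the value of a window as the number of covered bases divided by the window size.

```python
from collections import defaultdict


def windowed_coverage(features, window=25, extend=0):
    """Average per-base coverage in fixed-size windows.

    features: iterable of (start, end) pairs on one chromosome,
              0-based and half-open (an assumption of this sketch).
    Returns {window_start: mean coverage}, omitting empty windows.
    """
    covered = defaultdict(int)  # window index -> total covered bases
    for start, end in features:
        end += extend  # analogous to -e: lengthen the feature
        first = start // window
        last = (end - 1) // window
        for w in range(first, last + 1):
            lo = max(start, w * window)
            hi = min(end, (w + 1) * window)
            covered[w] += hi - lo
    return {w * window: bases / window for w, bases in sorted(covered.items())}
```

With two overlapping features, windowed_coverage([(0, 25), (10, 35)]) gives a value above 1 for the first window, because 15 of its 25 bases are covered twice.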

Index

Creates an index for an alignment or feature file.  Index files are required for loading alignment files into IGV, and can significantly improve performance for large feature files.  Note that the index file is not loaded into IGV directly; rather, IGV looks for the index file when the alignment or feature file is loaded.  This command does not take an output file argument.  Instead, the index filename is generated by appending ".sai" (for alignments) or ".idx" (for features) to the input filename, because IGV relies on this naming convention to find the index.  The input file must be sorted by start position (see the sort command below).

Supported input file formats are: .sam, .aligned, .vcf, .psl, and .bed.

NOTE: This command will not index a binary (BAM) file.  Use the samtools package to sort and index BAM files.

Usage:

  igvtools index [inputFile]
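
The naming convention can be expressed as a small Python helper, for example to check that an index sits next to a data file before loading it. The function name is hypothetical, and the split of extensions into alignment and feature formats is an assumption of this sketch; the text above states only that ".sai" is used for alignments and ".idx" for features.

```python
from pathlib import Path

# Assumed classification of the formats supported by the index command.
ALIGNMENT_EXTS = {".sam", ".aligned"}
FEATURE_EXTS = {".vcf", ".psl", ".bed"}


def expected_index_path(data_file):
    """Return the index path IGV would look for next to data_file."""
    path = Path(data_file)
    ext = path.suffix.lower()
    if ext in ALIGNMENT_EXTS:
        return Path(str(path) + ".sai")  # the suffix is appended, not swapped
    if ext in FEATURE_EXTS:
        return Path(str(path) + ".idx")
    raise ValueError(f"not a format the index command supports: {path.name}")
```

For example, expected_index_path("reads.sam").exists() reports whether the index for reads.sam has already been created.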

Sort

Sorts the input file by start position, as required by the tile, count, and index commands.

Supported input file formats are: .cn, .igv, .sam, .aligned, .psl, .bed, and .vcf.

NOTE:  This command does not work with BAM files.  The samtools package can be used to sort .bam files.

Usage:

          igvtools sort [options] [inputFile] [outputFile]

Required arguments:

          inputFile    The file to sort.

          outputFile   The sorted output file.

Options:

  -t tmpdir  Specify a temporary working directory.  For large input files
             this directory will be used to store intermediate results of
             the sort. The default is the user's temp directory.

  -m maxRecords  The maximum number of records to keep in memory during the
             sort.  The default value is 500000.  Increase this number
             if you receive "too many open files" errors.   Decrease it
             if you experience "out of memory" errors.

Formatexp

Formats GCT or RES files for display. This command should only be used if the data have not previously been log-transformed and contain no negative numbers. The command:

  1. Takes the log2 of the data.
  2. Computes the median and subtracts it from each log2 probe value (i.e., centers on the median).
  3. Computes the MAD (median absolute deviation) using the definition here: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/mad.html
  4. Divides each log2 probe value by the MAD.
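
The four steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the igvtools code. It operates on a single list of values (whether igvtools normalizes per sample or over the whole file is not stated here), divides the median-centered values by the MAD, and uses the scale constant 1.4826, which is the default in the R mad function linked above.

```python
import math
import statistics


def format_expression(values, mad_constant=1.4826):
    """Log2-transform, median-center, and MAD-scale a list of values.

    Values must be positive: math.log2 raises ValueError otherwise.
    mad_constant follows the default of R's mad(); pass 1.0 for the
    raw median absolute deviation.
    """
    logs = [math.log2(v) for v in values]                             # step 1
    med = statistics.median(logs)
    centered = [x - med for x in logs]                                # step 2
    mad = mad_constant * statistics.median(abs(x) for x in centered)  # step 3
    return [x / mad for x in centered]                                # step 4
```

If more than half of the values are identical, the MAD is zero and the final division fails, so such input needs separate handling.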

Supported input file formats are: .gct and .res.

Usage:

          igvtools formatexp [inputFile]  [outputFile]

Required arguments:

          inputFile    The .gct or .res file to format.

          outputFile   The formatted output file.

Version

  Prints the igvtools version number to the console.