This guide describes the Integrative Genomics Viewer (IGV).
The following figure shows data from The Cancer Genome Atlas:
The red box on the chromosome ideogram indicates which portion of the chromosome is displayed. When zoomed out to display the full chromosome, the red box disappears from the ideogram.
|The ruler reflects the visible portion of the chromosome. The tick marks indicate chromosome locations. The span lists the number of bases currently displayed.|
IGV displays data in horizontal rows called tracks. Typically, each track represents one sample or experiment. This example shows methylation, gene expression, copy number, LOH, and mutation data.
|IGV also displays features, such as genes, in tracks. By default, IGV displays data in one panel and features in another, as shown here. Drag-and-drop a track name to move a track from one panel to another. Combine data and feature panels by selecting that option on the General tab of the Preferences window.|
|Track names are listed in the far left panel. Legibility of the names depends on the height of the tracks; i.e., the smaller the track the less legible the name.|
|Attribute names are listed at the top of the attribute panel. Colored blocks represent attribute values, where each unique value is assigned a unique color. Hover over a colored block to see the attribute value. Click an attribute name to sort tracks based on that attribute value.|
|File||Load from File||Displays genomic data from one or more files. more...|
|Load from URL||Displays genomic data from a file identified by URL. more...|
|Load from Server||Displays genomic data from the IGV data server. more...|
|Load from DAS||Displays genomic data from a Distributed Annotation Server (DAS). more...|
|New Session||Unloads all currently loaded data, as if you exited and restarted IGV. more...|
|Open Session||Opens a previously saved session file. more...|
|Save Session||Saves your current settings to a named session file. more...|
|Save Image||Saves a snapshot of the IGV window to a graphics file, omitting the menu bar and tool bar. Can save in .png, .jpg, or .svg format. New in 2.3.26: One can also save as .eps, by installing the EPSGraphics library|
|Genomes||Load Genome from File||Loads a genome into IGV from your file system.|
|Load Genome from URL||Loads a genome into IGV from a web URL.|
|Load Genome from Server||Loads a genome into IGV from the IGV data server. more...|
|Create .genome File||Save your genome file in .genome format.|
|Manage Genome List||Choose which genomes to display in the genome drop-down menu in the tool bar.|
|View||Preferences||Opens a tabbed menu of data display preferences. more...|
|Color Legends||Displays color legends for track data, which may be modified. more...|
|Show Name Panel||Shows/hides the track name panel.|
|Set Name Panel Width||Resets the track name panel width.|
|Show Attribute Display||Shows/hides attributes and attribute values. more...|
|Select Attributes to Show||Shows/hides selected attributes and attribute values. more...|
|Show Header Panel||Shows/hides the chromosome location header panel.|
|Reorder Panels||Allows the user to reorder the display panels.|
|Go to||View and select loci visited in your navigation history.|
|Tracks||Sort Tracks||Sorts track data. more...|
|Group Tracks||Groups track data. more...|
|Filter Tracks||Filters track data. more...|
|Fit Data to Window||Sets the track height to display all of the data, or as much data as possible. more...|
|Set Track Height||Sets the track height to a specified value. more...|
|Regions||Region Navigator||Opens the region navigator. more...|
|Gene Lists||Opens the gene lists window. more...|
|Export Regions||Saves currently defined regions of interest to a BED file. If no regions of interest are defined, no BED file is created. more...|
|Import Regions||Imports regions of interest from a BED file. more...|
|Tools||Run Batch Script||Executes a series of sequential tasks. Users can load a .txt file that contains a list of commands, one per line, that will be run by IGV. The accepted commands are the same as the IGV Port Commands.|
|Run igvtools||Launches the igvtools interface window. more...|
|Find Motif||Search for a particular nucleotide sequence in the reference genome. more...|
|Gitools Heatmaps||Data and results are represented as browsable heatmaps. Data can be exported from IGV in gitools format, or loaded directly into a running gitools session. See http://www.gitools.org for details.|
|BEDTools||A fast, flexible suite of tools used to compare sets of genomic features. IGV uses BEDTools to compare features in loaded tracks and displays the results as a new track. See more details under Third Party Tools Integration.|
|Combine Data Tracks||Combine two existing numeric tracks to dynamically create a new track. Operators include add, subtract, multiply, and divide. For example, when multiplying two tracks, a locus with data values of 10 and 2 in the separate tracks will have a value of 20 in the new track.|
|Load File from GenomeSpace||Load a file into IGV from your GenomeSpace directory. more...|
|Load Genome from GenomeSpace||Load a genome into IGV from your GenomeSpace directory. more...|
|Save Session to GenomeSpace||Save current IGV session to your GenomeSpace directory. more...|
|Load Session from GenomeSpace||Load a previous session from your GenomeSpace directory. more...|
|Logout||Log out of GenomeSpace|
|Register||Register a new account at GenomeSpace|
|Help||User Guide...||Displays the IGV User Guide.|
|Help Forum...||In your default web browser, opens the home page for the igv-help forum.|
|About IGV||Displays IGV version and build number.|
|Genome drop-down box||Loads a genome. more...|
|Chromosome drop-down box||Zooms to a chromosome. more...|
||Displays the chromosome location being shown. To scroll to a different location, enter the gene name, locus, or track name and click Go. more...|
|Whole genome view||Zooms to whole genome view. more...|
|Moves backward and forward through views of the genome like the back and forward buttons in a web browser.|
||Refreshes the display.|
|Define a region||Defines a region of interest on the chromosome. more...|
|Reduces the row height on all tracks to fit all data for the region in view into the window; will also expand tracks (to their maximum preferred size) to fill the view, if needed.|
|Toggles the pop-up information windows in IGV on or off.|
||Zooms in and out on a chromosome. Sometimes referred to as the "railroad track." more...|
To select tracks and display the pop-up menu, do one of the following:
Commands in the track pop-up menu change the display options for the selected tracks. Most changes made via the pop-up menu are lost when you exit IGV unless you save the session. In a few cases, changing the pop-up menu also changes an option in the Preferences window; these changes are persistent.
The type of data displayed in the selected tracks determines which commands appear in the pop-up menu. This page lists commands by track type: data track, feature track, and alignment track. Use your browser's search function to find a particular command.
Data tracks display numeric values. For an example, click File>Load from Server and select The Cancer Genome Atlas.GBM.Expression.GBM Batch 1-8 Centered and Normalized (hg18). The following commands appear in the pop-up menu for data tracks:
|Rename Track||Renames a track. more...|
|Change Track Color (Positive/Negative Values)||Changes the track color for selected tracks. more...|
|Change Track Height||Changes the track height for selected tracks. more...|
|Change Font Size||Changes the font size for selected tracks.|
|Type of Graph||Changes the way IGV displays track data. more...|
|Windowing Function||Changes the value represented by each pixel of track data. At all but the lowest zoom levels, each pixel represents a significant amount of data. IGV divides the data to be displayed into "windows" of equal length, each corresponding to a single pixel, summarizes the values across each window, and then displays the summarized values in the track. Select the function IGV will use to summarize the values.|
|Set Data Range||Changes the minimum, baseline, and maximum values of the graph used to display track data. more...|
|Set Heatmap Scale||Changes the data range and color of the heatmaps used to display track data. more...|
|Log scale||Plots the chart for that track on a log scale.|
|Autoscale||Toggles the autoscaling function for a given track. With autoscaling enabled, IGV automatically adjusts the plot Y scale to the data range currently in view. As the user pans and moves, this scaling continually adjusts.|
|Show Data Range||Toggles whether the numeric range of the values in the view for a given track is displayed; works for charts other than heatmaps.|
|Create Overlay Track||Merge the selected tracks so that they are displayed on top of one another.|
|Separate Tracks||Only enabled for overlaid tracks. Restores them to separate tracks.|
|Remove Track(s)||Removes selected track(s) from the display. more...|
|Save image||Saves the data visible in the IGV panel to a PNG file.|
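The per-pixel summarization described for data tracks (divide the data into equal-length windows, one per pixel, and reduce each window to a single value) can be sketched in Java as follows. This is a minimal illustration only: the class name `WindowSummary`, the `Stat` enum, and its three options are assumptions made for the example and are not taken from IGV's source or menus.

```java
// Hypothetical sketch: reduce a run of data values to one value per pixel.
class WindowSummary {
    /** Illustrative summary functions; not IGV's actual menu entries. */
    enum Stat { MEAN, MAX, MIN }

    /**
     * Splits values into pixelCount contiguous windows of (nearly) equal
     * length and summarizes each window with the chosen statistic.
     * A window that contains no data points is reported as NaN.
     */
    static double[] summarize(double[] values, int pixelCount, Stat stat) {
        double[] out = new double[pixelCount];
        for (int p = 0; p < pixelCount; p++) {
            // Integer arithmetic keeps window boundaries contiguous.
            int start = (int) ((long) p * values.length / pixelCount);
            int end = (int) ((long) (p + 1) * values.length / pixelCount);
            if (start == end) {
                out[p] = Double.NaN; // more pixels than data points
                continue;
            }
            double acc = values[start];
            for (int i = start + 1; i < end; i++) {
                switch (stat) {
                    case MEAN: acc += values[i]; break;
                    case MAX:  acc = Math.max(acc, values[i]); break;
                    case MIN:  acc = Math.min(acc, values[i]); break;
                }
            }
            out[p] = (stat == Stat.MEAN) ? acc / (end - start) : acc;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = {1, 3, 5, 7, 2, 4, 6, 8};
        double[] mean = summarize(data, 2, Stat.MEAN);
        System.out.println(mean[0] + " " + mean[1]); // prints "4.0 5.0"
    }
}
```

With eight values and two pixels, each window covers four values, so the mean track shows 4.0 and 5.0 while a maximum track would show 7.0 and 8.0.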
Feature tracks identify genomic features. For an example, see the Gene track, which IGV loads when you select a genome. The following commands appear in the pop-up menu for feature tracks:
|Rename Track||Renames a track. more...|
|Change Track Color||Changes the track color for selected tracks. more...|
|Change Track Height||Changes the track height for selected tracks. more...|
|Change Font Size||Changes the font size of the feature labels.|
Displays overlapping features, such as different transcripts of a gene, on one line or multiple lines or condensed (squished). more...
|Copy Details to Clipboard||Copies the pop-up text for the selected feature to the system clipboard so that you can paste the information into other applications.|
|Copy Sequence||Copies the sequence of the selected feature to the system clipboard so that you can paste the information into other applications.|
|Set Feature Visibility Window||Specifies the threshold, in kilobases, for IGV to display features in the window. In other words, if you set this at 50 kb, IGV will only display features after you have zoomed in to display 50 kb or less in the IGV window.|
|Remove Track(s)||Removes selected track(s) from the display. more...|
|Save image||Saves the data visible in the IGV panel to a PNG file.|
Alignment tracks display alignments (more here). For an example, select the Human hg19 genome from the genome dropdown menu in the toolbar, and then click File>Load from Server and select an alignment from the 1000 Genomes project. Tip: Zoom in to view alignments and the alignment track pop-up menu.
|Rename track||Renames a track.|
|Copy read details to clipboard||When you hover over a read, the tool tip displays information about the read. This option copies that information and the read sequence to the clipboard.|
|Group alignments||Groups alignments by read strand, first-in-pair strand, sample, read group, chromosome of mate, pair orientation, supplementary flag, or tag.|
Sorts alignments by start location, strand, base, mapping quality, sample, read group, or insert size as defined in the SAM/BAM file format and detailed below in Color alignments.
When sorting by base, alignments which span a base with a splice, i.e. do not actually cover the base, now appear at the bottom.
Additionally, repeat the most recent sort with hotkey ctrl-s.
Colors alignments by the following options:
|Shade base by quality||Uses the color intensity of a mismatched base to indicate its quality score: the darker the shading the higher the score. Changing this option also changes the option on the Alignments tab of the Preferences window.|
|Show mismatched bases||By default, mismatched bases are displayed as colored letters on a gray bar that represents the read. To change the default color scheme, see Modify the prefs.properties file.|
|Show all bases||Select this option to display all bases in the read. To change the default color scheme, see Modify the prefs.properties file.|
|View as pairs||For more information on this option, see this page.|
|Go to mate||Jumps to the region of the paired read (if any).|
|View mate region in split screen||For more information on this option, see this page.|
|Set insert size options||Controls color-coding of paired reads based on the inferred insert size.|
|Re-pack alignments||Sorts alignments to minimize gaps at the top of the track.|
|Show coverage track||When selected, IGV displays the matching coverage track for the alignment track.|
|Load coverage data||Loading an alignment track from the IGV data server (File > Load from Server) automatically loads the matching coverage data.|
|Changes the height of the reads to adjust the amount of information displayed.|
|Select by name||Opens a window so you can enter the name of a read. IGV will highlight that read with a colored border. Note that IGV does not change the view, so if the read is not currently visible this option will have no apparent effect.|
Clears the outlines that show paired reads.
|Copy read sequence||Copies the nucleotide sequence of the selected read or region of interest to the clipboard.|
|BLAT read sequence|
|Copy consensus sequence||
Calculates the consensus sequence for the region in view and copies the information to the clipboard. The method for calculating the consensus is taken from Cavener, Nucleic Acids Res. 15, 1353-1361, 1987.
1. If the frequency of a single nucleotide at a specific position is greater than 50% and greater than twice the frequency of the second most frequent nucleotide, it is assigned as the consensus nucleotide.
2. If the sum of the frequencies of two nucleotides is greater than 75% (but neither meet the criteria for a single nucleotide assignment) they are assigned as co-consensus nucleotides.
3. If no single nucleotide or pair of nucleotides meets the criteria, assign an 'N'.
Information copied to the clipboard includes:
|Sashimi Plot||Open a Sashimi-style plot window. more...|
|Remove track||Removes selected tracks from the display. more...|
|Save image||Saves the data visible in the IGV panel to an image file|
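The three consensus rules listed under Copy consensus sequence can be sketched in Java as follows. This is an illustration of the rules as stated, not IGV's implementation: the class name `ConsensusCall` is hypothetical, and returning a two-letter string such as "AC" for a co-consensus pair is an assumption, since the guide does not say how co-consensus nucleotides are written to the clipboard.

```java
// Hypothetical sketch of the Cavener-style consensus rules for one position.
class ConsensusCall {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /**
     * counts holds the number of reads showing A, C, G, T (in that order).
     * Returns a single base, a two-base co-consensus string, or "N".
     */
    static String call(int[] counts) {
        long total = 0;
        for (int c : counts) total += c;
        if (total == 0) return "N";

        // Locate the most frequent and second most frequent bases.
        int first = 0;
        for (int i = 1; i < 4; i++) {
            if (counts[i] > counts[first]) first = i;
        }
        int second = (first == 0) ? 1 : 0;
        for (int i = 0; i < 4; i++) {
            if (i != first && counts[i] > counts[second]) second = i;
        }
        long top = counts[first];
        long next = counts[second];

        // Rule 1: more than 50% and more than twice the runner-up.
        if (2 * top > total && top > 2 * next) {
            return String.valueOf(BASES[first]);
        }
        // Rule 2: the top two bases together exceed 75%.
        if (4 * (top + next) > 3 * total) {
            return "" + BASES[Math.min(first, second)] + BASES[Math.max(first, second)];
        }
        // Rule 3: no clear consensus.
        return "N";
    }

    public static void main(String[] args) {
        System.out.println(call(new int[] {8, 1, 1, 0})); // prints "A"
        System.out.println(call(new int[] {5, 4, 1, 0})); // prints "AC"
        System.out.println(call(new int[] {3, 3, 2, 2})); // prints "N"
    }
}
```

The comparisons use integer arithmetic on the counts rather than floating-point frequencies, so borderline cases such as exactly 50% are decided without rounding effects.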
To display the Preferences window, click View>Preferences. Preferences are preserved across sessions. To override preferences during a session, use the track display pop-up menu. Each section on this page describes the options on a tab of the Preferences window: General, Tracks, Mutations, Charts, Alignments, Probes, Proxy, Advanced, and IonTorrent.
To restore default preferences or modify other default settings not listed here, see the Modify the prefs.properties file page.
|Select to distinguish regions with zero values (white) from regions with missing data (gray). Clear (default) to display both regions in the same way (white). Affects only bar charts and scatter plots.|
|Select to display all tracks in a single panel. Clear (default) to display data tracks (e.g., expression data) in one panel and feature tracks (e.g., genes) in another.|
|Select (default) to show attributes and attribute values to the left of the data panel. Clear to hide the attributes. This option and View>Show Attribute Display have the same effect on attribute display.|
|Select to outline the boundaries of regions of interest in black. Clear (default) to leave them without black boundaries.|
|Zoom in on search results. When selected (default) the zoom level is automatically adjusted so that the target feature fills the view after a successful search. If not checked, the target feature of a search is centered in the view but the zoom level is unaffected.|
|Change this to change the resolution (in base pairs) at which the sequence track becomes visible.|
Change this to define the size of the flanking region, in base pairs. To specify the flanking region as a percentage of feature length, enter the percentage as a negative number. IGV adds the flank before and after a feature locus when you zoom to a feature, or when you view gene/loci lists in multiple panels.
|Click here to change the background color of the IGV display.|
|Use this to set a default font size for labeling tracks and features.|
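As an illustration of the flanking-region setting on this tab, the following Java sketch shows how a positive value (base pairs) and a negative value (percentage of feature length) would widen the displayed interval. The class and method names are hypothetical, and clamping the start at zero is an assumption made for the example rather than documented IGV behavior.

```java
// Hypothetical sketch: apply a flanking-region setting to a feature locus.
class FlankingRegion {
    /**
     * Returns {start, end} of the interval to display.
     * setting >= 0 : flank size in base pairs.
     * setting <  0 : flank size as a percentage of the feature length.
     */
    static long[] withFlank(long start, long end, int setting) {
        long length = end - start;
        long flank = (setting >= 0)
                ? setting
                : length * (-(long) setting) / 100;
        // Assumption: the view never starts before position 0.
        return new long[] {Math.max(0, start - flank), end + flank};
    }

    public static void main(String[] args) {
        long[] fixed = withFlank(10_000, 12_000, 2_000);
        long[] percent = withFlank(10_000, 12_000, -10);
        System.out.println(fixed[0] + "-" + fixed[1]);     // prints "8000-14000"
        System.out.println(percent[0] + "-" + percent[1]); // prints "9800-12200"
    }
}
```

For a 2,000 bp feature, a setting of 2000 adds 2,000 bp on each side, while a setting of -10 adds 10% of the feature length (200 bp) on each side.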
|Default track height for bar charts, scatter plots, and line plots.|
|Default track height for all other tracks.|
Name of an attribute in the sample information file. IGV uses the corresponding attribute value as the track name.
Select to expand feature tracks by default. You may have to restart IGV for this to take effect.
Select (default) to show the "expand/collapse" triangular icon on feature tracks.
Collapsed with the icon: Expanded with the icon:
Select to normalize tracks containing coverage data in .tdf files that were created using igvtools. This normalization option multiplies each value by [1,000,000 / (totalReadCount)].
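The normalization formula above, which multiplies each value by 1,000,000 / totalReadCount, can be written out as a short Java sketch. The class and method names are hypothetical; only the scaling factor comes from this guide.

```java
// Hypothetical sketch: scale coverage values to "per million reads".
class CoverageNormalization {
    /** Multiplies every coverage value by 1,000,000 / totalReadCount. */
    static double[] normalize(double[] coverage, long totalReadCount) {
        if (totalReadCount <= 0) {
            throw new IllegalArgumentException("totalReadCount must be positive");
        }
        double factor = 1_000_000.0 / totalReadCount;
        double[] out = new double[coverage.length];
        for (int i = 0; i < coverage.length; i++) {
            out[i] = coverage[i] * factor;
        }
        return out;
    }

    public static void main(String[] args) {
        // A library of 2,000,000 reads gives a factor of 0.5.
        double[] scaled = normalize(new double[] {10, 25}, 2_000_000L);
        System.out.println(scaled[0] + " " + scaled[1]); // prints "5.0 12.5"
    }
}
```

Scaling by library size in this way makes coverage tracks from samples with different total read counts directly comparable.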
|Select to overlay mutation data on other tracks. more...|
|Name of an attribute column in the sample information file. IGV uses the corresponding attribute values to "link" mutation tracks with other tracks. more...|
|Select this to show mutation tracks that are not linked to other tracks, and therefore will not be seen if the overlay option is checked. This option has no effect if the overlay option is not selected.|
Select to color-code mutation data. To view and change the mutation coloring scheme, click the Choose Colors button. The Edit button next to Mutation in View>Color Legends also displays the same dialog. Alternatively, set default mutation coloring by modifying the prefs.properties file.
Mutation color codes are assigned dynamically. An example assignment chart is shown (Screenshot 2015.02.18).
|Select to add a border at the top to the track.|
|Select to add a border at the bottom of the track.|
Select (default) to color the top and bottom borders (if any). Clear to show the borders in black regardless of the track color. Tip: To change the track color, use the track display pop-up menu.
|Select to label the track with its name, provided the track is at least 25 pixels high.|
|Select to label the y-axis with its data range.|
|Select (default) to allow charts (barchart, scatterplot, and lineplot) to automatically adjust the plot Y scale to the data range currently in view. As the user pans and moves, this scaling continually adjusts. Clear to turn autoscaling off. There is an option in the popup menu to enable autoscaling for a single track.|
Select (default) to show the range of the data.
|Select to show all features in heatmaps.|
The following figures illustrate these track display options.
|Sets the threshold at which IGV displays reads. Reads are visible only when IGV is zoomed in to display a number of bases less than or equal to this threshold.|
IGV displays a specified number of randomly sampled alignments, configured by the downsampling parameters, instead of keeping all of them in memory. The coverage track, displaying total coverage at a region, is unaffected; that is, it always shows unsampled values. The default setting downsamples to at most 100 reads per 50 nt window, and paired reads are downsampled as a set.
Downsampled regions are marked by black rectangles spanning the downsampled region just under the coverage track as shown in the screenshot below. When zoomed out, the black rectangles may appear like a continuous black line.
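To make the idea of a per-window cap concrete, the following Java sketch keeps at most a fixed number of randomly chosen reads in each window using reservoir sampling. It is a sketch under stated assumptions, not a description of IGV's code: the class name is hypothetical, reads are represented only by their start positions, the choice of reservoir sampling is an assumption, and the rule that paired reads are downsampled as a set is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch: cap the number of reads kept per genomic window.
class WindowDownsampler {
    /**
     * Groups read start positions by window (start / windowSize) and keeps
     * at most maxPerWindow of them per window, chosen uniformly at random.
     */
    static Map<Integer, List<Integer>> downsample(int[] readStarts, int windowSize,
                                                  int maxPerWindow, Random rng) {
        Map<Integer, List<Integer>> kept = new TreeMap<>();
        Map<Integer, Integer> seen = new HashMap<>();
        for (int start : readStarts) {
            int window = start / windowSize;
            List<Integer> reservoir = kept.computeIfAbsent(window, w -> new ArrayList<>());
            int n = seen.merge(window, 1, Integer::sum); // reads seen in this window so far
            if (reservoir.size() < maxPerWindow) {
                reservoir.add(start);
            } else {
                // Reservoir sampling: the n-th read replaces a kept read
                // with probability maxPerWindow / n.
                int j = rng.nextInt(n);
                if (j < maxPerWindow) {
                    reservoir.set(j, start);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] starts = new int[1005];
        for (int i = 0; i < 1000; i++) starts[i] = i % 50;      // deep window 0
        for (int i = 0; i < 5; i++) starts[1000 + i] = 60 + i;  // shallow window 1
        Map<Integer, List<Integer>> kept = downsample(starts, 50, 100, new Random(42));
        System.out.println(kept.get(0).size() + " " + kept.get(1).size()); // prints "100 5"
    }
}
```

Windows with fewer reads than the cap are left untouched, which is why only deep regions would be flagged as downsampled.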
Filter and Shading Options
For more information, see the Sequence Alignment/Map Format Specification.
Phred quality scores are logarithmically linked to error probabilities with typical ranges between Q2 and Q63. For example, Q10–Q15 correspond to 10%–3% base call error probabilities (cutoff range recommended by Trimmomatic), Q20 to 1% error probability or 99% base call accuracy probability (conservative cutoff recommended by GATK), and Q60 corresponds to a one in a million error probability or 99.9999% base call accuracy probability. Differences in Q for the Sanger versus Solexa/Illumina GA platforms are graphed in this Wikipedia entry that shows diverging probabilities for scores less than Q13 for the different platforms. The same entry discusses Phred+33 versus Phred+64 systems of scoring, with the former being more prevalent for recent platforms.
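The logarithmic relationship between a Phred score Q and the error probability p is p = 10^(-Q/10). The Java sketch below computes that value and decodes an ASCII quality string using a given offset (33 for Phred+33, 64 for Phred+64). The class and method names are hypothetical and chosen only for this example.

```java
// Hypothetical sketch: Phred score arithmetic and quality-string decoding.
class PhredScores {
    /** Error probability for a Phred quality score: p = 10^(-Q/10). */
    static double errorProbability(int q) {
        return Math.pow(10.0, -q / 10.0);
    }

    /** Decodes an ASCII quality string; offset is 33 (Phred+33) or 64 (Phred+64). */
    static int[] decode(String qualityString, int offset) {
        int[] scores = new int[qualityString.length()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = qualityString.charAt(i) - offset;
        }
        return scores;
    }

    public static void main(String[] args) {
        for (int q : new int[] {10, 20, 60}) {
            System.out.printf("Q%d -> error probability %.6f%n", q, errorProbability(q));
        }
        // 'I' is ASCII 73, so it decodes to Q40 under Phred+33.
        System.out.println(decode("I", 33)[0]); // prints "40"
    }
}
```

The same character decodes to different scores under the two offsets, which is why the encoding in use must be known before interpreting a quality string.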
Splice Junction Track Options
|Sets default size thresholds for color-coded flagging of paired end alignments. Only paired end alignments with insert sizes between these thresholds are flagged. Select Compute to compute selected values from the actual size distribution of each library.|
Select an option to determine how IGV places expression data on the genome:
Modifying this option affects the display of subsequently loaded alignment tracks.
|Sets proxy parameters for connecting to the Internet. IGV will use this to load hosted genomes and hosted data sets.|
|Select and enter values if a username and password are required for the proxy.|
|Clears all proxy settings.|
|Select this option to enable a port on which IGV listens for commands and http requests. Enabling the port allows control of IGV from a web browser. more...|
Select this option to edit URLs for the IGV data and genome servers. These settings are rarely changed.
IGV caches each genome that it loads. On rare occasions, it may be necessary to clear the cached genome file to display an updated version of the genome. Click Clear Genome Cache to do this.
Genome Server URL is the URL for the genome server that populates the genome drop-down list.
Data Registry URL is the URL for the hosted data sets registry (populates File>Load from Server dialog).
Keep this selected to allow IGV to automatically check for updated genomes. Clear to disable this automatic check.
IGV 2.2, released December 18, 2012, and newer versions allow disabling anti-aliasing. This can significantly improve performance in some circumstances when running with X-Windows.
|IGV 2.3.46, released March 2015, allows BLAT sequence searches of features, aligned reads, and selected regions of interest of up to 8 kb in length via the right-click pop-up menu. Use this option to change the server hosting the genome against which BLAT searches. The default is the BLAT server hosted by UCSC's Genome Browser. Most UCSC derived genomes are supported, including human and mouse genomes.|
|This allows you to move your default IGV directory.|
|Flow signal distribution chart options|
By default, IGV uses heatmaps to display certain types of data (see Default Display). Use the Color Legends window to change the default colors for these heatmaps.
To change the default colors:
Alternatively, set default mutation data colors by modifying the prefs.properties file.
There are some useful keyboard shortcuts you can use in IGV.
|ctrl-R||Defines the region currently in view as a region of interest.|
|ctrl-F/ctrl-B||Skip forward to the next feature and back to the last feature.|
|ctrl-shift-F/ctrl-shift-B||If you have the feature track expanded and have selected one of the rows, this will skip forward to the next exon or back to the last exon.|
|These move you back and forward through your IGV history.|
|Arrow keys||Pans left, right, up, and down in the current chromosome.|
|Home/End keys||Skips to the page top or bottom of the current view, then pages right or left respectively.|
|PageUp/PageDown keys||Pages up and down the current view.|
You can start IGV from either the:
To start IGV using Java Web Start:
To start IGV from the command line:
You can optionally specify either a session file or a comma-delimited list of files to load, and a locus in the form of a locus string (e.g. chr1:100-200) or gene name. These two arguments are order dependent: you cannot specify a locus without specifying a file to load. Other arguments that you can use from the command line include:
Zoom out to view the whole genome, zoom in to a chromosome and continue zooming to base pair resolution. As you zoom in, the gene track shows gene names and sequence data. If the sequence data is unavailable, small blocks replace the bases. If you are using a genome stored on the IGV genome server, you must be connected to the internet to view the sequence data. (Note that the zoom slider, also sometimes called the "railroad track," does not appear when you are viewing the full genome; it reappears when you zoom in to the chromosome level.)
Click the whole genome view icon to zoom out to the genome view.
From the genome view, zoom to a chromosome by clicking its label.
Select a chromosome from the drop-down menu to zoom to it.
|Click and drag on the genome ruler to sweep over and select an area to which to zoom.|
To zoom in and out on a chromosome:
|Zoom in||Zoom out|
|Double-click or shift-click the track data||Alt-click (Mac: option-click) the track data|
|Click a zoom level on the zoom slider||Click a zoom level on the zoom slider|
|Click the plus (+) icon on the zoom slider||Click the minus (-) icon on the zoom slider|
|Click and drag on the genome ruler to select an area to which to zoom|
To scroll the display:
|Vertical scroll||Horizontal scroll*|
|Scroll bar in the IGV window||Click and drag the track data|
|Click and drag the track data||Click the chromosome ideogram to scroll to that location|
|Page Up and Page Down keys||Click the ruler to center that location|
|Up and down arrow keys||Left and right arrow keys|
|Home and End keys (scroll by screen width)|
* You cannot scroll horizontally when IGV is displaying the whole genome or a whole chromosome.
Use the search box to locate:
Note: When loading from a file with an index, search may not find all matches. This is because IGV does not keep the entire file contents in memory when an index is present.
If you have a feature track loaded (e.g., Gene track, BED, or GFF file), you can jump from one feature to the next.
IGV positions the start of the next (or previous) feature at the center of the display.
You can also jump from one exon to the next. To exon-jump, select a feature track and press Shift+Ctrl+F to center the next exon in your view, Shift+Ctrl+B to move back one exon.
The back and forward buttons in the toolbar allow you to move backward and forward through your views of the genome the way you move back and forward in a web browser.
Genomes are selected from the genome drop-down list. Initially, this list contains a single item, Human hg18. To add additional genomes, see the section below on "Selecting a Hosted Genome."
IGV provides a number of genomes that are hosted on a server at the Broad Institute. Initially, the genome drop-down list contains a single item, "Human hg18".
If the genome you need is not available, either post a request at http://groups.google.com/group/igv-help that it be added, or follow the instructions below to load or import it.
This option supports defining a reference genome by loading either an IGV .genome file or a FASTA file. The .genome file is created as described below. FASTA files must be plain text (not gzipped), and must be indexed with a .fai as defined by the Samtools suite (http://sourceforge.net/projects/samtools/). If the file is not indexed, IGV will attempt to index it. IGV remembers the location of the FASTA file and the file will appear in the drop-down list until it is removed as described above (Genomes>Manage Genome List).
This option enables additional files to be associated with the FASTA reference sequence file, as described below. These files are archived in a zip file with a .genome extension. This option also allows the reference sequence to be defined as a directory of FASTA files, rather than a single FASTA.
Note: If you are choosing files from the NCBI directory, you will generally want to use the .fna or .ffn file (nucleic acid sequences), as opposed to the .faa (amino acids). Choose the .gff file for the annotation file.
To remove an imported genome:
IGV can optionally listen for http requests over a port. This option is turned off by default but can be enabled from the Advanced tab of the Preferences window.
Note: IGV will write a response back to the port socket upon completion of each command. It is good practice to read this response before sending the next command. Failure to do so can overflow the socket buffer and cause IGV to freeze. See the example below for the recommended pattern.
|new||Create a new session. Unloads all tracks except the default genome annotations.|
|load file||Loads data or session files. Specify a comma-delimited list of full paths or URLs.|
|collapse trackName||Collapses a given trackName. trackName is optional, however, and if it is not supplied all tracks are collapsed.|
|echo||Writes "echo" back to the response. (Primarily for testing)|
|exit||Exit (close) the IGV application.|
|expand trackName||Expands a given trackName. trackName is optional, however, and if it is not supplied all tracks are expanded.|
|genome genomeId||Selects a genome.|
|goto locus or listOfLoci||Scrolls to a single locus or a space-delimited list of loci. If a list is provided, these loci will be displayed in a split screen view. Use any syntax that is valid in the IGV search box.|
|goto all||Scrolls to a whole genome view.|
|region chr start end||Defines a region of interest bounded by the two loci (e.g., region chr1 100 200).|
|maxPanelHeight height||Sets the number of vertical pixels (height) of each panel to include in an image. Images created from a port command or batch script are not limited to the data visible on the screen. Stated another way, images can include the entire panel, not just the portion visible in the scrollable screen area. The default value for this setting is 1000; increase it to see more data, or decrease it to create smaller images.|
|setSleepInterval ms||Sets a delay (sleep) time in milliseconds. The sleep interval is invoked between successive commands.|
|snapshotDirectory path||Sets the directory in which to write images.|
|snapshot filename||Saves a snapshot of the IGV window to an image file. If filename is omitted, writes a PNG file with a filename generated based on the locus. If filename is specified, the filename extension determines the image file format, which must be .png, .jpg, or .svg.|
|sort option locus||Sorts an alignment track by the specified option. Recognized values for the option parameter are: base, position, strand, quality, sample, readGroup, AMPLIFICATION, DELETION, EXPRESSION, SCORE, and MUTATION_COUNT. The locus option can define a single position or a range. If absent, sorting will be performed based on the region in view, or the center position of the region in view, depending on the option.|
|squish trackName||Squish a given trackName. trackName is optional, and if it is not supplied all annotation tracks are squished.|
|viewaspairs trackName||Set the display mode for an alignment track to "View as pairs". trackName is optional.|
|preference key value||Temporarily set the preference named key to the specified value. This preference only lasts until IGV is shut down.|
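Example Java code:
import java.io.*;
import java.net.Socket;
// Send one command at a time and read IGV's response before sending the next.
Socket socket = new Socket("127.0.0.1", 60151);
PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
out.println("echo");
String response = in.readLine();
out.println("genome hg18");
response = in.readLine();
out.println("goto chr1:100-200");
response = in.readLine();
out.println("collapse");
response = in.readLine();
out.println("snapshot");
response = in.readLine();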
As of version 1.5, a user can load a text file to execute a series of sequential tasks by using Tools>Run Batch Script. The user loads a TXT file that contains a list of commands, one per line, that will be run by IGV. Arguments are delimited by spaces (NOTE: not tabs). Lines beginning with # or // are skipped. See Controlling IGV through a Port for accepted commands.
The example script does the following:
This section describes two forms of HTML links for interacting with IGV from a web page. The first can be used to launch IGV on the client machine at a specific locus with a supplied session file. The second can be used to load data and session files into IGV (after it has been launched).
The first type of HTML link makes use of a dynamic "php" file hosted at the Broad Institute to launch IGV on a specified session or data file. An example follows
The table below contains the full parameter list
|sessionURL||Required||URL to a session file (further described below), or a comma-delimited list of data files.|
|file||Alias for sessionURL|
|index||Optional||URL to an index file, or a comma-delimited list of index files. This parameter is only used if sessionURL points to an indexed data file or list of files. If a list is used, the index list must be the same length as the data file list.|
|locus||Optional||Locus to display. Use any syntax that is valid in the IGV search box.|
A short name identifying your website or organization.
|genome||Optional||A genome identifier (e.g., hg18). This is useful if you specify a data file rather than a session file for the sessionURL. Click here for a list of recognized genome ids.|
|initalHeapSize||Optional||Initial memory footprint, specified as an integer followed by an "m" for megabytes. The default value is 256m.|
|maxHeapSize||Optional||Maximum memory setting. The default value is 1000m (1 gigabyte).|
|name||Optional||Specifies a name or names for the track(s). This parameter is ignored if loading a session file.|
|merge||Optional||Controls whether the loaded data is merged with the existing IGV session or loaded into a new session. If false, any data currently loaded will be unloaded after clicking this link. The default value is false if file is a session file, true otherwise.|
BAM File Example:
Session File Example:
The second type of link will load data into a running IGV. This makes use of the listener port, which must be enabled. This option can be controlled on the "Advanced" preferences tab, and is enabled by default listening on port 60151. Links can be created to load data or jump to a locus as follows.
The file parameter value can be a URL or a comma-delimited list of URLs to most IGV-supported data file types (exceptions listed below), or a session file. The merge parameter (optional) controls whether the loaded data is merged with the existing IGV session or loaded into a new session. If false, any data currently loaded will be unloaded after clicking this link. The default value is false if file is a session file, true otherwise. The name parameter (optional) specifies a name or names for the track. If multiple tracks are loaded as a comma-delimited list, the name parameter value should also be a comma-delimited list of the same size. The name parameter is ignored if loading a session.
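A link of this second type can be assembled in code. The sketch below is illustrative only: the class name IgvLoadLink is hypothetical, and the host "localhost" and the "/load" path are assumptions about the link form. The port 60151 and the file, merge, and name parameters come from the description above. The helper rejects a name list whose size differs from the file list.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

/** Hypothetical builder for a link that loads data into a running IGV. */
final class IgvLoadLink {

    /** names may be null, because the name parameter is optional. */
    static String build(List<String> fileUrls, List<String> names, boolean merge) {
        if (names != null && names.size() != fileUrls.size()) {
            throw new IllegalArgumentException("name list must be the same size as the file list");
        }
        // Assumed link form: http://localhost:60151/load?file=...
        StringBuilder link = new StringBuilder("http://localhost:60151/load?file=");
        link.append(encode(String.join(",", fileUrls)));
        link.append("&merge=").append(merge);
        if (names != null) {
            link.append("&name=").append(encode(String.join(",", names)));
        }
        return link.toString();
    }

    /** Percent-encodes a query parameter value. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```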
IGV produces a session file in XML format when a user clicks on File>Save Session. You can also create a session file manually. The XML format (IGV version 1.5) is described below.
Required - These elements are required in a session file.
Optional - These elements are optional in a session file and are used to determine the placement of tracks and visual style choices. They are included in an XML file produced when you save a session in IGV, but are typically not included in an XML file that is created manually.
The XML below is an example of a minimal session file.
<?xml version="1.0" encoding="UTF-8"?>
<Global genome="hg18" locus="EGFR" version="3">
    <Resources>
        <Resource name="RNA Genes" path="http://www.broadinstitute.org/igvdata/tcga/gbm/GBM_batch1-8_level3_exp.txt.recentered.080820.gct.tdf"/>
        <Resource name="RNA Genes" path="http://www.broadinstitute.org/igvdata/annotations/hg18/rna_genes.bed"/>
        <Resource name="sno/miRNA" path="http://www.broadinstitute.org/igvdata/tcga/gbm/Sample_info.txt"/>
    </Resources>
</Global>
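If you create session files manually for many data sets, generating them from code avoids malformed XML. The following is a minimal sketch under stated assumptions: the class name MinimalSessionWriter is hypothetical, the Resource elements are assumed to sit inside a Resources element under Global, and only the attributes shown in the example above are written.

```java
import java.util.Map;

/** Hypothetical writer for a minimal IGV session file. */
final class MinimalSessionWriter {

    /** pathToName maps each resource path to its display name; iteration order is kept. */
    static String write(String genome, String locus, Map<String, String> pathToName) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<Global genome=\"").append(escape(genome))
           .append("\" locus=\"").append(escape(locus)).append("\" version=\"3\">\n");
        // Assumption: resources are wrapped in a Resources element.
        xml.append("    <Resources>\n");
        for (Map.Entry<String, String> resource : pathToName.entrySet()) {
            xml.append("        <Resource name=\"").append(escape(resource.getValue()))
               .append("\" path=\"").append(escape(resource.getKey())).append("\"/>\n");
        }
        xml.append("    </Resources>\n");
        xml.append("</Global>\n");
        return xml.toString();
    }

    /** Escapes characters that are not allowed inside an XML attribute value. */
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;")
                    .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The map is keyed by path rather than by name because track names need not be unique, as the example above shows.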
When zoomed in sufficiently, the reference genome Sequence track appears at the top of the lower panel above the Genes track, if any, in the IGV display as shown in the Screenshot (2015.04.01). The sequence is represented by colored bars or colored letters, depending on zoom level, with adenine in green, cytosine in blue, guanine in yellow, and thymine in red (A, C, G, T).
You can change the strand that is displayed by clicking on the arrow in the title to the left of the track. Note that the sequence and the arrow are only displayed when zoomed in to a sufficiently small region.
The direction of the arrow indicates which strand is currently displayed. An arrow pointing left indicates that the negative strand is showing. This strand will show the complement nucleotides and reverse complement translations.
With the reference genome sequence track, you can optionally display a 3-band track that shows a 3-frame translation of the amino acid sequence for the corresponding nucleotide sequence. The translation is shown for the strand indicated.
Amino acids are displayed as blocks colored in alternating shades of gray. Methionines are colored green, and all stop codons are colored red. When you zoom all the way in, the amino acid symbols will appear.
You can toggle the display of this translation track by clicking once, anywhere in the sequence or translation track, or by toggling Show Translation in the track popup menu.
There are 3 different options for viewing the feature track. These allow you to display overlapping features, such as different transcripts of a gene, on one line or multiple lines
To change the view of the feature track, right-click on the feature track and select one of the options:
This feature is similar to feature jumping. To feature-jump, you select a feature track and press Ctrl-F for forward, Ctrl-B for back. To exon-jump, you select a feature track and press SHIFT-Ctrl-F to center the next exon in your view, SHIFT-Ctrl-B to move back one exon.
The "name" field (column 4) of a BED file can contain GFF3 style key-value attribute tags by specifying "gffTags=on" on the track line. These attributes will be displayed in the mouse hover popup text.
Data and genomic annotations can be loaded from local files, HTTP URLs, an IGV data server, or a Distributed Annotation Server (DAS).
Load data files by browsing for files on the local file system. See File Formats for information about the file formats IGV accepts.
To load data from the file system:
IGV displays a warning if the file is an un-indexed ASCII-format file over 50 MB. We recommend indexing such files or converting them to the binary TDF format before loading (see the section on igvtools).
To load data from an HTTP URL:
To load data from the IGV data server:
IGV provides limited DAS support for "basic features" requests. To load from a DAS server, enter the DAS source feature URL, with data source name. Some examples that are known to work for the hg18 assembly are:
Note: The segment tag, if present, is ignored. IGV generates the segment tag as needed.
UCSC DAS Sources
The UCSC DAS server is organized such that all features for a given genome assembly (e.g., hg18) are served from a single feature URL. Specific tracks are specified by the type tag. To prevent queries from overwhelming available memory, IGV requires that a type tag parameter be included with UCSC DAS URLs. For example, the following URL loads the "dgv" track:
To see all available types for a specific assembly the following URL can be used (substitute an assembly id, e.g. hg18, for <dbID>).
Feature Visibility Window
The DAS protocol does not provide a reliable way to know how many features will be returned for a given request. To prevent IGV from freezing when loading high-density DAS tracks a "feature visibility window" is used to prevent loading when zoomed out. The default value for the window is 250 kb. This can be changed by right-clicking over the track.
To remove all tracks and attributes:
To remove specific tracks, do one of the following:
One of the common causes for a data loading failure is a mismatch in chromosome names between the data file and the IGV genome it is being viewed against. Many Bowtie users report this problem after aligning to the supplied NCBI index files because chromosomes are named by accession numbers in the form: gi|224589811|ref|NC_000002.11|.
The workaround is to create an alias file in 2-column tab-delimited format. The first column contains the chromosome name in your data file, for example wig or bam file. The second column contains the corresponding name in the genome assembly you are viewing (e.g., chr1 for our "hg19" genome). For instance, the alias file might look like this:
NC_000002.11 <tab> gi|224589811|ref|NC_000002.11|
NC_000002.12 <tab> gi|224589811|ref|NC_000002.12|
Name the file after the genome with an underscore, the word "alias", and the extension .tab. For example, hg19_alias.tab. Place this file in the igv directory. The default location for this folder is <user home>/igv/genomes; it can be changed in Preferences > Advanced.
Note: Certain well-known aliases are built into IGV and do not require an alias file. These include mappings that involve adding or removing the prefix "chr" to the name, for example 1 -> chr1 and chr1 -> 1. Also, NCBI identifiers that start with "gi|" and follow the pattern illustrated in the example above are automatically mapped.
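The lookup order described above can be sketched as follows. This is an illustration, not IGV's implementation: the class name ChromosomeAliases is hypothetical, the order in which rules are tried is an assumption, and only the "chr" prefix rule of the built-in aliases is modeled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical resolver for chromosome names using an alias file. */
final class ChromosomeAliases {

    /** Parses 2-column tab-delimited lines: name in the data file, then name in the genome. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> aliases = new HashMap<>();
        for (String line : lines) {
            String[] columns = line.split("\t");
            if (columns.length >= 2) {
                aliases.put(columns[0].trim(), columns[1].trim());
            }
        }
        return aliases;
    }

    /** Returns the genome's name for a data-file chromosome name, or null if none matches. */
    static String resolve(String name, Map<String, String> aliases, Set<String> genomeNames) {
        if (genomeNames.contains(name)) {
            return name;
        }
        String alias = aliases.get(name);
        if (alias != null && genomeNames.contains(alias)) {
            return alias;
        }
        // Built-in rule: add or remove the "chr" prefix (1 -> chr1, chr1 -> 1).
        String toggled = name.startsWith("chr") ? name.substring(3) : "chr" + name;
        return genomeNames.contains(toggled) ? toggled : null;
    }
}
```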
When you load genomic data, IGV displays the data in horizontal rows called tracks. Typically, each track represents one sample or experiment. For each track, IGV displays the track identifier, one or more attributes, and the data.
When loading a data file, IGV uses the file extension to determine the file format (see File Formats), the file format to determine the data type (Table 1), and the data type to determine the track default display options (Table 2).
|File Format||Data Type|
|CBS, CN, MAF, SEG, SNP, VCF||Copy number|
|GCT||Gene expression or RNAi|
|BAM, bam.list, Goby files, PSL, SAM||Sequence alignments|
|BED, genePred, GFF, GFF3||Genome annotations|
|GWAS||Genome-wide association study data|
|IGV, WIG, HDF5 file not created with alignment processor, bedgraph||Other|
|Cytoband, FASTA||Not applicable. Cytoband and sequence files for an imported genome.|
|Data Type||Default Graph Type||Default Data Range||Default Colors|
|Copy number||Heatmap||-1.5 to 1.5||Blue to red|
|Gene expression||Heatmap||-1.5 to 1.5||Blue to red|
|Chip||Bar chart||None, data is autoscaled||Blue|
|DNA methylation||Heatmap||0 to 1
|Allele-specific copy number||Heatmap||-1.5 to 1.5||Blue to red|
|LOH||Heatmap||-1 to 1||Blue = LOH (1); Yellow = Retained (0); Red = Conflict (-1)|
|RNAi||Heatmap||-3 to 3||Red to blue|
|GWAS||Scatter plot||None, data is autoscaled||Chromosome colors|
|Other||Bar chart||None, data is autoscaled||Blue|
You can override IGV's default display options in several ways:
This section describes a few commonly used display options that apply to all (or most) tracks: graph type, data range, track color, track height, and track names. For information about how to load and display specific types of data, see Viewing Data. For a complete list of display options, review the options available in the pop-up menus, Preferences window, Color Legends window, and the menu bar (View and Tracks menus).
Most tracks are displayed using one of four graph types (the following graphs show the same data):
|Points (Scatter plot):|
IGV determines the default graph type for a track as described in Default Display.
To change the graph type of selected tracks:
The data range for a track provides the minimum, baseline, and maximum value for the graph, and also whether the scale is linear or logarithmic. IGV determines the default data range for a track as described in Default Display.
To change the data range for selected heat map tracks:
To change the data range for other selected tracks:
Changing the data range can significantly affect the data display:
|minimum, baseline, maximum||Result|
To change the track color for selected heat map tracks:
To change the track color for tracks that are displayed as something other than a heatmap (i.e., bar chart, scatter plot, or line plot):
To change the height of selected tracks:
To change the height of all tracks:
To fit the data to the window:
By default, IGV displays track names to the left of the attribute panel. Legibility of the track names depends on track height; for example, track names will not be legible when track height is 1 pixel.
To select the attribute IGV uses as the track name:
To display the track name as a track label:
To rename a track:
You can only rename one track at a time. You can preserve track name changes only by saving the session.
For expression data, use the GCT file format. This is a tab-delimited format that contains a row for each probe set ID (or gene), a column for each sample, and expression values for each feature in each sample.
To display expression data, IGV must first map the probe set IDs named in the expression data file to their genomic locations. IGV displays data for all of the probes that it can map to genomic locations. If none of the probes in the file can be mapped, IGV displays an error message.
IGV determines the genomic locations for probes as follows:
Choose preferred mapping: By default, IGV uses its probe mapping files before its gene mapping files. If you prefer to map probes to genes, select the Map probes to genes radio button on the Probes tab of the Preferences window.
Probe mapping files map probe identifiers to chromosomal locations. They are compiled from source files provided by Affymetrix, Agilent, and Illumina. The Affymetrix and Agilent mapping files are split by species due to their large size. Separate mapping files are provided for human, mouse, and other (non-mouse, non-human) species. Human probe identifiers are mapped to hg18. Depending on the vendor, mouse probe identifiers are mapped to mm9 (Affymetrix), mm5 (Agilent) or mm8 (Illumina).
Following are links to the probe mapping files:
Gene mapping files map probe identifiers to gene identifiers. Following are links to the gene mapping files:
The probe and gene mapping files are compiled from source files provided by Affymetrix, Agilent, and Illumina. A list of the source files is available at http://www.broadinstitute.org/igv/resources/probes/data_sources_for_mapping.txt.
IGV displays RNAi data similarly to expression data, with one exception: to facilitate analysis of hairpin scores, IGV provides a unique RNAi bar chart. To display the bar chart:
The following figure explains how to read the bar chart.
Hover over a track to view hairpin values.
IGV can display genome-wide association study (GWAS) data as a "manhattan plot", color-coded by chromosome. Data formats are described here.
The plot represents the significance of the association between a SNP or haplotype and the trait being measured. The Y-axis shows -log10 transformed P values, which represent the strength of association.
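The Y-axis transformation is simple to reproduce. A minimal sketch (the class name ManhattanScale is hypothetical): a P value of 1e-8 plots at about 8, while a P value of 0.05 plots at about 1.3, so more significant associations appear higher.

```java
/** Hypothetical helper computing the Manhattan-plot Y value for a P value. */
final class ManhattanScale {

    /** Returns -log10(p) for a P value in the range (0, 1]. */
    static double yValue(double pValue) {
        if (!(pValue > 0.0 && pValue <= 1.0)) {
            throw new IllegalArgumentException("P value must be in (0, 1]");
        }
        return -Math.log10(pValue);
    }
}
```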
The size of the data points in the plot and their height on the left-hand side of the data pane relate directly to their significance: the larger the point and the higher the point on the scale, the more significant the association with the trait. You can see the point size difference in the following screenshot of data on chromosome 1.
As in other parts of IGV, hovering over a data point allows you to see a pop-up containing the data specifically associated with that point. You can see the pop-up for the topmost data point in this image. Note that the point's position on the scale on the left is associated with its P value.
The following commands appear in the pop-up menu for GWAS tracks:
|Rename Track||Renames the track.|
|Remove Track||Removes the selected track from the display.|
|Set Data Range...||Changes the minimum, baseline, and maximum values of the scale used for the GWAS data.|
|Change Track Height...||Changes the display height of the track.|
Changes the display to use different color schemes for the chromosome color-coding. The chromosome color scheme (default) uses the colors defined by IGV.
The single color scheme changes all the chromosomes to display in a single color (blue by default).
The alternating color scheme uses two colors (blue and gold by default) that alternate through the chromosomes.
|Set primary color...||Set the color for the single color scheme and for one of the colors in the alternating color scheme.|
|Set alternating color...||Set the alternating color in the alternating color scheme.|
|Set minimum point size...||Set the minimum data point display size.|
|Set maximum point size...||Set the maximum data point display size.|
|Save image...||Save the current display as a PNG file.|
This page introduces viewing alignment data and its components on IGV in sections. Alignments are to a reference sequence and are used for different purposes that include:
In this guide, IGV feature examples are given for one datatype, but the concepts also apply to other datatypes. The order of the sections on this page roughly reflects the features that become visible as one zooms in. The sections cover:
Related topics on other pages cover more detailed topics:
Changes to certain display parameters in the Alignment Preferences panel should be made ahead of loading data. Some of these preferences can be overridden on a per-track basis through pop-up menu options or by loading saved sessions.
Default parameters are tuned to viewing DNA alignments that typically cover the entire genome at low coverage depth and filter out marked duplicate reads. Adjust Alignment Preferences panel parameters for RNA-Seq data, PCR-free whole genome sequences, and other data that deviate from the breadth and depth of coverage of typical DNA alignments.
For example, before loading RNA-Seq data, increase the Visibility range threshold to 500 without affecting IGV performance as expression data typically covers ~5% of the genome and the deeper coverage is by default downsampled. In addition, check Show junction track to visualize splice junctions.
Both BAM and SAM files are described on the Samtools project page http://samtools.sourceforge.net/ and in the 2014 article titled Sequence Alignment/Map Format Specification by the SAM/BAM Format Specification Working Group.
IGV requires that the alignment file, whether BAM or SAM, is sorted and indexed by coordinates. Indexing produces a secondary file with either a BAI or SAI extension, respectively. The resulting file can be associated with the alignment track by file naming convention, or loaded independently as a separate track with the index query parameter.
Igvtools does not process BAM files because alternative tools, such as Samtools, have historically been available.
For both types, the coverage track represents coverage for all the reads, whereas the reads displayed in the alignment track may only represent a fraction of the reads. This partial representation is called downsampling and occurs for deep read coverage areas to improve IGV performance.
IGV dynamically calculates and displays the default coverage track for an alignment file. When IGV is zoomed to the alignment read visibility threshold (by default, 30 KB), the coverage track displays the depth of the reads displayed at each locus as a gray bar chart. If a nucleotide differs from the reference sequence in greater than 20% of quality weighted reads, IGV colors the bar in proportion to the read count of each base (A, C, G, T).
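The 20% rule can be sketched as follows. This is an illustration under assumptions, not IGV's code: the class name CoverageColoring is hypothetical, the input is assumed to be the quality-weighted sum per base in A, C, G, T order, and the comparison is assumed to be strictly greater than the threshold.

```java
/** Hypothetical sketch of the allele-coloring decision for one coverage bar. */
final class CoverageColoring {

    static final double MISMATCH_THRESHOLD = 0.2;

    /** weighted holds quality-weighted sums for A, C, G, T; refIndex selects the reference base. */
    static boolean showAlleleColors(double[] weighted, int refIndex) {
        double total = 0.0;
        for (double w : weighted) {
            total += w;
        }
        if (total == 0.0) {
            return false; // no coverage at this position
        }
        double mismatch = total - weighted[refIndex];
        return mismatch / total > MISMATCH_THRESHOLD;
    }

    /** Splits the bar among bases in proportion to the read count of each base. */
    static double[] barFractions(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] fractions = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            fractions[i] = total == 0 ? 0.0 : (double) counts[i] / total;
        }
        return fractions;
    }
}
```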
When the alignment data is loaded with its matching extended coverage data, the coverage track displays data at all zoom levels including at the whole genome and chromosome view. To generate the extended coverage data file ending in TDF extension, use igvtools. The resulting file can be associated with the alignment track by file naming convention or loaded independently as a separate track. TDF tracks loaded independently from an alignment do not display dynamically calculated features such as allele frequencies.
IGV reduces memory usage at two levels to improve performance. The first occurs as the threshold zoom at which alignments become visible and the second applies to areas of deep read coverage that are downsampled. We present these two levers in this section together because the settings for each combine to impact IGV performance. Users should adjust the following default settings, tuned for DNA alignments at low coverage, for specific data types in the Alignment Preferences panel.
E.g., for RNA-Seq alignments that cover extended regions at low depth, increase the visibility range threshold to view alignments at wider zoom levels, e.g. to 500.
Downsampled read areas are marked with a black rectangle just under the coverage track. The coverage track represents coverage for all the reads.
In the example shown, the downsampled regions are consecutive and marked by seven black rectangles just under the coverage track.
When an alignment track is loaded, two tracks are displayed: (1) a coverage track and (2) the alignment track. Display of the default splice junctions track requires enabling the setting in the Alignment Preferences panel. This section gives an overview of the alignment track. For options available from the alignment track menu, including grouping, sorting and coloring options, see the alignments section of the pop-up menu page.
IGV uses color and other visual markers to highlight potential genetic alterations in reads against a reference sequence. Genetic alterations include single nucleotide variations, structural variations, and aneuploidy. Structural variations include insertions, deletions, inversions, tandem duplications, translocations, and other more complex rearrangements. Interpretation of some of these variations is discussed briefly in this section and the next. Interpreting Color by Insert Size and Interpreting Color by Pair Orientation give more detailed explanations of read colors.
An additional factor to take into consideration when judging potential genetic alterations is quality of reads and quality of mapping. IGV uses transparency to indicate quality.
Colors and transparency are used at two levels within alignments: (1) for mapped reads, and (2) for individual bases within reads.
|mapped reads||see Paired-End Alignments section||mapping quality|
|individual bases||Mismatched bases||read quality (phred) score|
By default, read bases that match the reference are displayed in gray. Read bases that do not match are color coded, and insertions and deletions within reads relative to the reference are marked. Insertions are indicated by a purple I and deletions are indicated with a black dash (–). In addition, mismatched bases are assigned a transparency value proportional to the read quality, known as the phred score. This has the effect of de-emphasizing low-quality reads.
Note that alignments that are displayed with light gray borders and transparent or white fill, as shown in the screenshot, have a mapping quality equal to zero. Interpretation of this mapping quality depends on the mapping aligner as some commonly used aligners use this convention to mark a read with multiple alignments. In such a case, the read also maps to another location with equally good placement. It is also possible the read could not be uniquely placed but the other placements do not necessarily give equally good quality hits.
In a gapped read, IGV indicates deletions with respect to the reference with a black bar.
Users can also specify colors and sort reads by various options, including start location, strand, nucleotide, mapping quality, sample tag, or read group tag. For a description of all user-specified color and sort options, see the alignment track pop-up menu.
For example, to sort alignments:
Sorting rearranges rows so that alignments that intersect the center of the display appear in the order specified. This can cause the alignment layout away from the center line to appear sparse. To restore the layout to an optimally packed configuration, select Re-pack alignments from the pop-up menu.
Repeat the most recent sort with hotkey ctrl-s.
IGV provides several features for working with paired-end alignments. This section covers viewing reads as pairs, coloring of mapped paired reads, and the split-screen view. Interpretation of colors is discussed briefly here and in more detail in Interpreting Color by Insert Size and Interpreting Color by Pair Orientation.
By default, IGV displays reads individually because they pack compactly. Select View as pairs from the right-click menu to display pairs together with a line joining the ends as shown in the image below. The hover element details (2) are also displayed either for a single read in normal view (left) or for a pair of reads in paired reads view (right).
IGV colors paired-end alignments in two ways.
Control+click (Mac: Command+click) a read to outline the read and its paired mate in the same color. Colors are arbitrary but unique to each pair. A black outline indicates that the selected read has no mate.
Outlined paired reads are preserved when switched to View as pairs option. However, outlining reads only works in the unpaired view and not in the paired view.
Hover over or click a read to view information about the read, including the location of its paired mate.
IGV colors (1) paired end reads with inferred insert size smaller or larger than expected; (2) read with mate that is aligned to a different chromosome; (3) paired-end alignments with deviant pair orientation. Note that coloring by insert size is a feature designed originally for DNA alignments against the genome. It is based on set base pair values or computed from the size distribution of a library.
Translocations on the same chromosome can be detected by color-coding for pair orientation, whereas translocations between two chromosomes can be detected by coloring by insert size. See both by selecting the Color alignments by> insert size and pair orientation option.
Split screen views can be invoked on-the-fly from paired-end alignment tracks. Right-click over an alignment and select View mate region in split screen from the drop-down list. If the alignment clicked over does not have a mapped mate this option will be grayed out.
Split-screen view shortcuts:
Coloring by insert size is for DNA alignments and is not designed to indicate RNA-Seq paired read mate distances. It is based on set base pair values or computed from the size distribution of a library against the reference genome as defined in the Alignment Preferences Panel.
The inferred insert size can be used to detect structural variants, such as:
IGV uses color coding to flag anomalous insert sizes. When you select Color alignments>by insert size in the popup menu, the default coloring scheme is:
In a deletion a section of DNA is absent in the subject genome compared to the reference genome.
When pairs from a section of DNA spanning the deletion are aligned to the genome the inferred insert size will be larger than expected. This is due to the deleted section of the genome, not present in the subject. Schematically this can be visualized as follows:
So in the case of a deletion, the inferred insert size is GREATER THAN the expected insert size. In IGV such an event might look like the following.
Reads that are colored red have larger than expected inferred sizes, and therefore indicate possible deletions.
In the case of an insertion, a section of DNA is present in the subject genome that is not represented in the reference genome.
The effect on the distance between aligned pairs is the opposite of that seen for a deletion: the "inferred insert size" is smaller than expected.
The maximum size of an insertion detectable by insert size anomaly is limited by the size of the fragments. They must be long enough to span the insertion and include sequences on both ends that are mapped to the reference. The maximum detectable size is approximately equal to:
fragment length - (2x read length)
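For example, under this approximation, 500 bp fragments sequenced with 100 bp reads can reveal insertions of up to about 300 bp. Detection of this event is therefore more likely with larger fragment libraries, such as Illumina mate-pair (not paired-end) and SOLiD.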
In the example above reads that are colored blue have smaller than expected inferred sizes, and therefore indicate insertions.
IGV codes inserts for inter-chromosomal rearrangements. For instance, in this case, one end is on chromosome 1 and the other is on chromosome 6.
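The three cases described in this section (larger than expected, smaller than expected, and mate on another chromosome) can be summarized in a short sketch. It is illustrative only: the class and enum names are hypothetical, and the minimum and maximum expected sizes are supplied by the caller, standing in for the set or computed values from the Alignment Preferences panel.

```java
/** Hypothetical classifier mirroring the insert-size flags described above. */
final class InsertSizeColoring {

    enum Flag { NORMAL, LARGER_THAN_EXPECTED, SMALLER_THAN_EXPECTED, MATE_ON_OTHER_CHROMOSOME }

    static Flag classify(String chromosome, String mateChromosome,
                         int inferredInsertSize, int minExpected, int maxExpected) {
        if (!chromosome.equals(mateChromosome)) {
            return Flag.MATE_ON_OTHER_CHROMOSOME; // inter-chromosomal rearrangement
        }
        // The inferred size may be stored with a sign, so compare its magnitude.
        int size = Math.abs(inferredInsertSize);
        if (size > maxExpected) {
            return Flag.LARGER_THAN_EXPECTED; // shown in red; possible deletion
        }
        if (size < minExpected) {
            return Flag.SMALLER_THAN_EXPECTED; // shown in blue; possible insertion
        }
        return Flag.NORMAL;
    }
}
```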
The orientation of paired reads can be used to detect structural events including:
By selecting Color alignments>by pair orientation, you can flag anomalous pair orientations in IGV.
Orientation is defined in terms of read-strand: left versus right, and first read versus second read of a pair.
(figure courtesy of Bob Handsaker)
These categories only apply where both mates map to the same chromosome.
An inversion is a large section of DNA that is reversed in the subject genome compared to the reference genome.
When an inversion shows up in paired-end reads, the reads are distinctively variant from the reference genome.
This appears in IGV as shown below.
When a large section of DNA is duplicated and inserted into the genome in a reversed configuration compared to the original sequence, this is called an inverted duplication.
There will be overlapping left and right reads, and there will likely be altered coverage depth/copy number.
This appears in IGV as shown below.
When a large section of DNA is duplicated and inserted into the genome next to the original sequence, this is called a tandem duplication.
The reads will not only be duplicated, but also be arranged as shown below.
IGV will display this rearrangement as shown below.
When a large section of DNA is removed from one location and inserted elsewhere, that is a translocation.
Translocations on the same chromosome can be detected by color-coding for pair orientation, whereas translocations between two chromosomes can be detected by coloring by insert size.
IGV v2.1 (released April 2012) and onwards offer a coloring by bisulfite mode option from the right-click pop-up menu for alignments. The six offered modes are summarized in the table, and are explained further on this page.
For a general overview of viewing alignments in IGV, see Viewing Alignments.
Coloring by bisulfite mode supports visualization of DNA libraries that have undergone bisulfite conversion and sequencing. The mode supports visualization of alignments from the following and similar techniques:
|IGV Bisulfite mode||description||relevance|
|CHH and CHG||
Additional methylation sites:
When bisulfite sequence tracks are initially loaded, default coloring of mismatches against the reference will show red T's and green A's. When coloring is switched to bisulfite mode, two new coloring schema are applied and together allow you to visually distinguish read strand and bisulfite conversion status.
Because not all mode matching sites are biologically relevant in the context of methylation, bisulfite experiments compare changes in methylation between a control sample and the variable. When comparing two samples, a change in methylation status will be marked by a difference in color for a given site. Red to blue indicates loss of methylation, or hypomethylation; blue to red indicates increased protection by methylation, or hypermethylation, as shown for the tumor sample in the screenshot below which visualizes data from Berman et al (2012).
Coloring by bisulfite mode in IGV allows for visualization of alignments of BS-Seq reads, a DNA-modification technique used to distinguish sites of DNA methylation and hydroxymethylation in epigenetic studies. Alignments in IGV are against a reference genome of correct sequence as coloring is based on deviations from the reference sequence. Read alignment may have been against a bisulfite-transformed genome sequence, in which case genomic coordinates would still be for that of the original reference genome.
In DNA methylation, the methyl CH3 group is added to the cytosine base at the carbon 5 position (5-meC) in a sequence-context dependent manner. In mammals this context is typically CpG dinucleotides, and in plants this is CpG, CpHpG, and CpHpH di- and tri-nucleotides. These correspond to the CG, CHG, and CHH bisulfite coloring modes in IGV. The IUPAC ambiguity code H represents any nucleotide but guanine.
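The three sequence contexts can be told apart by inspecting the two bases that follow a cytosine. A minimal sketch, assuming a forward-strand sequence and treating H as A, C, or T (the class name CytosineContext is hypothetical):

```java
/** Hypothetical classifier for the sequence context of a cytosine. */
final class CytosineContext {

    /** Returns "CG", "CHG", or "CHH", or null if the context cannot be determined. */
    static String classify(String sequence, int index) {
        String s = sequence.toUpperCase();
        if (index < 0 || index >= s.length() || s.charAt(index) != 'C') {
            return null; // not a cytosine
        }
        if (index + 1 >= s.length()) {
            return null; // no downstream base
        }
        char next = s.charAt(index + 1);
        if (next == 'G') {
            return "CG";
        }
        if (!isH(next) || index + 2 >= s.length()) {
            return null; // ambiguous base or too close to the end of the sequence
        }
        char third = s.charAt(index + 2);
        if (third == 'G') {
            return "CHG";
        }
        return isH(third) ? "CHH" : null;
    }

    /** IUPAC code H: any nucleotide but guanine. */
    private static boolean isH(char base) {
        return base == 'A' || base == 'C' || base == 'T';
    }
}
```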
Promoter methylation is typically associated with repression, while genic methylation correlates with transcriptional activity.
Bisulfite modification exploits the different sensitivities of cytosine and 5-meC to deamination by bisulfite under acidic conditions. Cytosine undergoes conversion to uracil whereas 5-meC is unmodified and remains intact. The uracil is subsequently converted to thymine after PCR amplification while 5-meC residues remain cytosines.
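The conversion described above can be sketched in a few lines of Python. This is an illustrative simulation only: the function name `bisulfite_convert` and the idea of passing methylated positions as a set of 0-based indices are assumptions made for the example, and only the forward strand is modeled.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Simulate bisulfite treatment followed by PCR on the forward strand.

    Unmethylated C is read as T; C at a methylated position stays C.
    """
    protected = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            out.append("T")   # C -> U (bisulfite) -> T (PCR)
        else:
            out.append(base)  # 5-meC and all non-C bases are unchanged
    return "".join(out)


if __name__ == "__main__":
    # The C at index 1 is methylated and survives; the C at index 4 is converted.
    print(bisulfite_convert("ACGTCG", methylated_positions={1}))
```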
A number of the different genome-wide methylome technologies use bisulfite chemistry and this IGV mode applies to those that in addition sequence the bisulfite converted DNA, such as by Illumina high-throughput sequencing. These include whole-genome bisulfite sequencing (WGBS) and reduced-representation-bisulfite sequencing (RRBS), both of which provide single-nucleotide resolution.
RRBS targets bisulfite sequencing to an enriched population of the genome, while WGBS purportedly determines the methylation state of every cytosine in the target sequence. However, as with any technique, limitations exist, including the inability to discriminate 5-meC from 5-hydroxymethylcytosine (5-hmeC), a modification that was discovered to be pervasive in mammalian DNA in 2009 (Yu, Cell 2012).
Multiple techniques are used to distinguish 5-hmeC from 5-meC. Of relevance to coloring by bisulfite mode in IGV is TAB-Seq (Tet-assisted bisulfite sequencing), in which 5-hmeC sites are protected by glucosylation prior to bisulfite conversion. Because 5-meC sites remain unprotected from mTet1 oxidation to 5-carboxylcytosine (5-caC), and subsequent bisulfite conversion, only 5-hmeC site cytosines remain unchanged in reads (Yu, Nature Protocols 2012).
The following figure diagrams the nucleotide conversions that occur for a methylated versus unmethylated locus during bisulfite conversion and PCR, and IGV's corresponding coloring of these sites in CG bisulfite mode.
For a given DNA fragment, four strands arise after treatment and PCR amplification. These are the original top strand (OT), the original bottom strand (OB), and strands which are complementary to OT and OB (CTOT and CTOB). IGV visualizes reads in one direction, and for the given direction reads from the opposite strand are automatically displayed as the reverse complement. Therefore, OT and CTOT reads are displayed in the reference-forward direction (gray) while OB and CTOB reads are displayed in the reverse direction (sage) and are differentially colored as indicated.
To sort reads by strand, use the right-click pop-up menu on the alignment track.
You can also infer the read-strand by the specific nucleotides that are highlighted by the mode. OT and CTOT yield methylation information for cytosines on the top strand (C and T highlighted), while OB and CTOB will give methylation information for the paired complement, that is for guanines paired to the methylatable cytosines (G and A highlighted).
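The strand logic above can be expressed as a small lookup. The sketch below is a hypothetical helper, not IGV code: it assumes a single reference base and read base at an informative site, and that the display direction of the read (forward for OT/CTOT, reverse for OB/CTOB) is already known.

```python
def call_site(ref_base, read_base, displayed_reverse):
    """Classify one aligned base at a potentially methylated site.

    Forward-displayed reads (OT, CTOT) are informative at reference C:
    C means methylated, T means unmethylated (converted).
    Reverse-displayed reads (OB, CTOB) are informative at reference G:
    G means methylated, A means unmethylated (converted).
    Returns 'methylated', 'unmethylated', or None if uninformative.
    """
    ref_base, read_base = ref_base.upper(), read_base.upper()
    if displayed_reverse:
        target, converted = "G", "A"
    else:
        target, converted = "C", "T"
    if ref_base != target:
        return None
    if read_base == target:
        return "methylated"
    if read_base == converted:
        return "unmethylated"
    return None  # some other mismatch, e.g. a SNP or sequencing error


if __name__ == "__main__":
    print(call_site("C", "T", displayed_reverse=False))
    print(call_site("G", "G", displayed_reverse=True))
```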
In addition to detecting methylation states, bisulfite conversion is used in footprinting studies, for example to determine nucleosome positioning in yeast and mammalian cells.
The additional IGV color modes--HCG, GCH, and WCG (diagram)--are relevant to NOMe-Seq, a genome-wide nucleosome footprinting and methylome sequencing method (Kelly 2012). This method obtains nucleosome positioning information based on the GpC methyltransferase M.CviPI accessibility to GpC sites, and at the same time obtains endogenous DNA methylation information from CpG sites.
Berman, Benjamin P, Daniel J Weisenberger, Joseph F Aman, Toshinori Hinoue, Zachary Ramjan, Yaping Liu, Houtan Noushmehr, et al. 2012. “Regions of Focal DNA Hypermethylation and Long-Range Hypomethylation in Colorectal Cancer Coincide with Nuclear Lamina-Associated Domains.” Nature Genetics 44 (1): 40–46. doi:10.1038/ng.969.
Kelly, Theresa K, Yaping Liu, Fides D Lay, Gangning Liang, Benjamin P Berman, and Peter A Jones. 2012. “Genome-Wide Mapping of Nucleosome Positioning and DNA Methylation within Individual DNA Molecules.” Genome Research 22 (12): 2497–2506. doi:10.1101/gr.143008.112.
Lister, Ryan, Mattia Pelizzola, Robert H Dowen, R David Hawkins, Gary Hon, Julian Tonti-Filippini, Joseph R Nery, et al. 2009. “Human DNA Methylomes at Base Resolution Show Widespread Epigenomic Differences.” Nature 462 (7271). Nature Publishing Group: 315–22. doi:10.1038/nature08514.
Stirzaker, Clare, Phillippa C. Taberlay, Aaron L. Statham, and Susan J. Clark. 2014. “Mining Cancer Methylomes: Prospects and Challenges.” Trends in Genetics 30 (2). Elsevier Ltd: 75–84. doi:10.1016/j.tig.2013.11.004.
Yu, Miao, Gary C Hon, Keith E Szulwach, Chun-Xiao Song, Peng Jin, Bing Ren, and Chuan He. 2012. “Tet-Assisted Bisulfite Sequencing of 5-Hydroxymethylcytosine.” Nature Protocols 7 (12): 2159–70. doi:10.1038/nprot.2012.137.
Yu, Miao, Gary C Hon, Keith E Szulwach, Chun-Xiao Song, Liang Zhang, Audrey Kim, Xuekun Li, et al. 2012. “Base-Resolution Analysis of 5-Hydroxymethylcytosine in the Mammalian Genome.” Cell 149 (6): 1368–80. doi:10.1016/j.cell.2012.04.027.
IGV supplements each alignment track with (1) a coverage track and (2) if selected in the Alignment Preferences panel, a default splice junctions track. This page describes the default junctions track as well as independently loaded junctions data in the standard .bed format. See Sashimi Plot for how to derive and manipulate interactive junction visualizations within IGV.
Before loading data, check Show junction track in the Alignment Preferences panel. The panel's settings must be adjusted for your data as default settings are for genomic reads for which splicing is irrelevant.
When enabled, IGV dynamically computes the junctions track from alignment data. The junctions track calls a splicing event when at least a single read splits across two exons in the alignment track.
Without XS tag strand information, IGV renders junctions using read strand, and therefore IGV's display of the strandedness is inferred.
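As a rough illustration of how junctions can be derived from split reads, the sketch below scans CIGAR strings for `N` (skipped region) operations and counts the reads that support each junction. It is not IGV's implementation: the function names are hypothetical, reads are given as plain `(start, cigar)` tuples with 0-based starts, and strand is ignored.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_read(start, cigar):
    """Yield (junction_start, junction_end), 0-based half-open, per N gap."""
    pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (pos, pos + length)
        if op in "MDN=X":  # operations that consume reference bases
            pos += length


def count_junctions(reads):
    """Count supporting reads per junction for an iterable of (start, cigar)."""
    counts = Counter()
    for start, cigar in reads:
        counts.update(junctions_from_read(start, cigar))
    return counts


if __name__ == "__main__":
    reads = [(100, "50M200N50M"), (120, "30M200N70M"), (100, "100M")]
    for junction, depth in count_junctions(reads).items():
        print(junction, depth)
```

Under this scheme a single split read is enough to report a junction, which matches the behavior described above.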
Each splice junction is represented by an arc from the beginning to the end of the junction.
Hovering the mouse over or clicking on a junction will display coverage information. The first screenshot shows multiple coverage detail panels for each of the three components of two splice junctions on opposite strands.
Menu options are as detailed for the Feature tracks menu with the following additions or differences.
|Tracks are collapsed by default. The expanded mode breaks up the junctions track to multiple junctions tracks to minimize visual overlap. IGV does not interpret isoform information.|
The height of the arc, and its thickness, are proportional to the depth of read coverage.
|Sashimi Plot||Displays junctions information for regions within the current IGV view in a new panel with additional options. See Sashimi Plot for details.|
|Export Features||Download junctions track from IGV as a .bed file.|
The splice junction view displays an alternative representation of .bed files encoding splice junctions, such as the "junctions.bed" file produced by the TopHat program. Display details are as described in the section above.
The track can also be computed dynamically from an Alignment track by enabling the Show splice junctions track option in the alignment preferences as described above.
Junction files should be in the standard .bed format. The score field is used to indicate depth of coverage.
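A minimal Python sketch of reading such a file is shown below. It assumes tab-delimited standard BED columns (chrom, start, end, name, score, strand) and reports start and end exactly as written in the file; TopHat-style files also carry block columns, which this sketch ignores. The function name and the sample lines are made up for illustration.

```python
def parse_junction_bed(lines):
    """Parse BED lines into dicts, using the score column as read depth."""
    junctions = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments, and header lines
        fields = line.split("\t")
        junctions.append({
            "chrom": fields[0],
            "start": int(fields[1]),
            "end": int(fields[2]),
            "name": fields[3] if len(fields) > 3 else "",
            "depth": float(fields[4]) if len(fields) > 4 else 0.0,
            "strand": fields[5] if len(fields) > 5 else ".",
        })
    return junctions


if __name__ == "__main__":
    sample = [
        "track name=junctions\n",
        "chr1\t1000\t2000\tJUNC1\t12\t+\n",
        "chr1\t3000\t4500\tJUNC2\t3\t-\n",
    ]
    for junction in parse_junction_bed(sample):
        print(junction["name"], junction["depth"], junction["strand"])
```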
Sashimi plots quantitatively visualize splice junctions for multiple samples from their alignment data alongside genomic coordinates and a user-specified annotation track. IGV displays the Sashimi plot in a separate window and allows for more manipulations of the plots than the junctions track. Use Sashimi plots to screen differentially spliced exons along genomic regions of interest.
The Sashimi plot is displayed in a separate window. The coverage for each alignment track is plotted as a bar graph. Arcs representing splice junctions connect exons. Arcs display the number of reads split across the junction (junction depth). Genomic coordinates and the gene annotation track are shown below the junction tracks.
The screenshot above (2015.4.16) shows the Sashimi plot of the example data from the Splice Junctions page with the addition of kidney tissue data.
These options were expanded with IGV v2.3.47, released March 2015.
Set Exon Coverage Max
Set Junction Coverage Min
|Show Exon Coverage Data||
A junction's strandedness is determined by the BAM file XS tag value for the split read. How you assigned the XS tag values to the reads determines whether you potentially display novel junctions or display junctions reflecting previously determined junction annotations. See the Splice Junctions page for more details.
VCF (variant call format), MAF (mutation annotation file), and MUT (mutation) file formats display variations in sequence. Here we refer to MAF and MUT files together as mutation files. Links above take you to details for each format in the File Formats guide. Links below detail visualizing each type of file on IGV.
As tracks, MAF and MUT files display in a similar manner and you can overlay each on other data tracks so long as sample names follow TCGA conventions or match. To overlay all other MUT and MAF files, you must use a linking identifier in a sample information file as described in Mutation Files, and indicate this linking identifier in the Mutations Preferences Panel.
MAF (mutation annotation format) and MUT (mutation) files display mutations. IGV recognizes text-based files with .maf, .maf.txt, .mut, and .mut.txt file extensions as mutation files, but not binary files. IGV will display mutation files as independent tracks or overlaid on other data tracks, depending on your Mutations Preferences settings.
There are two ways to open multiple MUT or MAF files at once on IGV:
IGV will visualize each individual sample's mutation data as a single track.
By default, IGV displays mutation file data in distinct tracks. To overlay mutation data, use the Mutations tab of the Preferences window to modify display options. Do not use the right-click pop-up menu options Create Overlay Track or Separate Tracks.
Load the mutation data and the other data to which it will be overlaid, e.g. a GCT expression file of RNA-Seq data.
Go to View>Preferences>Mutations and check the box Overlay mutation tracks. Press OK. Tracks will overlay (Screenshot below, 2015.02.20).
If the overlay box was checked before you loaded your two data types, the data may already be overlaid, may not be overlaid, or may be represented twice--once overlaid and also displayed as separate tracks. There are two actions to overlay data and remove duplicate independent mutation tracks:
You may need to quit and restart IGV to clear previously loaded data as starting a new session may not have cleared previous mutation data. This is a bug that will be fixed. The number of tracks indicated in the lower left corner of IGV should be consistent with what you are loading.
To remove mutation tracks that do not have a corresponding partner track, uncheck the box Show orphaned mutation tracks.
To color code mutations by type, e.g. missense, silent, etc., check the box Color code mutations.
Overlaid mutation tracks give an additional Sort by mutation count option for a selected region of interest.
Right-click on the ROI marked in red at top and select Sort by mutation count from the pop-up menu. For example, in the following Screenshot (2015.02.20) the overlaid tracks for the ERCC2 locus are reordered to display those with mutations at the top of the window.
To overlay mutations on data tracks with differing sample names, overlay tracks using one of the following two approaches. Each approach uses different Sample Information files but both use the configurable linking identifier which you must indicate in the Mutations Preferences panel, where the default is set to LINKING_ID.
Click on the links to download the given example files to your desktop.
|Example files & description||Example file preview|
GCT format expression data & SEG format segmented copy number data.
MUT format mutation file with two mutations visible once zoomed to chromosome 17. The fourth column of a MUT file always refers to samples.
MAF formats are also accepted.
Sample information set 1:
Sample Mapping file and Attributes file. The mapping file omits the typical row header (#SampleMapping), and contains a linking column headed by LINKING_ID, which refers to labels in the fourth column of the MUT file.
Mapping files contain two columns with the linking identifiers in the second column.
Sample information set 2:
Modified Attributes file where the linking column header is Sample and refers to the labels in the fourth column of the MUT file.
Any column except the first may be used for the linking information. This allows use of an existing attribute column as the linking identifier column.
Follow these steps to visualize example data as shown in the Screenshot below (2015.03.09):
VCF stands for Variant Call Format, and this file format is used by the 1000 Genomes project to encode SNPs and other structural genetic variants. The format is further described on the 1000 Genomes project Web site. VCF calls are available at EBI / NCBI.
|Each bar across the top of the plot shows the allele fraction for a single locus.|
|The genotypes for each locus in each sample. Dark blue = heterozygous, Cyan = homozygous variant, Grey = reference. Filtered entries are transparent.|
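The genotype categories above can be derived from the standard VCF `GT` field. The following Python sketch is illustrative only: the function name is hypothetical, the color names simply restate the legend above, and the handling of missing calls (`.`) is an assumption, since it is not described here.

```python
# Color names restate the legend above; they are not IGV constants.
GENOTYPE_COLORS = {
    "heterozygous": "dark blue",
    "homozygous variant": "cyan",
    "reference": "grey",
}


def genotype_class(gt):
    """Classify a VCF GT string such as '0/1', '1|1', or '0/0'."""
    alleles = gt.replace("|", "/").split("/")
    if any(allele == "." for allele in alleles):
        return "no call"  # assumption: missing alleles are reported separately
    if all(allele == "0" for allele in alleles):
        return "reference"
    if len(set(alleles)) == 1:
        return "homozygous variant"
    return "heterozygous"


if __name__ == "__main__":
    for gt in ("0/0", "0/1", "1|1", "1/2", "./."):
        label = genotype_class(gt)
        print(gt, label, GENOTYPE_COLORS.get(label, "n/a"))
```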
If a file has more than 10 genotypes, the VCF file will be opened in its own pane, with a scroll bar, as shown below.
To see the options for changing the view of your VCF file, right-click on a variant. Some of the options are specific to the variant selected. Find more details on the menu options on the Pop-up Menu page.
The window size at which VCF data is loaded is proportional to the number of samples. To change this, right-click and select Set Feature Visibility Window...
To change the color coding of the plot, select Color By>Allele.
The Sort Variant By options allow you to sort the set by a trait of a specific variant. You can select the sort twice for the same variant to flip it, i.e., if you sort depth, it sorts from high to low; select the depth sort a second time to sort from low to high.
The Display Mode changes what you can see of the data:
Collapsed removes all the genotypes, leaving only the allele frequency bars.
Expanded shows the genotypes at the usual row height, with the sample names in the first column.
Squished shows the genotypes with the rows compressed to maximize the data visible on the page.
You can also adjust the height of the squished row by right-clicking and selecting Change Squished Row Height. You can change the height of the rows in the window provided.
If you open a VCF file that does not contain genotypes data, the view will be different, displaying only the bars marking the calls, as shown below.
Similarly, the popup menu will be more limited, with only the Set Feature Visibility Window... and Remove Track options functional.
The Gene List view displays multiple loci side-by-side in split panes. The key difference between the Gene List view and the Regions of Interest navigator is that you can define gene lists using feature names of an annotation track, e.g. gene symbols for the RefSeq annotation track, in addition to genomic coordinates. The Gene List view loads all listed loci. You can then rearrange and remove panels from the display.
For the option to navigate the display to individual loci from a list, or to define regions of interest that you can export in BED format, see the Regions of Interest page.
To change the size of the flanking region around the gene displayed, before loading data go to View>Preferences>General>Feature flanking region and enter the base pairs or percent to display on either side of each locus.
The following sections give step-by-step instructions for Gene List view features.
To load or define a new gene or locus list, select Regions >Gene Lists....
This opens a window for selecting an existing list or creating a new list.
You can load an existing gene list from this window. To do so, select the gene list and click View. IGV informs you of feature items without matches and continues on to display loci with matches.
You can click Import to upload a text file containing your gene list. Load lists of genes or loci in GMT, GRP and BED format. For example, find and download GMT files from the Molecular Signatures Database.
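For reference, a GMT file stores one gene set per line as tab-delimited fields: the set name, a description, and then the member genes. The Python sketch below reads that layout; the function name and the sample gene set are made up for illustration.

```python
def parse_gmt(lines):
    """Return {set_name: [genes]} from tab-delimited GMT lines."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # need a name, a description, and at least one gene
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


if __name__ == "__main__":
    sample = ["EXAMPLE_SET\tillustrative list\tKRAS\tRAC1\tRAC2\n"]
    print(parse_gmt(sample))
```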
Imported lists display organized under My lists and save to the igv>lists folder for continued future access.
You can also click New to create a new gene list. This opens a dialog in which you can enter a name, description, and your list of genes or regions.
When you click OK, this gene list will be filed under My lists in the gene lists window. Select that group, then select your new gene list, and click Load to view it.
You can save your gene list as a .gmt file by clicking Export.
To make a copy of a gene list, select an existing gene list and click Copy. This opens a window in which you can edit the list. When you click OK after editing it, this copied list will be filed under My lists.
You can edit or delete anything in My lists.
When you load a gene list, the main IGV window splits vertically to show the currently-loaded data for all the regions of the gene list.
Each vertical panel can be individually zoomed by double-clicking in that space.
Right-click on a gene name in the panel header to bring up the sort menu. This menu will vary depending on data type.
The following image illustrates what happens if you select Sort by amplification in the KRAS panel.
To remove a panel, right-click on the panel header and select Remove panel.
Panels can be rearranged by drag and drop. Click on the grey header bar at the top of the panel and drag it to its new position. For example, in the figure below KRAS has been dropped between RAC1 and RAC2.
To return to the original view of the gene, that is not zoomed, right-click the name/cytoband panel at the top of the pane you want to reset and select Reset panel to '[gene name]'.
To return to the “normal view”, double-click the name/cytoband panel at the top of any of the panes, or right-click in a name/cytoband panel and select Switch to standard view.
For more information see Viewing Gene Networks in the cBio Portal.
Regions of interest (ROI) are intervals that are defined and bookmarked by the user. Once the region is defined, it can be given a short description, can be used for sorting, and the sequence contained in that interval can be copied to the system clipboard for use in another application.
This page outlines three ways to define a region of interest--by mouse, by keyboard shortcut, and by using the Region Navigator. The fourth section details functions applied to regions of interest from the ROI menu.
Starting from IGV 2.1.11 (released May 2012), define a region of interest either by a single base at the center of the view or with the entire displayed view.
The Region Navigator provides an editable table view of defined regions of interest. The table can be used to edit, add, and remove regions of interest as well as navigate. To open the Region Navigator select Regions>Region Navigator from the menu bar.
You can add regions of interest by defining them as above. You can also leave the Region Navigator open while you browse your data and genome, and when you are zoomed into a region that interests you, click Add. This will add the entire interval currently visible in IGV as a region of interest in the Region Navigator. In the screenshot, the same region was added twice.
The duplicated region in the screenshot was edited to a wider region and given a descriptor.
|Menu item and function||Instructions|
|Sort tracks based on data values within the region of interest||Click the red bar above the region and select a Sort option. Sort options vary with the data types.|
|Scatter Plot||See Scatter Plots section of IGV guide.|
|Zoom. Center and zoom to the region of interest in your view||Click the red bar above the region and select Zoom.|
|Edit description allows for a short description.||Click the red bar above the region and select Edit description.|
|Copy sequence or export the region of interest||Click the red bar above the region and select Copy sequence.|
|BLAT sequence||Click the red bar above the region and select Blat sequence. See BLAT Search page for details.|
|Delete removes the region of interest.||Click the red bar above the region and select Delete.|
|Menu item functions||Instructions|
|Remove regions of interest from the display||Select Regions>Region Navigator, select the region(s) you want to remove, and click Remove.|
|Export the defined regions of interest as a BED file (the data in the region are not exported)||Select Regions>Export Regions and save the file.|
|Import one or more regions of interest||Select Regions>Import Regions, browse to the location of the BED file, and select it.|
Attributes can be associated with tracks and used for filtering, sorting, and grouping data. By default all tracks have at least 3 attributes: Data File, Data Type, and Name. To display additional attributes, load a sample attribute file. IGV displays attribute names and values in the attributes panel.
IGV uses color-coded blocks to represent the attribute values.
To show or hide selected attributes:
To show or hide all attributes:
By default, IGV displays tracks in the order in which they are loaded (i.e., the order of the data in the files). Alternatively, it is possible to sort the tracks by attribute, region of interest, or track list. You can also group or filter tracks.
If tracks are grouped, IGV sorts the tracks in each group. To sort groups by attribute, first sort the ungrouped tracks by the desired attributes, then group the tracks.
To sort tracks based on an attribute value:
Alternatively, use the Sort Tracks command for additional options:
If tracks are grouped, IGV sorts the tracks in each group. It then sorts the groups using a composite score for the group, which IGV defines as the maximum score from the tracks in that group.
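The two-level ordering described above can be sketched as follows. This is a hypothetical illustration rather than IGV code: tracks are represented as `(name, score)` tuples, every group is assumed to be non-empty, and descending order is an assumption exposed as a parameter.

```python
def sort_grouped_tracks(groups, descending=True):
    """Sort tracks within each group, then order groups by their maximum score.

    groups: dict mapping group name to a non-empty list of (track, score).
    Returns a list of (group_name, sorted_tracks).
    """
    sorted_groups = {
        name: sorted(tracks, key=lambda track: track[1], reverse=descending)
        for name, tracks in groups.items()
    }
    return sorted(
        sorted_groups.items(),
        key=lambda item: max(score for _, score in item[1]),  # composite score
        reverse=descending,
    )


if __name__ == "__main__":
    groups = {
        "A": [("a1", 1.0), ("a2", 3.0)],
        "B": [("b1", 5.0)],
        "C": [("c1", 2.0)],
    }
    for name, tracks in sort_grouped_tracks(groups):
        print(name, tracks)
```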
To sort tracks in the data panel based on a region of interest:
To display selected tracks in a specific order:
To group tracks by attribute:
You can filter track data to display only tracks that meet certain criteria.
To filter tracks:
To clear the filter:
You can save the current state of an IGV session to a named session file. You can use that file to restore the IGV session yourself or share it with colleagues, as long as they have access to the session file and any data files that were loaded when the session file was saved. For example, if the data files are loaded into IGV from a shared directory and the IGV session file is saved to that shared directory, anyone with access to the directory can restore the saved IGV session.
To save a session:
To restore a saved session:
Sessions are an integral part of IGV, allowing users to share their data and views with other users simply and accurately. Session files describe the session in XML. If you wish to manually create or edit a session file, use the information below to better understand the components of each session file.
Required - These elements are required in a session file. All session files must follow XML standards.
Optional - These elements are optional in a session file and are added by IGV to help determine the placement of the data and visual style choices.
The XML below is an example of a simple Session created by IGV
<?xml version="1.0" encoding="UTF-8"?>
<Global genome="hg18" locus="All" version="3">
    <Resource url="http://genome.cse.ucsc.edu/cgi-bin/hgTrackUi?g=rnaGene" label="RNA Genes" name="RNA Genes" path="http://www.broadinstitute.org/igvdata/annotations/hg18/rna_genes.bed"/>
    <Resource url="http://genome.cse.ucsc.edu/cgi-bin/hgTrackUi?g=wgRna" label="sno/miRNA" name="sno/miRNA" path="http://www.broadinstitute.org/igvdata/annotations/hg18/sno_mirna.bed"/>
    <Panel height="445" name="DataPanel" width="1000">
        <Track color="0,0,178" colorScale="ContinuousColorScale;0.0;20.0;255,255,255;0,0,178" displayName="Non coding RNA" expand="false" height="45" id="http://www.broadinstitute.org/igvdata/annotations/hg18/rna_genes.bed" name="RNA Genes" renderer="BASIC_FEATURE" visible="true" windowFunction="count">
            <DataRange baseline="0.0" drawBaseline="true" flipAxis="false" maximum="20.0" minimum="0.0" type="LINEAR"/>
        </Track>
        <Track color="0,0,178" colorScale="ContinuousColorScale;0.0;20.0;255,255,255;0,0,178" displayName="sno miRNA" expand="false" height="45" id="http://www.broadinstitute.org/igvdata/annotations/hg18/sno_mirna.bed" name="sno/miRNA" renderer="BASIC_FEATURE" visible="true" windowFunction="count">
            <DataRange baseline="0.0" drawBaseline="true" flipAxis="false" maximum="20.0" minimum="0.0" type="LINEAR"/>
        </Track>
    </Panel>
    <Panel height="65" name="FeaturePanel" width="1000">
        <Track color="0,0,178" colorScale="ContinuousColorScale;0.0;20.0;255,255,255;0,0,178" displayName="RefSeq genes" expand="false" height="30" id="Genes" name="Genes" renderer="BASIC_FEATURE" visible="true" windowFunction="count">
            <DataRange baseline="0.0" drawBaseline="true" flipAxis="false" maximum="20.0" minimum="0.0" type="LINEAR"/>
        </Track>
    </Panel>
</Global>
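Because session files are XML, they can also be inspected with any XML library before loading them in IGV. The following minimal sketch uses Python's standard `xml.etree.ElementTree` on a small, well-formed document that reuses the element and attribute names from the example above. The abbreviated XML string and the function name `summarize_session` are illustrative and not part of IGV.

```python
import xml.etree.ElementTree as ET

# Abbreviated, well-formed session used only for this illustration.
SESSION_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Global genome="hg18" locus="All" version="3">
    <Panel height="65" name="FeaturePanel" width="1000">
        <Track displayName="RefSeq genes" id="Genes" name="Genes" visible="true"/>
    </Panel>
</Global>
"""


def summarize_session(xml_text):
    """Return the genome, locus, and track names per panel of a session."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    panels = {
        panel.get("name"): [track.get("name") for track in panel.findall("Track")]
        for panel in root.findall("Panel")
    }
    return {"genome": root.get("genome"), "locus": root.get("locus"), "panels": panels}


if __name__ == "__main__":
    print(summarize_session(SESSION_XML))
```

Parsing fails with an `xml.etree.ElementTree.ParseError` if the file is not well-formed, which makes this a quick check for hand-edited sessions.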
Upload a new genome to your own server to share with others:
|Format||<name> *tab* <URL of the .genome file> *tab* <id>|
|Example||Human hg18 http://www.broadinstitute.org/igvdata/genomes/hg18.genome hg18|
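A registry line in this format can be produced with a short helper such as the Python sketch below. The function name is hypothetical; the only facts taken from the table above are the field order and the tab delimiter.

```python
def genome_registry_line(name, url, genome_id):
    """Build one '<name> TAB <URL> TAB <id>' line for a genome list file."""
    for value in (name, url, genome_id):
        if "\t" in value or "\n" in value:
            raise ValueError("fields must not contain tabs or newlines")
    return "\t".join((name, url, genome_id))


if __name__ == "__main__":
    print(genome_registry_line(
        "Human hg18",
        "http://www.broadinstitute.org/igvdata/genomes/hg18.genome",
        "hg18",
    ))
```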
By default, the File>Load from Server option in IGV provides access to public datasets stored on the IGV data server. You can host your own web accessible datasets by creating server registry and configuration files.
To create a custom load from server menu
IGV points to exactly one data registry file. If you'd like your data server to provide access to the public datasets on the IGV data server, include them in your registry file, as shown above.
When a user enters a password-protected URL, IGV prompts for a user name and password. If the username/password combination is incorrect, IGV will continue to ask the user to authenticate until the combination is entered correctly or the user clicks Cancel.
There is an example site so that users can test the password protection feature. Click http://www.broadinstitute.org/igvdata/private/. IGV should prompt for a username and password. Enter:
After verifying the server connection and authentication, you can try the following example files in IGV by clicking File>Load from URL:
There are many ways to set up a password-protected site. The following describes one method of handling this on an Apache server.
The Apache HTTP Server is a commonly used web server; it is, for example, in use at the Broad Institute. Setting up a password requires:
The Access File (.htaccess) is located in the restricted directory. It should contain the following information:
AuthName "Private IGV Folder"
The first line should contain the path to the Password File.
The Password File (.htpasswd) should be placed in a directory that is accessible internally, not through the web. This can be the home directory, but it must be a location that is not externally visible. An example password file might look like this:
The file contains the usernames and passwords for all authenticated users, with one user per line. In the example line, the username is "user1" and the password is "kJx1GPxWtLet2," which is an encrypted password representing the human-readable word, "password."
To make the authentication lines, users can contact IT staff or use one of several websites that help generate them. The one used for this line was http://www.kxs.net/support/htaccess_pw.html. This website provides a string that can be used in the .htpasswd file.
To test, use a web browser to access a file in the password-protected directory URL. You should be prompted for a username and password.
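The same check can be scripted. The Python sketch below assumes that the directory is protected with HTTP Basic authentication, which is a common Apache setup but is not confirmed by the configuration shown here. The function names, the URL, and the credentials are placeholders. The request is only constructed, not sent.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url, username, password):
    """Create a request that carries Basic credentials (not sent here)."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    # Placeholder URL and credentials, for illustration only.
    req = build_request("http://example.org/private/data.bed", "user1", "password")
    print(req.get_header("Authorization"))
    # To actually fetch the file: urllib.request.urlopen(req)
```

A 401 response from the server indicates that the credentials were rejected.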
Search for a particular nucleotide sequence in the reference genome. The results are displayed as features in two new tracks. By default, the results from the positive strand are displayed in blue, and results from the negative strand in red. To change the color, right-click on the track and select Change Track Color... from the pop-up menu.
1. Bring up the motif finder dialog, via Tools>Find Motif...
2. Enter the sequence for which to search, using one of following three formats:
For example, let's say you want to find bacterial promoter upstream elements consisting of 6 adenines (A), followed by a purine (A or G), then any nucleotide (A, C, G, or T), and finally another purine (A or G). You would enter the sequence "AAAAAARNR".
For example, to find occurrences of the canonical TATA box sequences TATAAAAA, TATATAAA, and TATAAATA, you can enter the regular expression "TATA[AT]A[AT]A". Regular expressions are particularly useful for finding variable length sequences. For example, to search for the sequence TATAAA, optionally followed by any number of additional adenines, enter the regular expression "TATAAA+".
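The two input styles above are closely related: an IUPAC string can be translated into a regular expression and then searched on both strands. The Python sketch below illustrates the idea and is not IGV's implementation. All function names are hypothetical, coordinates are 0-based half-open, and only non-overlapping matches are reported.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'AAAAAARNR' into a regex string."""
    return "".join(IUPAC[code] for code in motif.upper())


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(pattern, sequence):
    """Return (positive_hits, negative_hits) as lists of (start, end)."""
    sequence = sequence.upper()
    regex = re.compile(pattern)
    n = len(sequence)
    positive = [(m.start(), m.end()) for m in regex.finditer(sequence)]
    # Search the reverse complement, then map hits back to forward coordinates.
    negative = sorted(
        (n - m.end(), n - m.start())
        for m in regex.finditer(reverse_complement(sequence))
    )
    return positive, negative


if __name__ == "__main__":
    print(iupac_to_regex("AAAAAARNR"))
    print(find_motif(iupac_to_regex("AAAAAARNR"), "GGAAAAAAGCATT"))
    print(find_motif("TATA[AT]A[AT]A", "TTTTTATA"))
```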
3. Enter names for the feature tracks that will show where the sequence matches the positive and negative strands of the reference genome.
Since we entered a short sequence, it gets a large number of hits. Looking at the results directly upstream of the gene GBP4, we see a match on the positive strand and two on the negative strand. Note that by default, the search result tracks are displayed in Expanded mode, so you can see overlapping matches.
The igvtools utility provides a set of tools for pre-processing data files. File names must contain an accepted file extension, e.g. test-xyz.bam. Tools include:
From IGV: igvtools is accessed by selecting Tools>Run igvtools.
Command line: The igvtools commands can also be run from the command line. To install, download the igvtools zip file from the Downloads page. On Windows, enter the commands at an MS-DOS prompt (select Start>Run and type: cmd). On Mac, enter the commands in a terminal window (select Applications>Utilities>Terminal).
The igvtools utilities can be downloaded from the Downloads page on the IGV Web site.
igvtools_<version #>.zip includes the jar file and shell scripts for running igvtools, as well as the genome files.
igvtools_nogenomes_<version #>.zip includes the jar file and shell scripts for running igvtools, but not the genome files.
Starting with shell scripts
The igvtools utilities can be invoked, with or without the graphical user interface (GUI), from one of the following scripts:
igvtools (command-line version for linux and Mac OS 10.x)
igvtools_gui (gui version for linux and Mac OS 10.x)
igvtools.bat (command-line version for windows)
igvtools_gui.bat (gui version for windows)
The general form of the command-line version is:
igvtools [command] [options] [arguments]
igvtools.bat [command] [options] [arguments]
Recognized commands, options, arguments, and file types are described below.
Starting with java
Igvtools can also be started directly using Java. This option allows more control over Java parameters, such as the maximum memory to allocate. In this example, igvtools is started with 1500 MB of memory allocated:
java -Xmx1500m -Djava.awt.headless=true -jar igvtools.jar [command] [options] [arguments]
To start with a GUI the command is
java -Xmx1500m -jar igvtools.jar -g
The scripts above allocate a fixed amount of memory. If this amount is not available on your platform you will get an error along the lines of "Could not start the Virtual Machine". If this happens you will need to edit the scripts to reduce the amount of memory requested, or use the Java startup option. The memory is set via a "-Xmx" parameter. For example -Xmx1500m requests 1500 MB, -Xmx1g requests 1 gigabyte.
The genome argument in the toTDF and count commands can be either an id or a full path to a .chrom.sizes file or an IGV .genome file.
The toTDF command converts a sorted data input file to a binary tiled data (.tdf) file. Use this command to pre-process large datasets for improved IGV performance.
Supported input file formats are: .wig, .cn, .snp, .igv, and .gct.
Note: This tool was previously known as "tile".
igvtools toTDF [options] [inputFile] [outputFile] [genome]
inputFile The input file (see supported formats above).
outputFile Binary output file. Must end in ".tdf".
genome A genome id or path to a .chrom.sizes or .genome file. Default is hg18.
-z num Specifies the maximum zoom level to precompute. The default
value is 7 and is sufficient for most files. To reduce file
size at the expense of IGV performance this value can be
reduced.
-f list A comma delimited list specifying window functions to use
when reducing the data to precomputed tiles. Possible
values are min, max, and mean. By default only the mean
is calculated.
-p file Specifies a "bed" file to be used to map probe identifiers
to locations. This option is useful when preprocessing .gct
files. The bed file should contain 4 columns:
chr start end name
where name is the probe name in the .gct file.
igvtools toTDF -z 5 copyNumberFile.cn copyNumberFile.tdf hg18
Data file formats, with the exception of .gct files, must be sorted by start position. Files can be sorted with the sort command described below. Attempting to preprocess an unsorted file will result in an error.
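The -p option above expects a four-column BED file. As a minimal sketch, the following writes such a file and checks the column count before it is handed to toTDF. Tab-separated columns are assumed, the probe names and coordinates are invented, and the toTDF command is only printed.

```shell
# Work in a scratch directory so nothing existing is overwritten.
WORKDIR="$(mktemp -d)"
PROBES="$WORKDIR/probes.bed"

# Columns: chr, start, end, name (name = probe identifier used in the .gct file).
# Probe names and coordinates below are invented for illustration.
printf 'chr1\t1000\t1025\tprobe_0001\n' >  "$PROBES"
printf 'chr1\t5000\t5025\tprobe_0002\n' >> "$PROBES"
printf 'chr2\t2500\t2525\tprobe_0003\n' >> "$PROBES"

# Sanity check: count lines that do not have exactly four tab-separated fields.
BAD_LINES="$(awk -F'\t' 'NF != 4' "$PROBES" | wc -l | tr -d ' ')"
echo "lines with wrong column count: $BAD_LINES"

# Dry run of the corresponding toTDF call.
echo "igvtools toTDF -p $PROBES expression.gct expression.tdf hg18"
```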
The count command computes average feature density over a specified window size across the genome. Common usages include computing coverage for alignment files and counting hits in ChIP-seq experiments. By default, the resulting file will be displayed as a bar chart when loaded into IGV.
Supported input file formats are: .sam, .bam, .aligned, .psl, .pslx, and .bed.
igvtools count [options] [inputFile] [outputFile] [genome]
inputFile The input file (see supported formats above).
outputFile The output file, which can be in binary "tdf" or ASCII "wig" format. The filename must end in ".tdf" or ".wig", or be the special string "stdout". To output both a .tdf and a .wig file, list both output filenames as a single string, separated by a comma with no other delimiters. If the output file is named "stdout", the output is written to the standard output stream in wig format.
genome A genome id or path to a .chrom.sizes or .genome file. Default is hg18.
-z, --maxZoom num
Specifies the maximum zoom level to precompute.
-w, --windowSize num
The window size over which coverage is averaged. Defaults to 25 bp.
-e, --extFactor num
The read or feature is extended by the specified distance in bp prior to counting. This option is useful for ChIP-seq and RNA-seq applications. The value is generally set to the average fragment length of the library minus the average read length.
The read is extended upstream from the 5' end by the specified distance.
Effectively overrides the read length, defines the downstream extent from the 5' end. Intended for use with preExtFactor.
-f, --windowFunctions list
A comma delimited list specifying window functions to use when reducing the data to precomputed tiles. Possible values are min, max, mean, median, p2, p10, p90, and p98. The "p" values represent percentile, so p2=2nd percentile, etc.
By default, counting is combined among both strands. This setting outputs the count for each strand separately. Legal argument values are 'read' or 'first': 'read' separates the count by read strand; 'first' uses the strand of the first read in the pair. Results are saved in a separate column for .wig output, and a separate track for TDF output.
Count the occurrence of each base (A,G,C,T,N). Takes no arguments. Results are saved in a separate column for .wig output, and a separate track for TDF output.
Only count a specific region. Query string has syntax <chr>:<start>-<end>. e.g. chr1:100-1000. Input file must be indexed.
Set the minimum mapping quality of reads to include. Default is 0.
Include duplicate alignments in the count. Default is false; if this flag is included, duplicates are counted. Takes no arguments.
Compute coverage from paired alignments counting the entire insert as covered. When using this option only reads marked "proper pairs" are used.
The input file must be sorted by start position. See the sort command below.
igvtools count -z 5 -w 25 -e 250 alignments.bam alignments.cov.tdf hg18
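The -e value and the combined output filename can be derived as follows. This is a sketch only: the fragment and read lengths are illustrative numbers rather than recommendations, the filenames are placeholders, and the command is printed instead of executed.

```shell
# Illustrative library values; substitute the numbers for your own data.
FRAGMENT_LEN=300   # average fragment length in bp (assumed)
READ_LEN=50        # average read length in bp (assumed)

# Extension factor = average fragment length minus average read length.
EXT_FACTOR=$((FRAGMENT_LEN - READ_LEN))

# Request both a .tdf and a .wig: one string, comma-separated, no spaces.
OUTPUTS="alignments.cov.tdf,alignments.cov.wig"

# Dry run: print the command instead of executing it.
COUNT_CMD="igvtools count -w 25 -e $EXT_FACTOR alignments.bam $OUTPUTS hg18"
echo "$COUNT_CMD"
```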
Creates an index for an alignment or feature file. Index files are required for loading alignment files into IGV, and can significantly improve performance for large feature files. Note that the index file is not directly loaded into IGV. Rather, IGV looks for the index file when the alignment or feature file is loaded. This command does not take an output file argument. Instead, the filename is generated by appending ".sai" (for alignments) or ".idx" (for features) to the input filename, because IGV relies on this naming convention to find the index. The input file must be sorted by start position (see the sort command, below).
Supported input file formats are: .sam, .aligned, .vcf, .psl, and .bed.
igvtools index [inputFile]
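Because the index command chooses the output name itself, it can help to predict that name in a script. The helper below is a hypothetical sketch of the naming convention described above. It assumes that .sam and .aligned inputs count as alignment files and that every other supported format counts as a feature file.

```shell
# Predict the index filename that would be created for an input file.
# Assumption: .sam and .aligned are treated as alignment files (".sai");
# every other supported format is treated as a feature file (".idx").
expected_index_name() {
    case "$1" in
        *.sam|*.aligned) echo "$1.sai" ;;
        *)               echo "$1.idx" ;;
    esac
}

expected_index_name reads.sam       # prints reads.sam.sai
expected_index_name variants.vcf    # prints variants.vcf.idx
```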
Sorts the input file by start position, as required.
Supported input file formats are: .cn, .igv, .sam, .aligned, .psl, .bed, and .vcf.
NOTE: This command does not work with BAM files. The samtools package can be used to sort .bam files.
igvtools sort [options] [inputFile] [outputFile]
The special string "stdout" can be used as [outputFile], in which case the output will
be written to the standard output stream instead of a file.
-t tmpdir Specify a temporary working directory. For large input files
this directory will be used to store intermediate results of
the sort. The default is the user's temp directory.
-m maxRecords The maximum number of records to keep in memory during the
sort. The default value is 500000. Increase this number
if you receive "too many open files" errors. Decrease it
if you experience "out of memory" errors.
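A typical sequence is to sort a file and then index the sorted result. The sketch below assembles both commands using the -t and -m options described above. The filenames and the temporary directory are placeholders, and the lowered -m value is just one example of reacting to "out of memory" errors. Both commands are printed rather than run.

```shell
# File and directory names are placeholders.
INPUT="features.bed"
SORTED="features.sorted.bed"
SORT_TMPDIR="/scratch/igvtools_tmp"   # hypothetical working directory
MAX_RECORDS=250000                    # half the default of 500000

SORT_CMD="igvtools sort -t $SORT_TMPDIR -m $MAX_RECORDS $INPUT $SORTED"
INDEX_CMD="igvtools index $SORTED"

# Dry run: print both steps in the order they would be run.
echo "$SORT_CMD"
echo "$INDEX_CMD"
```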
Prints the igvtools version number to the console.
Select Tools>Run igvtools to open the igvtools window. This window allows you to run the toTDF, Count, Sort, and Index tools:
Information about the run will appear in the Messages box. Note that if you exit the IGV application, any tool that is in progress will be terminated.
The toTDF tool converts a sorted data input file to a binary tiled data (.tdf) file. Use this tool to pre-process large datasets for improved IGV performance.
Options you can change include:
Count computes average feature density over a specified window size across the genome. Common usages include computing coverage for alignment files and counting hits in ChIP-seq experiments. By default, the resulting file will be displayed as a bar chart when loaded into IGV. To display feature intensity in IGV, the density must be computed with this option, and the resulting file must be named <feature track filename>.tdf.
Options you can change include:
This command creates an index for an alignment file or a feature file. Index files are required for loading alignment files into IGV, and can significantly improve performance for large feature files. Note that you do not directly load the index file into IGV. Rather, IGV looks for a corresponding index file when the alignment or feature file is loaded. This command does not take an output file argument. Instead, the filename is generated by appending ".sai" (for alignments) or ".idx" (for features) to the input filename, because IGV relies on this naming convention to find the index. The input file must be sorted by start position (see the Sort tool, below).
NOTE: This tool cannot index a binary (BAM) file. Use the samtools package to sort and index BAM files.
Select the Input File. Supported file formats are .sam, .aligned, .vcf, .psl, and .bed.
Sort sorts the input file by start position.
NOTE: This option does not work with BAM files. The samtools package can be used to sort .bam files.
Options you can change include:
Starting with IGV v2.3.46, released March 2015, you can do a BLAT (BLAST-like Alignment Tool) search from a feature, alignment, or region of interest, of a sequence up to 8 kb in length.
The default search engine is the BLAT server hosted at the UCSC Genome Browser. UCSC's BLAT search supports most UCSC-derived genomes, including the human and mouse genomes. To use a different BLAT server, change the setting in Advanced Preferences.
Each query sequence appears as a new Blat feature track in the lower panel of IGV's display. The Screenshot (2015.04.01) shows five different Blat feature tracks for the following sequences:
Manipulate this track just like other feature tracks as outlined in the Feature Tracks section of the Pop-up Menus page.
Results are presented in a new window that displays the query sequence, location of hits, match score, and other metrics as shown in the Screenshot (2015.04.01). Hits are listed in descending order of alignment score.
For the example hit highlighted in the Screenshot above, the original search sequence is returned as the top hit. The read used in the search was an aligned RNA-Seq read spanning an intron (example b), which the BLAT results show is a singly gapped alignment as indicated by the 1 under the column T gap count.
For example, for ROI2 (marked 1 above), clicking on the second hit in the results panel (marked 2 in Screenshot below) navigates the view away from chromosome 19 to the hit locus on chromosome 22 (marked 3). This same region contains a hit for example c, a BLAT search done with an exon feature. Because the exon feature has a higher alignment score than ROI2, its Blat feature is shaded darker.