Viewing Alignments

This page introduces how to view alignment data and its components in IGV, section by section. Alignments place reads against a reference sequence and are used for purposes that include:

  • Whole-genome sequencing and exome sequencing to find genetic variation.
  • ChIP-Seq to characterize protein-DNA interactions.
  • RNA-Seq to detect RNA expression levels, define transcriptomes with novel gene isoforms or variants, and detect subsets of RNAs such as actively translated transcripts in ribosome profiling (Ribo-Seq).

In this guide, each IGV feature is illustrated with one data type, but the concepts also apply to other data types. The sections on this page are ordered roughly by the features that become visible as you zoom in.

More detailed topics are covered on related pages.

Changes to certain display parameters in the Alignment Preferences panel should be made ahead of loading data. Some of these preferences can be overridden on a per-track basis through pop-up menu options or by loading saved sessions.

Default parameters are tuned for viewing DNA alignments that typically cover the entire genome at low depth, and they filter out reads marked as duplicates. Adjust the Alignment Preferences panel parameters for RNA-Seq data, PCR-free whole-genome sequences, and other data whose breadth and depth of coverage differ from those of typical DNA alignments.

For example, before loading RNA-Seq data, increase the Visibility range threshold to 500 kb. This does not noticeably affect IGV performance, because expression data typically cover only ~5% of the genome and deeper coverage is downsampled by default. In addition, check Show junction track to visualize splice junctions.
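As a rough back-of-envelope check, assuming RNA-Seq reads fall on about 5% of the bases in any window (real coverage is far less uniform), a 500 kb window contains about as much read-covered sequence as a fully covered default 30 kb DNA window:

```latex
500\,\text{kb} \times 0.05 = 25\,\text{kb} \;<\; 30\,\text{kb}
```

Deep read stacks within the covered regions are then capped by downsampling, described below.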

File Formats

The preferred file format for viewing alignments in IGV is BAM, the binary form of the Sequence Alignment/Map (SAM) format. BAM is especially recommended for large alignment files.

  • Besides BAM and SAM, additional supported file formats related to alignments include GOBY, VCF, PSL, BED, and TDF.
  • For details on viewing the older Illumina Pipeline v1.3 sorted.txt format see here.

Both BAM and SAM files are described on the Samtools project page http://samtools.sourceforge.net/ and in the 2014 article titled Sequence Alignment/Map Format Specification by the SAM/BAM Format Specification Working Group.

Sort and Index

IGV requires that the alignment file, whether BAM or SAM, be sorted by coordinate and indexed. Indexing produces a secondary file with a .bai (BAM) or .sai (SAM) extension. The index can be associated with the alignment track by the file naming convention, or specified separately with the index query parameter.

  • The alignment file must have the .bam or .sam file extension.
  • To associate an index file by filename, the index file must be named after the full alignment file name and must reside in the same directory as the file it indexes. For example, the index file for test-xyz.bam would be named test-xyz.bam.bai. When you specify the location of the alignment file, IGV automatically searches for the index file in the same directory.
  • The GenePattern module Picard.SortSam will sort and index both BAM and SAM files.
    • Select coordinate sort order and BAM output format to produce the IGV-compatible sorted BAM file and BAI index file.
  • If you choose to load a SAM alignment file directly in IGV, the igvtools utility can both sort and index the file.
    • For IGV to sort and index, the SAM file must be on your local drive. Specifying a SAM file by URL will cause an error.
    • If the file is already sorted, an index is created automatically the first time the file is loaded. IGV attempts to store the index, with a .sai extension, in the directory where the alignment file resides. If that fails, the index is stored in the user's IGV directory.
    • The generated SAI index file is specific to IGV and is not compatible with Samtools or other applications.

igvtools does not sort or index BAM files, because alternative tools such as Samtools have long been available for that purpose.
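To check the naming convention before loading a file, the following Python sketch computes the index path IGV looks for next to an alignment file. The function names are hypothetical helpers, not part of IGV or igvtools; the sketch only encodes the .bam.bai / .sam.sai convention described above.

```python
from pathlib import Path

# Index extensions described on this page: BAM -> .bai, SAM -> .sai (IGV-specific).
INDEX_EXTENSIONS = {".bam": ".bai", ".sam": ".sai"}


def expected_index_path(alignment_path: str) -> Path:
    """Return the conventionally named index path for an alignment file.

    The index keeps the full alignment file name, appends the index
    extension, and lives in the same directory (test-xyz.bam -> test-xyz.bam.bai).
    """
    path = Path(alignment_path)
    suffix = path.suffix.lower()
    if suffix not in INDEX_EXTENSIONS:
        raise ValueError(f"alignment file must end in .bam or .sam: {alignment_path}")
    return path.with_name(path.name + INDEX_EXTENSIONS[suffix])


def has_index(alignment_path: str) -> bool:
    """True if the conventionally named index file exists on disk."""
    return expected_index_path(alignment_path).is_file()
```

For example, expected_index_path("data/test-xyz.bam") points at data/test-xyz.bam.bai in the same directory.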

 

Read Coverage Track

IGV supplements each alignment track with a default coverage track and, if enabled in the Alignment Preferences panel, a default splice junctions track.

This section covers the default coverage track and a second type of coverage track, the extended coverage track.
  • IGV displays only one coverage track per alignment file and uses the extended coverage track when one is available.
  • The extended coverage track is visible at the genome and full chromosome view, while the default coverage track becomes visible alongside the reads at a more detailed zoom level.

For both types, the coverage track represents coverage for all the reads, whereas the alignment track may display only a fraction of the reads. This partial display, called downsampling, occurs in areas of deep read coverage and improves IGV performance.

Default Coverage Track

IGV dynamically calculates and displays the default coverage track for an alignment file. When IGV is zoomed in to the visibility range threshold (30 kb by default), the coverage track displays the read depth at each locus as a gray bar chart. If a nucleotide differs from the reference sequence in more than 20% of quality-weighted reads, IGV colors the bar in proportion to the read count of each base (A, C, G, T).

  • Override this default threshold for individual coverage tracks by right-clicking the track and selecting Set allele frequency threshold.
  • View count details by hovering the mouse over a coverage bar. Copy count details to your computer's clipboard from the right-click menu.
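The coloring rule can be sketched in Python as follows. The sketch assumes that "quality-weighted" means each read contributes its base quality to the tally, and that all non-reference bases are pooled before comparing against the threshold; IGV's exact weighting may differ, and the function names are hypothetical.

```python
from collections import defaultdict


def base_fractions(observations):
    """Quality-weighted fraction of each base at one locus.

    observations: iterable of (base, phred_quality) pairs, one per read
    covering the locus. Weighting by base quality is an assumption.
    """
    weights = defaultdict(float)
    for base, quality in observations:
        weights[base.upper()] += quality
    total = sum(weights.values())
    if total == 0:
        return {}
    return {base: w / total for base, w in weights.items()}


def color_coverage_bar(observations, reference_base, threshold=0.2):
    """True if non-reference bases exceed `threshold` of the weighted reads."""
    fractions = base_fractions(observations)
    mismatch = sum(f for b, f in fractions.items() if b != reference_base.upper())
    return mismatch > threshold
```

With seven high-quality A reads and three high-quality G reads over a reference A, the mismatch fraction is 0.3 and the bar is colored; if the G calls have low quality, their weight shrinks and the bar stays gray.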

Extended Coverage Track

When the alignment data are loaded with matching extended coverage data, the coverage track displays data at all zoom levels, including the whole-genome and chromosome views. To generate the extended coverage file, which has a .tdf extension, use igvtools. The resulting file can be associated with the alignment track by file naming convention or loaded independently as a separate track. TDF tracks loaded independently of an alignment do not display dynamically calculated features such as allele frequencies.

  • To associate a coverage file by filename, name it <alignment file name>.tdf, where the alignment file name includes its extension, and place it in the same directory as the alignment file.
    • For example, the coverage track for test.bam would be named test.bam.tdf. IGV loads this coverage track automatically when test.bam is loaded.
  • To dynamically associate coverage data with a BAM track, right-click the alignment or coverage track and choose Load Coverage Data from the pop-up menu.
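Precomputed coverage lets IGV show coverage when zoomed out without reading every alignment. The Python sketch below illustrates the general idea of summarizing per-base depth at several resolutions, one per zoom level. It is not the TDF format or the igvtools implementation; the helper names and the window sizes are arbitrary choices for illustration.

```python
def per_base_depth(reads, region_start, region_end):
    """Depth at each position in [region_start, region_end) from all reads.

    reads: iterable of (start, end) half-open reference intervals.
    A difference array keeps the cost at O(reads + region length).
    """
    length = region_end - region_start
    diff = [0] * (length + 1)
    for start, end in reads:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def windowed_mean(depth, window):
    """Mean depth per fixed-size window (the last window may be shorter)."""
    return [sum(depth[i:i + window]) / len(depth[i:i + window])
            for i in range(0, len(depth), window)]


def zoom_levels(depth, windows=(25, 250, 2500)):
    """Precompute one coarser summary per zoom level (window sizes are illustrative)."""
    return {w: windowed_mean(depth, w) for w in windows}
```

Coarser windows trade detail for size, which is why a precomputed file can serve the whole-genome view that dynamic calculation cannot.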

 

Downsampling

IGV limits memory usage in two ways to improve performance. The first is the visibility range threshold, the zoom level at which alignments become visible; the second is downsampling of reads in areas of deep coverage. This section presents these two levers together because their settings combine to determine IGV performance. The defaults listed below are tuned for low-coverage DNA alignments; adjust them for other data types in the Alignment Preferences panel.

  1. Visibility range threshold
    1. Defines the nominal window size at which alignments become visible; default 30 kb
  2. Downsample reads with two parameters. Uncheck to turn off.
    1. Max read count; default 100
    2. Per window size; default 50 bases

For example, for RNA-Seq alignments that span extended regions with sparse coverage, increase the visibility range threshold (for example, to 500 kb) to view alignments at wider zoom levels.

Downsampled areas are marked with a black rectangle just under the coverage track. The coverage track still represents coverage for all the reads.

In the example shown, the downsampled regions are consecutive and marked by seven black rectangles just under the coverage track.
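The interaction between Max read count and Per window size can be illustrated with a Python sketch. It assumes reads are grouped into fixed windows by start position and that each window keeps a uniform random sample (reservoir sampling); the function name and return values are hypothetical, and IGV's own sampling may differ in detail.

```python
import random


def downsample(reads, max_count=100, window_size=50, seed=None):
    """Keep at most `max_count` reads whose start falls in each window.

    reads: iterable of (start, read_id) tuples sorted by start position.
    Returns (kept_reads, downsampled_windows); the second list holds the
    start coordinate of every window that exceeded max_count, i.e. the
    regions a viewer would mark with a black rectangle.
    """
    rng = random.Random(seed)
    kept, downsampled = [], []
    bucket, seen, bucket_start = [], 0, None

    def flush():
        kept.extend(sorted(bucket))
        if seen > max_count:
            downsampled.append(bucket_start)

    for read in reads:
        start = read[0]
        window = start - start % window_size
        if window != bucket_start:
            if bucket_start is not None:
                flush()
            bucket, seen, bucket_start = [], 0, window
        seen += 1
        if len(bucket) < max_count:
            bucket.append(read)
        else:
            # Reservoir sampling: each of the `seen` reads stays with equal probability.
            j = rng.randrange(seen)
            if j < max_count:
                bucket[j] = read
    if bucket_start is not None:
        flush()
    return kept, downsampled
```

With the defaults, reads starting in a 1 kb region occupy 20 windows, so at most 20 × 100 = 2,000 of them are displayed however deep the coverage is.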

 

Alignment Track

When an alignment file is loaded, two tracks are displayed: (1) a coverage track and (2) the alignment track. Display of the default splice junctions track requires enabling the setting in the Alignment Preferences panel. This section gives an overview of the alignment track. For options available from the alignment track menu, including grouping, sorting, and coloring options, see the alignments section of the pop-up menu page.

  • When you zoom in to the visibility range threshold (30 kb by default), IGV shows the reads. The threshold can be changed in the Alignment Preferences panel.

    

  • When zoomed in sufficiently, IGV displays a line at the center of the display. At higher resolutions, the center line becomes two lines that frame the bases centered in the display, as shown in the figure above.
    • The framed bases are the basis for Sort by operations on alignment tracks, regardless of where the mouse was right-clicked on the track. Details are below.
    • The center line can be turned off in the Alignment Preferences panel.

Detecting Structural Variants

IGV uses color and other visual markers to highlight potential genetic alterations in reads relative to a reference sequence. Genetic alterations include single nucleotide variations, structural variations, and aneuploidy. Structural variations include insertions, deletions, inversions, tandem duplications, translocations, and more complex rearrangements. Interpretation of some of these variations is discussed briefly in this section and the next. Interpreting Color by Insert Size and Interpreting Color by Pair Orientation give a more detailed explanation of read colors.

An additional factor to consider when judging potential genetic alterations is the quality of the reads and of their mapping. IGV uses transparency to indicate quality.

  • For RNA-Seq, TopHat outputs separate insertions.bed and deletions.bed files which IGV will load as separate tracks.
  • In addition to BED files, VCF files can represent structural variation.

Colors and transparency are used at two levels within alignments: (1) for mapped reads, and (2) for individual bases within reads.

  • Mapped reads: color is described in the Paired-End Alignments section; transparency indicates mapping quality.
  • Individual bases: color marks mismatched bases; transparency indicates base quality (phred) score.

Color and Transparency for Individual bases

By default, read bases that match the reference are displayed in gray. Mismatched bases are color coded, and insertions and deletions relative to the reference are marked: insertions with a purple I and deletions with a black dash. In addition, mismatched bases are shaded by their base quality, known as the phred score, so that low-quality base calls are de-emphasized.

  • To color code all bases, regardless of whether they are mismatched, right-click the track and select Show All Bases from the pop-up menu.  
  • To mark insertions larger than a specified size with a red I, enable the Flag insertions larger than option in the Alignment Preferences panel and set the size cutoff. This feature was introduced in IGV v2.3.46, released March 2015.
  • Transparency shading of quality can be turned off temporarily from the pop-up menu, or persistently from Alignment Preferences panel.
  • To change the default nucleotide coloring scheme for reads, see Modify the prefs.properties file.
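Base quality is reported on the phred scale, Q = -10 log10 P, where P is the probability that the base call is wrong (Q20 corresponds to a 1% error rate). The Python sketch below maps Q to an opacity for drawing a mismatched base. The linear ramp, the q_min/q_max bounds, and the minimum opacity are illustrative assumptions, not IGV's documented defaults.

```python
def phred_to_error_probability(q):
    """Probability that a base call is wrong: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mismatch_alpha(q, q_min=5, q_max=20, min_alpha=0.1):
    """Opacity (min_alpha..1.0) for a mismatched base with phred quality q.

    Bases at or below q_min are drawn nearly transparent, bases at or above
    q_max fully opaque, with linear scaling in between (assumed values).
    """
    if q <= q_min:
        return min_alpha
    if q >= q_max:
        return 1.0
    return min_alpha + (1.0 - min_alpha) * (q - q_min) / (q_max - q_min)
```

A Q30 mismatch is drawn fully opaque, while a Q3 mismatch is barely visible, which is the de-emphasis described above.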

Transparency for Mapped Reads

Alignments displayed with light gray borders and a transparent or white fill, as shown in the screenshot, have a mapping quality of zero. How to interpret this value depends on the aligner: some commonly used aligners assign zero to reads with multiple alignments, in which case the read also maps to another location with an equally good placement. It is also possible that the read could not be placed uniquely even though the other placements are not equally good.

Insertions

In a gapped alignment, IGV marks insertions relative to the reference with a purple I, or with a red I for insertions larger than a user-specified cutoff when that option is enabled. Hover over the insertion symbol to view the inserted bases.

 

Deletions

In a gapped read, IGV indicates deletions with respect to the reference with a black bar.
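In SAM/BAM records, insertions and deletions are encoded in the CIGAR string as I and D operations. The Python sketch below is a hypothetical helper, not code from IGV: it walks a CIGAR string to locate these gaps and applies a cutoff analogous to the Flag insertions larger than preference.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases, per the SAM specification.
CONSUMES_REF = set("MDN=X")


def gaps_from_cigar(pos, cigar, flag_insertions_larger_than=None):
    """List insertions and deletions relative to the reference.

    pos: 1-based leftmost reference position (SAM POS field).
    Returns (kind, ref_position, length, flagged) tuples. For an insertion,
    ref_position is the reference base just before the inserted bases;
    `flagged` marks insertions longer than the cutoff (drawn as a red I).
    """
    events = []
    ref = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "I":
            flagged = (flag_insertions_larger_than is not None
                       and length > flag_insertions_larger_than)
            events.append(("insertion", ref - 1, length, flagged))
        elif op == "D":
            events.append(("deletion", ref, length, False))
        if op in CONSUMES_REF:
            ref += length
    return events
```

For example, gaps_from_cigar(100, "10M2I5M3D10M") reports a 2-base insertion after reference position 109 and a 3-base deletion starting at position 115.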

Coloring and Sorting Alignments

Users can also color and sort reads by various options, including start location, strand, nucleotide, mapping quality, sample tag, or read group tag. For a description of all color and sort options, see the alignment track pop-up menu.

For example, to sort alignments:

  1. Right-click a track for the pop-up menu.
  2. Select a Sort option from the pop-up menu. IGV sorts the alignments that intersect the center line of the display.

Sorting rearranges rows so that alignments that intersect the center of the display appear in the specified order. This can make the layout away from the center line appear sparse. To restore an optimally packed layout, select Re-pack alignments from the pop-up menu.

Press Ctrl-S to repeat the most recent sort.
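The sorting rule can be sketched in Python: only alignments that overlap the center position are reordered, and the rest keep their order after them. The Read class, its field names, and the ordering chosen for each key (for example, highest mapping quality first) are illustrative assumptions; the sketch also ignores gaps when looking up the base at the center.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    strand: str   # "+" or "-"
    mapq: int
    seq: str      # aligned bases, assumed gapless for this sketch

    def base_at(self, pos):
        return self.seq[pos - self.start]


def sort_at_center(reads, center, by="base"):
    """Reorder reads overlapping `center`; non-overlapping reads follow unchanged."""
    overlapping = [r for r in reads if r.start <= center < r.end]
    others = [r for r in reads if not (r.start <= center < r.end)]
    keys = {
        "start": lambda r: r.start,
        "strand": lambda r: r.strand,
        "mapq": lambda r: -r.mapq,            # highest mapping quality first
        "base": lambda r: r.base_at(center),  # alphabetical base at the center
    }
    return sorted(overlapping, key=keys[by]) + others
```

Because only the overlapping rows move, rows elsewhere can end up with gaps, which is the sparse layout that Re-pack alignments repairs.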

 

Paired-End Alignments

IGV provides several features for working with paired-end alignments. This section covers viewing reads as pairs, coloring of mapped paired reads, and the split-screen view. Interpretation of colors is discussed briefly here and in more detail in Interpreting Color by Insert Size and Interpreting Color by Pair Orientation.

View As Pairs

By default, IGV displays reads individually because they pack more compactly. Select View as pairs from the right-click menu to display each pair together, with a line joining the mates, as shown in the image below. Hover details (2) are displayed for a single read in the normal view (left) or for a pair of reads in the paired view (right).

Coloring of Mapped Paired Reads

IGV colors paired-end alignments in two ways:

  • interactively, by user selection, as shown by the purple highlighted reads marked (1) in the images above; and
  • automatically, for pairs that deviate from expectations, as marked (3) in the image above.

Control+click (Mac: Command+click) a read to outline the read and its paired mate in the same color. Colors are arbitrary but unique to each pair. A black outline indicates that the selected read has no mate.

  • Control+click (Command+click) either read to clear the outline.
  • Right-click and select Go to Mate Region to jump to the paired mate.
    • If the paired reads have a large insert size, the paired mate will not be highlighted. To confirm, turn on the Color alignments by > insert size and pair orientation option from the pop-up menu, as described below.
  • Right-click and select Clear Selections to clear all outlines.

Outlined paired reads are preserved when switched to View as pairs option. However, outlining reads only works in the unpaired view and not in the paired view.
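The outline behavior can be modeled as a small selection registry keyed by read name, since both mates of a pair share the same SAM QNAME. The Python sketch below is a hypothetical model, not IGV code: it draws a random color per selected pair, avoids reusing a color already in use, uses black for reads without a mate, and mirrors Clear Selections.

```python
import random


class PairSelection:
    """Track outlined pairs: one color per read name, shared by both mates."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._colors = {}

    def toggle(self, read_name, has_mate=True):
        """Select or clear a pair; returns the outline color, or None when cleared."""
        if read_name in self._colors:
            del self._colors[read_name]
            return None
        color = self._new_color() if has_mate else (0, 0, 0)  # black: no mate
        self._colors[read_name] = color
        return color

    def _new_color(self):
        used = set(self._colors.values())
        while True:
            color = tuple(self._rng.randrange(256) for _ in range(3))
            if color not in used and color != (0, 0, 0):
                return color

    def color_of(self, read_name):
        return self._colors.get(read_name)

    def clear(self):
        """Equivalent of Clear Selections."""
        self._colors.clear()
```

Keying by read name is what lets the same color survive a switch to View as pairs: the pair layout changes, but the name-to-color mapping does not.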

Hover over or click a read to view information about the read, including the location of its paired mate.

IGV colors (1) paired-end reads whose inferred insert size is smaller or larger than expected; (2) reads whose mate is aligned to a different chromosome; and (3) paired-end alignments with an unexpected pair orientation. Note that coloring by insert size was originally designed for DNA alignments against the genome. The expected insert size range is either set as fixed base-pair values or computed from the size distribution of the library.

  • See Interpreting Color by Insert Size for more detail.
    • Blue is for inserts that are smaller than expected
    • Red is for inserts that are larger than expected.
    • Inter-chromosomal rearrangements are color-coded by chromosome.
  • See Interpreting Color by Pair Orientation for more detail.
    • Shades of green, teal, and dark blue show structural events of inversions, duplications, and translocations.
    • Color assignments depend on sequencing platform.
  • Other Color by options are described in the alignment track pop-up menu options.
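When the expected range is computed from the library, one simple approach is to take low and high percentiles of the observed insert sizes. The Python sketch below assumes the 0.5th and 99.5th percentiles (nearest-rank method) and uses the SAM TLEN field as the inferred insert size; both choices, and the function names, are illustrative rather than a statement of IGV's internals.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list (0 < p <= 100)."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def insert_size_bounds(insert_sizes, low_pct=0.5, high_pct=99.5):
    """Expected insert size range from a library's TLEN values (zeros skipped)."""
    values = sorted(abs(s) for s in insert_sizes if s != 0)
    return percentile(values, low_pct), percentile(values, high_pct)


def insert_size_color(chrom, mate_chrom, tlen, bounds):
    """Classify one mapped pair using the rules above."""
    if mate_chrom != chrom:
        return "inter-chromosomal"  # colored by the mate's chromosome
    low, high = bounds
    size = abs(tlen)
    if size < low:
        return "blue"  # smaller than expected
    if size > high:
        return "red"   # larger than expected
    return "gray"      # within the expected range
```

TLEN is signed (negative for the rightmost mate), so the sketch compares absolute values; zeros are skipped because SAM uses 0 when the insert size is undefined, for example for mates on different chromosomes.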

Translocations on the same chromosome can be detected by color-coding for pair orientation, whereas translocations between two chromosomes can be detected by coloring by insert size. See both by selecting the Color alignments by > insert size and pair orientation option.
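For a same-chromosome pair, orientation depends only on which strand each mate aligns to and which mate is leftmost. The Python sketch below derives it from the SAM POS and FLAG fields (0x10: read reverse-complemented, 0x20: mate reverse-complemented). The FR/RF/FF/RR labels and the interpretation table assume a forward-reverse library such as standard Illumina paired-end; other platforms use different conventions, and the function names are hypothetical.

```python
def flags_to_strands(flag):
    """(read_is_reverse, mate_is_reverse) from the SAM FLAG field."""
    return bool(flag & 0x10), bool(flag & 0x20)


def pair_orientation(pos, is_reverse, mate_pos, mate_is_reverse):
    """Strands of a same-chromosome pair, leftmost mate first: 'FR', 'RF', 'FF' or 'RR'."""
    if pos <= mate_pos:
        left_rev, right_rev = is_reverse, mate_is_reverse
    else:
        left_rev, right_rev = mate_is_reverse, is_reverse
    return ("R" if left_rev else "F") + ("R" if right_rev else "F")


# Typical reading for a forward-reverse library (an assumption for this sketch).
INTERPRETATION = {
    "FR": "expected orientation",
    "RF": "possible tandem duplication or translocation",
    "FF": "possible inversion",
    "RR": "possible inversion",
}
```

Both mates of a normal pair yield the same label: a read with FLAG 99 at position 100 and its mate with FLAG 147 at position 300 are each classified as FR.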

 

Split Screen View

Split-screen views can be opened on the fly from paired-end alignment tracks. Right-click an alignment and select View mate region in split screen from the pop-up menu. If the alignment does not have a mapped mate, this option is grayed out.

  • Multiple split screen panes can be displayed.
  • To return to a normal single-screen view, right-click in the loci name panel at the top of the pane and select Switch to standard view.
  • To act on an individual split screen pane, right-click in the loci name panel at the top of the pane and select Reset panel to or Remove panel.
  • Alternatively, double-click the loci name panel at the top of the pane you wish to keep.

Split-screen view shortcuts:

  • On a Mac, Option + mouse-click on a pane zooms out that locus column of the split screen.
  • Shift + mouse-click on a pane zooms in that locus column of the split screen.
  • Control + mouse-scroll on a pane zooms that locus column of the split screen in or out.