Mutation Files

MAF (mutation annotation format) and MUT (mutation) files store mutation data. IGV recognizes text files with the .maf, .maf.txt, .mut, and .mut.txt extensions as mutation files; binary files are not recognized. Depending on your Mutations Preferences settings, IGV displays mutation files either as independent tracks or overlaid on other data tracks.
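As a quick illustration of the extension rule above, the following minimal sketch checks whether a file name carries one of the four listed extensions. The function name and the case-insensitive comparison are assumptions for illustration, not IGV's actual logic, and the check says nothing about whether the file is text or binary.

```python
from pathlib import Path

# The four extensions listed in the text. Case-insensitive matching is an ASSUMPTION.
MUTATION_EXTENSIONS = (".maf", ".maf.txt", ".mut", ".mut.txt")


def looks_like_mutation_file(filename: str) -> bool:
    """Return True if the file name ends with one of the mutation-file extensions.

    Hypothetical helper: it inspects only the name, not the file contents.
    """
    name = Path(filename).name.lower()
    return name.endswith(MUTATION_EXTENSIONS)
```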

  • MAF files that originate from The Cancer Genome Atlas (TCGA) project follow fixed conventions that allow IGV to visualize and overlay the files directly. The format is detailed here.
  • All other MAF files require testing and may need modification before they can be visualized or overlaid. For overlay, if sample names do not follow TCGA conventions, they must match the sample names of the other tracks, or you must use a linking identifier.
  • The MUT file format is specific to IGV and has five fixed columns as described here. To overlay MUT file data on other tracks, sample names must follow TCGA conventions, match, or use a linking identifier.
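The sketch below writes a small tab-delimited MUT file from Python. It ASSUMES the five fixed columns are chr, start, end, sample, and type, in that order, with a header row; confirm this against the MUT format description before relying on it. The records are hypothetical placeholders, not real mutations.

```python
import csv
import io

# ASSUMED column order for a MUT file; check the MUT format page.
MUT_COLUMNS = ["chr", "start", "end", "sample", "type"]


def write_mut(records, out):
    """Write mutation records (dicts keyed by MUT_COLUMNS) as tab-delimited text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(MUT_COLUMNS)  # header row
    for rec in records:
        writer.writerow([rec[col] for col in MUT_COLUMNS])


# Hypothetical example records (placeholder coordinates and sample names).
records = [
    {"chr": "chr1", "start": 1000, "end": 1001, "sample": "SAMPLE_A", "type": "Missense"},
    {"chr": "chr2", "start": 2000, "end": 2001, "sample": "SAMPLE_B", "type": "Silent"},
]
buf = io.StringIO()
write_mut(records, buf)
text = buf.getvalue()
```

To save the result to disk instead, pass a file opened with `open("example.mut", "w", newline="")` as `out`.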

There are two ways to open multiple MUT or MAF files at once in IGV:

  1. Select all the files to be visualized in your system's file manager and drag and drop them into IGV.
  2. Alternatively, load a single MUT or MAF file that contains information for multiple samples. For instructions on merging multiple text files, see How to concatenate multiple text files. That page also outlines how to convert a MAF file to MUT format and how to display multiple tracks in collapsed format.
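For the second option, a minimal sketch of merging several mutation files into one is shown below. It assumes every input is tab-delimited, starts with the same single header row, and has no leading comment lines (MAF files that begin with `#` lines would need extra handling). The function names are hypothetical; this is not the procedure from the linked page.

```python
from pathlib import Path


def merge_mutation_tables(texts):
    """Merge the contents of several tab-delimited files that share one header row.

    The header is kept once; blank lines are dropped. Raises ValueError if a
    later file's header differs from the first file's header.
    """
    header = None
    rows = []
    for i, text in enumerate(texts):
        lines = [ln for ln in text.splitlines() if ln.strip()]
        if not lines:
            continue  # skip empty inputs
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError(f"input {i}: header differs from the first file")
        rows.extend(lines[1:])
    if header is None:
        return ""
    return "\n".join([header] + rows) + "\n"


def concatenate_files(paths, out_path):
    """Read each file in paths and write the merged table to out_path."""
    texts = [Path(p).read_text() for p in paths]
    Path(out_path).write_text(merge_mutation_tables(texts))
```

For example, `concatenate_files(["a.mut", "b.mut"], "merged.mut")` would produce one file with a single header, which IGV can then load as one multi-sample file.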

IGV will visualize each individual sample's mutation data as a single track.

  • The all chromosomes view summarizes mutations in a coverage track (Screenshot below, 2015.02.18).
  • When you zoom in to an individual chromosome or a more detailed view, IGV marks mutation sites with open rectangles. By default, these are displayed in black and white.
  • Color code mutations by mutation type (Screenshot above, 2015.02.18) by checking the Color code mutations box under View>Preferences>Mutations. See the Preferences page for more details.
  • Mouse-over or click on a mutation to bring up an information panel on the specific mutation. This panel displays the information provided in the mutation file columns, in order, up to an area limit.
  • A site where both alleles are mutated, or that is mutated in more than one sample in a track combining multiple samples, is displayed as a rectangle with a horizontal line through the middle.

 

Overlay mutation tracks on other data tracks

By default, IGV displays mutation file data in separate tracks. Overlay is controlled from the Mutations tab of the Preferences window. Do not use the right-click pop-up menu options Create Overlay Track or Separate Tracks for mutation tracks.

  • IGV will overlay multiple mutation tracks for the same data file.
  • To overlay mutation data on other data tracks, the sample names must follow TCGA conventions or match. If your sample names are neither, see the next section about linking identifiers.
  • For TCGA data, whether in MAF or MUT format and whether in merged or multiple files, IGV matches on the patient identifier portion of the TCGA barcode, that is, the first 10 characters of the barcode excluding hyphens. IGV ignores the information that follows in the barcode, e.g., the tissue origin marked by the 11th and 12th positions of the barcode. TCGA barcode IDs are described here. The implications of how IGV handles these barcodes are discussed on the MAF file format page. To overlay TCGA mutation data differentially on sample tracks from the same patient, see the next section, Using linking identifiers.
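The matching rule described above can be sketched as follows: strip the hyphens from a barcode and keep the first 10 characters. This is an illustration of the rule as stated in the text, not IGV's source code; the function names are hypothetical.

```python
def tcga_patient_key(barcode: str) -> str:
    """Return the patient portion of a TCGA barcode: first 10 characters, hyphens removed."""
    return barcode.replace("-", "")[:10]


def same_patient(barcode_a: str, barcode_b: str) -> bool:
    """True if two barcodes share the patient key, so they would be matched for overlay."""
    return tcga_patient_key(barcode_a) == tcga_patient_key(barcode_b)
```

For instance, `tcga_patient_key("TCGA-02-0001-01C-01D-0182-01")` gives `"TCGA020001"`. Because a tumor barcode and a normal barcode from the same patient reduce to the same key, this rule alone cannot tell them apart, which is why linking identifiers are needed for differential overlay.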
  1. Load the mutation data and the other data to which it will be overlaid, e.g. a GCT expression file of RNA-Seq data.

  2. Go to View>Preferences>Mutations and check the box Overlay mutation tracks. Press OK. Tracks will overlay (Screenshot below, 2015.02.20).

    1. If the Overlay mutation tracks box was checked before you loaded your two data types, the data may already be overlaid, may not be overlaid, or may be shown twice (once overlaid and once as separate tracks). To overlay the data and remove the duplicate separate mutation tracks, try the following:

      1. Uncheck the Overlay box, press OK, recheck the box and press OK. This overlays the tracks and removes duplicate separate tracks.

      2. You may need to quit and restart IGV to clear previously loaded data, because starting a new session may not clear previous mutation data. This is a known bug that will be fixed. The number of tracks shown in the lower left corner of IGV should match what you are loading.

  3. To remove mutation tracks that have no corresponding partner track, uncheck the box Show orphaned mutation tracks.

    1. If Show orphaned mutation tracks was unchecked before you loaded your mutation data and your preferences are set not to overlay mutations, the mutation data may not display. Make sure this box is checked before loading data.

  4. To color code mutations by type, e.g., missense or silent, check the Color code mutations box.

To separate overlaid tracks, go to View>Preferences>Mutations and uncheck the box Overlay mutation tracks. Press OK to save the preferences, then restart IGV and load the data again.

When mutation tracks are overlaid, an additional Sort by mutation count option is available for a selected region of interest (ROI).

Right-click the ROI marked in red at the top and select Sort by mutation count from the pop-up menu. For example, in the following screenshot (2015.02.20), the overlaid tracks for the ERCC2 locus are reordered so that those with mutations appear at the top of the window.
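To make the reordering concrete, here is a minimal sketch of sorting tracks by how many of their mutations fall inside an ROI. It is a hypothetical illustration of the idea, not IGV's implementation: it assumes each track is a list of (start, end) intervals, treats intervals as half-open, and keeps the original order for ties; IGV's overlap and tie-breaking rules may differ.

```python
def sort_by_mutation_count(tracks, roi_start, roi_end):
    """Return track names ordered by mutation count inside [roi_start, roi_end), highest first.

    tracks: dict mapping a track name to a list of (start, end) mutation intervals.
    Ties keep their original order because Python's sort is stable (also with reverse=True).
    """
    def count(name):
        # A mutation counts if its half-open interval overlaps the ROI.
        return sum(1 for start, end in tracks[name] if start < roi_end and end > roi_start)

    return sorted(tracks, key=count, reverse=True)


# Hypothetical tracks and ROI.
tracks = {
    "SAMPLE_1": [(5, 6)],
    "SAMPLE_2": [(100, 101), (102, 103)],
    "SAMPLE_3": [(104, 105)],
}
order = sort_by_mutation_count(tracks, 100, 110)
```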

 

Using linking identifiers to overlay tracks

For mutation files that do not follow TCGA conventions, or whose sample names do not match, you can overlay tracks by (i) loading a sample information file that contains linking identifiers and (ii) specifying the linking identifier in the Mutations Preferences panel.

The association is specified by a special "linking" column in a sample information file. By default, IGV looks for a column with the heading LINKING_ID, but the heading is configurable as a user preference under the Mutations tab of the Preferences window. A mutation track is overlaid on another track when both have the same value in this column. A typical use is to record a patient or sample identifier in this column.
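The linking rule can be sketched as grouping track identifiers by their value in the linking column: tracks in the same group would be overlaid. This is a minimal illustration under stated assumptions, not IGV's code. It assumes a tab-delimited sample information file with a header row and track identifiers in the first column; the first-column heading `TRACK_ID` and the track names below are hypothetical. The `link_column` parameter mirrors the configurable preference.

```python
import csv
import io
from collections import defaultdict


def link_tracks(sample_info_text, link_column="LINKING_ID"):
    """Group track identifiers (first column) by their value in link_column.

    Tracks with an empty or missing linking value are left unlinked.
    """
    reader = csv.reader(io.StringIO(sample_info_text), delimiter="\t")
    header = next(reader)
    link_idx = header.index(link_column)  # ValueError if the column is absent
    groups = defaultdict(list)
    for row in reader:
        if len(row) <= link_idx or not row[link_idx].strip():
            continue  # no linking value for this track
        groups[row[link_idx].strip()].append(row[0])
    return dict(groups)


# Hypothetical sample information file.
SAMPLE_INFO = (
    "TRACK_ID\tLINKING_ID\tSample\n"
    "mutA\tP1\tP1\n"
    "exprA\tP1\tP1\n"
    "mutB\tP2\tP2\n"
)
groups = link_tracks(SAMPLE_INFO)
```

Here `mutA` and `exprA` share the value `P1`, so they would be linked. Passing `link_column="Sample"` yields the same groups, which corresponds to using an existing attribute instead of LINKING_ID.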

To visualize mutation data:

  1. Format mutation data using the MUT file format; e.g., example.mut.
  2. Format the data from platform-specific assays using an appropriate file format; e.g., example.gct (expression data) and example.seg (segmented copy number data).
  3. Define attributes and their values in a sample information file; e.g., example_sampleinfo_LINKING_ID.txt. A sample information file contains a row for each track and a column for each attribute:
    The first column contains track identifiers. The track identifier for each mutation track and each associated data track must be included in the sample information file. The track identifiers can be found in the data files (e.g., example.mut, example.gct, and example.seg).
    Each subsequent column identifies an attribute and its value (if any) for each track. IGV uses the value of one attribute (by default, LINKING_ID) to link mutations to associated data tracks; a mutation track and data track that have the same value for this attribute are linked.
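Because every mutation track and data track must appear in the sample information file, a quick consistency check can save time. The sketch below is hypothetical: it reports track identifiers missing from the first column of a tab-delimited sample information file, and includes a helper that reads sample column names from a GCT file (assuming the standard GCT 1.2 layout, where line 3 is `Name`, `Description`, then one column per sample). Extracting identifiers from MUT or SEG files is left to the reader.

```python
def gct_sample_ids(gct_text):
    """Return sample column names from GCT 1.2 text (header is the third line)."""
    header = gct_text.splitlines()[2].split("\t")
    return header[2:]  # skip the Name and Description columns


def missing_track_ids(data_track_ids, sample_info_text):
    """Return track IDs that have no row in the sample information file, sorted."""
    listed = set()
    for line in sample_info_text.splitlines()[1:]:  # skip the header row
        if line.strip():
            listed.add(line.split("\t")[0])
    return sorted(set(data_track_ids) - listed)


# Hypothetical inputs.
GCT = "#1.2\n1\t2\nName\tDescription\tS1\tS2\nGENE1\tna\t1.0\t2.0\n"
INFO = "TRACK_ID\tLINKING_ID\nS1\tP1\nmutA\tP1\n"
missing = missing_track_ids(gct_sample_ids(GCT) + ["mutA"], INFO)
```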

 

In the example sample information file, the LINKING_ID attribute (2nd column) links the mutation and data tracks. In practice, however, it may be easier to use an existing attribute than to add a LINKING_ID attribute. Notice that in this example the LINKING_ID and Sample attributes have the same value, so the LINKING_ID attribute could be removed from the sample information file and the Sample attribute used to link the mutation and data tracks instead. By default, IGV uses LINKING_ID to overlay mutations on data tracks; if you use a different attribute, enter its name on the Mutations tab of the Preferences window.