Note: If you are using Excel to edit GenePattern files, save the file as a tab-delimited text file and supply the correct file extension. You can enclose the file name in quotes to prevent Excel from appending .txt to it. Also note that Excel's auto-formatting can introduce errors in gene names, as described in Zeeberg et al. (2004).

CHIP File Format

The CHIP file format contains annotation about a microarray (used with the GSEA module). It lists the features (i.e., probe sets) used in the microarray along with their mapping to gene symbols (when available). While this file is not used directly in the GSEA algorithm, it is used to annotate the output results and may also be used to collapse the probe sets for each gene in the expression dataset into a single gene vector.

Chip annotation files can be specified in a tab-delimited file format (*.chip) or in a comma-separated file format (*.csv). The two formats are identical except for the separator character (tab or comma). The tab-delimited (*.chip) format is the one typically used.
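
As an illustration of the two variants, the following minimal Python sketch picks the separator character from the file extension. The function name chip_delimiter is hypothetical and is not part of GenePattern or GSEA; it only assumes the *.chip and *.csv extensions described above.

```python
import os


def chip_delimiter(path):
    """Return the separator implied by the file extension (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".chip":
        return "\t"  # tab-delimited variant
    if ext == ".csv":
        return ","   # comma-separated variant
    raise ValueError(f"unrecognized chip annotation extension: {ext!r}")
```

The returned character could then be passed to whatever reader parses the file.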

The CHIP file format is organized as follows:

  1. The first line contains column headings that identify the content of each column in the remainder of the file. The file must contain three column headings:
    • Probe Set ID
    • Gene_Symbol
    • Gene_Title
    These three columns can appear in any order. The file may contain additional columns, which will be ignored.
    • For example:
      Probe Set ID Gene_Symbol Gene_Title
  2. Each remaining line of the file contains the data for one probe set ID used in the microarray.
    • Line format:
      (probe set id) (tab) (gene symbol) (tab) (gene title)
    • For example:
      205699_at MAP2K6 mitogen-activated protein kinase kinase 6
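
The layout above can be read with a few lines of Python. This is a minimal sketch, not the parser used by GenePattern or GSEA: the function name read_chip and the in-memory sample are assumptions made for illustration. It looks columns up by heading, so the three required columns may appear in any order and additional columns are ignored.

```python
import csv
import io

# The three column headings the format requires.
REQUIRED = ("Probe Set ID", "Gene_Symbol", "Gene_Title")


def read_chip(handle, delimiter="\t"):
    """Map each probe set ID to (gene symbol, gene title). Hypothetical helper."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    annotations = {}
    for row in reader:
        # Columns are addressed by heading, so extra columns are simply skipped.
        annotations[row["Probe Set ID"]] = (row["Gene_Symbol"], row["Gene_Title"])
    return annotations


# In-memory sample built from the example line above (tabs written as \t).
sample = (
    "Probe Set ID\tGene_Symbol\tGene_Title\n"
    "205699_at\tMAP2K6\tmitogen-activated protein kinase kinase 6\n"
)
chip = read_chip(io.StringIO(sample))
print(chip["205699_at"][0])  # prints MAP2K6
```

For a comma-separated (*.csv) file, the same function can be called with delimiter=",".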

Sample CHIP file: HG_U133A_annot.chip

Updated on March 05, 2013 14:36