Note: If you are using Excel to edit GenePattern files, be sure to save the file as a tab-delimited text file and supply the correct file extension. You can enclose the file name in quotes to prevent Excel from appending .txt to it. Also, note that Excel's auto-formatting can introduce errors in gene names, as described in Zeeberg et al. (2004).
GMT File Format
The GMX and GMT file formats are tab-delimited file formats that describe gene sets (used with the GSEA module). In the GMX format, each column represents a gene set; in the GMT format, each row represents a gene set. The GMX format is convenient for storing a relatively small number of gene sets (<256) and is easier to edit. The GMT format is more convenient for storing larger databases of gene sets. The GMT format contains a row for each gene set:
(gene set name) (tab) (description) (tab) (gene 1) (tab) (gene 2) (tab) ... (gene N)
GNF2_SPTA1 na ALS2CR3 KLF1 SLC6A8 ... CA1
The first column contains the gene set name. Duplicate names are not allowed.
The second column contains the gene set description. GSEA uses the description field to determine what hyperlink to provide in the report for the gene set description: if the description is na, GSEA provides a link to the named gene set in MSigDB; if the description is a URL, GSEA provides a link to that URL.
The remaining columns list the genes in the gene set.
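The row layout described above can be read with a few lines of Python. This is a minimal sketch, not part of GenePattern or GSEA: the function name parse_gmt is hypothetical, and skipping blank lines and dropping empty trailing cells are assumptions about how a hand-edited file might look. The sample row keeps only the four genes named in the example above and leaves out the elided ones.

```python
import io


def parse_gmt(handle):
    """Read GMT rows from a text handle.

    Returns a dict mapping gene set name -> (description, list of genes).
    parse_gmt is a hypothetical helper, not a GenePattern or GSEA API.
    """
    gene_sets = {}
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(
                f"line {line_number}: expected a name and a description"
            )
        name, description = fields[0], fields[1]
        # Assumption: empty cells (e.g. trailing tabs left by a spreadsheet)
        # are not gene names and are dropped.
        genes = [gene for gene in fields[2:] if gene]
        if name in gene_sets:
            # The format does not allow duplicate gene set names.
            raise ValueError(
                f"line {line_number}: duplicate gene set name {name!r}"
            )
        gene_sets[name] = (description, genes)
    return gene_sets


# Abbreviated sample row: only the genes spelled out in the example above.
example = "GNF2_SPTA1\tna\tALS2CR3\tKLF1\tSLC6A8\tCA1\n"
gene_sets = parse_gmt(io.StringIO(example))
description, genes = gene_sets["GNF2_SPTA1"]
print(description, genes)
```

To read a file on disk, the same function can be given an open text handle, for example open(path, newline="") where path is your own GMT file. Raising an error on a repeated name, rather than silently keeping the last row, mirrors the rule that duplicate gene set names are not allowed.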