Note: If you edit GenePattern files in Excel, save the file as tab-delimited text and supply the correct file extension. Enclosing the file name in quotes prevents Excel from appending .txt to it. Also note that Excel's auto-formatting can introduce errors in gene names, as described in Zeeberg et al. (2004).
GMX File Format
The GMX and GMT file formats are tab-delimited formats that describe gene sets (used with the GSEA module). In the GMX format, each column represents a gene set; in the GMT format, each row represents a gene set. The GMX format is easier to edit and is convenient for storing a relatively small number of gene sets (fewer than 256). The GMT format is more convenient for storing larger databases of gene sets. The GMX format contains a column for each gene set:
GNF2_SPTA1
na
ALS2CR3
KLF1
SLC6A8
...
CA1
The first line contains the gene set name. Duplicate names are not allowed.
The second line contains the gene set description. GSEA uses the description field to determine what hyperlink to provide in the report for the gene set description: if the description is na, GSEA provides a link to the named gene set in MSigDB; if the description is a URL, GSEA provides a link to that URL.
The remaining lines list the genes in the gene set.
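The layout described above can be read by transposing the tab-delimited rows into columns. The following is a minimal sketch, not part of GenePattern or GSEA: the function names parse_gmx and to_gmt are hypothetical, the sample gene set names, genes, and URL are placeholders, and the GMT output assumes the usual row layout of name, description, then genes.

```python
from itertools import zip_longest


def parse_gmx(text):
    """Parse GMX text into {name: (description, [genes])}. Hypothetical helper."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if len(rows) < 2:
        raise ValueError("a GMX file needs a name line and a description line")
    gene_sets = {}
    # Gene sets differ in length, so pad short rows before transposing
    # rows into columns (one column per gene set).
    for column in zip_longest(*rows, fillvalue=""):
        name, description, *genes = (cell.strip() for cell in column)
        if not name:
            continue  # skip empty columns
        if name in gene_sets:
            raise ValueError(f"duplicate gene set name: {name}")
        # Drop the empty padding cells below the end of a shorter gene set.
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


def to_gmt(gene_sets):
    """Render gene sets as GMT text, one row per gene set (assumed layout)."""
    lines = [
        "\t".join([name, description, *genes])
        for name, (description, genes) in gene_sets.items()
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder data: two gene sets, one with description "na", one with a URL.
    sample = (
        "SET_A\tSET_B\n"
        "na\thttp://example.org/set_b\n"
        "GENE1\tGENE4\n"
        "GENE2\n"
        "GENE3\n"
    )
    print(to_gmt(parse_gmx(sample)), end="")
```

Because duplicate names are not allowed, the sketch raises an error on a repeated name instead of silently overwriting the earlier gene set.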