When you run an analysis module, visualization module, or pipeline, GenePattern displays the parameters for the selected module or pipeline. Often, one or more of these parameters are input files, which must have a particular format; for example, you might need to supply a gct or res file. For more information about a particular file format, select it from the list at the right.
This section provides general information to help you create properly formatted files for GenePattern.
Although different GenePattern modules require different file formats, all of the files are tab-delimited or space-delimited text files. Most gene expression data is already stored in tab-delimited text files, or in spreadsheet and database programs that can export it as tab-delimited text. Therefore, creating input files for GenePattern is relatively easy:
Start with a tab-delimited text file that contains the required gene expression data.
Open the file in a text editor (or spreadsheet editor).
Make the necessary format changes.
Save the file as a tab-delimited text file with the appropriate file extension.
If Mac OS does not let you change the TXT extension to GCT or RES by editing the file name directly, right-click the file and choose Get Info, expand the Name & Extension section, clear the Hide extension option, and then change the extension in the box provided.
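As an alternative to renaming the file by hand, the extension can also be changed from a script. The following is a minimal Python sketch, not part of GenePattern; the helper name change_extension and the file name in the usage note are hypothetical. It only renames the file and does not check that the contents follow the target format.

```python
from pathlib import Path


def change_extension(path, new_suffix):
    """Rename a file so that it carries new_suffix (for example ".gct").

    Hypothetical helper: only the file name changes, not the contents.
    Returns the new path.
    """
    source = Path(path)
    target = source.with_suffix(new_suffix)
    # Refuse to overwrite an existing file with the target name.
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    source.rename(target)
    return target
```

For example, change_extension("expression.txt", ".gct") would rename a tab-delimited file called expression.txt to expression.gct in the same folder.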
Converting and Processing Files
The Modules page of the GenePattern web site provides a complete list of the modules and pipelines available from the Broad Institute. Modules in the Data Format Conversion category convert files from one format to another. Modules in the Preprocess & Utilities category provide methods for importing and working with data files.
Converting CDT to GCT Files
One common question from GenePattern users is how to convert a cdt file to a gct file. The following brief tutorial walks you through this process by converting sample.cdt to sample.imputed.gct:
Save the sample.cdt file to your local drive and open it in Microsoft Excel.
Delete the CLID and GWEIGHT columns. The gct file format allows for only two columns of annotations.
Delete the second row, which contains array identifiers (AID). The gct file format allows for only one row of identifiers.
Add two header rows at the top of the file:
In the first row, first cell, enter: #1.2
In the second row, first cell, enter the number of data rows: 1553
In the second row, second cell, enter the number of data columns: 44
Save the modified file as a text (tab delimited) file with the name sample.gct.
Verify that your new .gct file matches the requirements of a gct file in GenePattern.
Your original cdt file contained cells that were missing data. Most GenePattern modules require that all cells in a gct file contain data. Use the GenePattern analysis module ImputeMissingValues.KNN to fill in the missing values in your gct file. The module takes sample.gct as the input file, imputes the missing data, and generates a sample.imputed.gct file.
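The Excel steps above can also be sketched in a short script. This is an illustrative Python sketch, not a GenePattern tool. The function names cdt_to_gct and count_missing_cells are hypothetical, and it assumes the layout described in the tutorial: a tab-delimited cdt file whose first row holds column names, whose second row holds the array identifiers (AID), and whose CLID and GWEIGHT columns are to be dropped, leaving two annotation columns. It does not impute missing values; it only counts empty cells so you can tell whether ImputeMissingValues.KNN is needed.

```python
def cdt_to_gct(cdt_path, gct_path, drop_columns=("CLID", "GWEIGHT")):
    """Write a gct file from a cdt file; return (data rows, data columns)."""
    with open(cdt_path) as handle:
        rows = [line.rstrip("\r\n").split("\t") for line in handle if line.strip()]

    header = rows[0]
    # Assumption from the tutorial: the second row is the AID row; skip it.
    body = rows[2:]

    # Keep every column except the ones named in drop_columns.
    keep = [i for i, name in enumerate(header) if name not in drop_columns]
    header = [header[i] for i in keep]
    # Pad short rows with empty cells so every row has the same width.
    body = [[row[i] if i < len(row) else "" for i in keep] for row in body]

    n_rows = len(body)
    n_cols = len(header) - 2  # the first two columns are annotations

    with open(gct_path, "w") as handle:
        handle.write("#1.2\n")                # first header row
        handle.write(f"{n_rows}\t{n_cols}\n")  # second header row
        handle.write("\t".join(header) + "\n")
        for row in body:
            handle.write("\t".join(row) + "\n")
    return n_rows, n_cols


def count_missing_cells(gct_path):
    """Count empty data cells in a gct file (annotation columns excluded)."""
    with open(gct_path) as handle:
        lines = [line.rstrip("\r\n") for line in handle if line.strip()]
    # Skip the two gct header rows and the column-name row.
    data_rows = [line.split("\t") for line in lines[3:]]
    return sum(1 for row in data_rows for cell in row[2:] if cell.strip() == "")
```

Calling cdt_to_gct("sample.cdt", "sample.gct") returns the row and column counts written to the second header row, which you can compare with the values you expect. If count_missing_cells("sample.gct") is greater than zero, the file still contains empty cells.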