ODF

Note: If you are using Excel to edit GenePattern files, be sure to save the file as a tab-delimited text file and supply the correct file extension. You can enclose the file name in quotes to prevent Excel from appending .txt to it. Also note that Excel's auto-formatting can introduce errors in gene names, as described in Zeeberg et al. (2004).

ODF File Format

The Output Description Format (ODF) is similar to the RES or GCT file formats for datasets. The main difference is in the header. The body still contains the expression level values for each gene in each sample, so the main data block (after the header lines) is a matrix of values. Each column is defined by a name and, optionally, a description. Each row has a name (for instance, the name of the gene) and a description (the description of the gene). Each sample column contains the expression values for every gene in that sample. For example, if the first gene in the data block is a particular tyrosine kinase, then the first row holds the expression value of that tyrosine kinase in each sample.

Note: This ODF format is specific to GenePattern. It is not an Open Document Format (ODF) for Office Applications as defined by the Organization for the Advancement of Structured Information Standards (OASIS).

ODF Header for Datasets

The following example shows the header lines of an ODF file. The first five lines are required. The line numbers are shown for easy reference; they should not be included in your file.

1.   ODF 1.0
2.   HeaderLines=7
3.   Model= Dataset
4.   DataLines= 3
5.   COLUMN_TYPES: String String float float float *
6.   COLUMN_DESCRIPTIONS: Sample from DFCI Sample from UK Sample from Children's
7.   COLUMN_NAMES: Name Description Sample 1 Biopsy_2 Biopsy_4
8.   RowNamesColumn=0
9.   RowDescriptionsColumn=1

Lines 1 and 2 are required and must be the first and second lines of the file. Line 1 identifies the file as ODF (version 1.0), and line 2 gives the number of header lines that follow before the main data block (in this case, 7 more). Line 3 is required somewhere in the header; it declares that this ODF file contains Dataset data. Line 4 is required somewhere in the header; it gives the number of data rows in the data block. Line 5 is required somewhere in the header of any ODF file that has a main data block; it defines the type of data in each column. Line 6 is a tab-delimited list of descriptions for the columns. Line 7 is a tab-delimited list of names for the columns. Line 8 defines which column contains the row names, and line 9 defines which column contains the row descriptions.
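The header rules above can be sketched in a few lines of Python. This is only an illustration, not GenePattern's own parser: the function name parse_odf_header is hypothetical, and the sketch assumes that "KEY: ..." lines hold tab-delimited lists while "KEY=value" lines hold a single value.

```python
def parse_odf_header(lines):
    """Return (header dict, index of the first data row) for the lines of an ODF file."""
    if len(lines) < 2 or not lines[0].startswith("ODF"):
        raise ValueError("line 1 must identify the file as ODF")
    key, _, value = lines[1].partition("=")
    if key.strip() != "HeaderLines":
        raise ValueError("line 2 must be HeaderLines=<count>")
    # HeaderLines counts the header lines that follow line 2.
    first_data_row = 2 + int(value)

    header = {"version": lines[0][len("ODF"):].strip()}
    for line in lines[2:first_data_row]:
        colon, equals = line.find(":"), line.find("=")
        if colon != -1 and (equals == -1 or colon < equals):
            # Assumed form "KEY: a<TAB>b<TAB>c": store a list of values.
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip().split("\t")
        else:
            # Assumed form "KEY=value": store a single string.
            key, _, value = line.partition("=")
            header[key.strip()] = value.strip()
    return header, first_data_row


# The example header from this page, with tabs between list elements.
example = [
    "ODF 1.0",
    "HeaderLines=7",
    "Model= Dataset",
    "DataLines= 3",
    "COLUMN_TYPES: String\tString\tfloat\tfloat\tfloat",
    "COLUMN_DESCRIPTIONS: Sample from DFCI\tSample from UK\tSample from Children's",
    "COLUMN_NAMES: Name\tDescription\tSample 1\tBiopsy_2\tBiopsy_4",
    "RowNamesColumn=0",
    "RowDescriptionsColumn=1",
]
header, first_data_row = parse_odf_header(example)
print(header["Model"], header["DataLines"], first_data_row)
```

Because only lines 1 and 2 have fixed positions, the sketch reads the remaining header lines into a dictionary instead of relying on their order.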

Note: Following are a few notes about the ODF Header:

Main Data Block

The following example shows the first few lines of the main data block:

1000_at    X60188 HSERK1 Human ERK1 mRNA    145.3   240.37823    158.66888
1001_at    X60957 HSTIEMR Human tie mRNA    20.5    31.139397    14.053186
1002_f_at  X65962 HSCP450 H.sapiens mRNA    -9.6    118.06088    -8.287777

The main data block must be consistent with the header. The first COLUMN_NAMES element is "Name". This label is associated with the first column (values: 1000_at, 1001_at, and 1002_f_at). The second COLUMN_NAMES element is "Description", which is associated with the second column of the main data block. The next three columns hold floating point numbers that represent the gene expression values for each of the samples.

Note: The first two columns contain only text data, and the next three columns contain only floating point values. This is consistent with the "String, String, float, float, float" elements in the COLUMN_TYPES: list.
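The consistency requirement between the data block and COLUMN_TYPES can be checked with a short Python sketch. The function name parse_data_block is hypothetical, the rows are assumed to be tab-delimited, and only the two types that appear in the example header (String and float) are handled.

```python
def parse_data_block(rows, column_types):
    """Split tab-delimited data rows and convert each field according to COLUMN_TYPES."""
    # Only the types used in the example header; any other type raises KeyError.
    converters = {"String": str, "float": float}
    parsed = []
    for number, row in enumerate(rows, start=1):
        fields = row.rstrip("\r\n").split("\t")
        if len(fields) != len(column_types):
            raise ValueError(
                f"data row {number}: expected {len(column_types)} fields, "
                f"found {len(fields)}"
            )
        try:
            parsed.append([converters[t](f) for t, f in zip(column_types, fields)])
        except ValueError as err:
            # A field that is not a valid float contradicts COLUMN_TYPES.
            raise ValueError(f"data row {number}: {err}") from None
    return parsed


column_types = ["String", "String", "float", "float", "float"]
rows = [
    "1000_at\tX60188 HSERK1 Human ERK1 mRNA\t145.3\t240.37823\t158.66888",
    "1001_at\tX60957 HSTIEMR Human tie mRNA\t20.5\t31.139397\t14.053186",
    "1002_f_at\tX65962 HSCP450 H.sapiens mRNA\t-9.6\t118.06088\t-8.287777",
]
parsed = parse_data_block(rows, column_types)
print(parsed[0])
```

A row with the wrong number of fields, or with text in a float column, raises ValueError naming the offending data row.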

Sample ODF file: all_aml_train.preprocessed0.odf

Updated on July 04, 2012 15:35