Note: If you are using Excel to edit GenePattern files, be sure to save the file as a tab-delimited text file and supply the correct file extension. You can enclose the file name in quotes to prevent Excel from appending .txt to it. Also note that Excel's auto-formatting can introduce errors in gene names, as described in Zeeberg et al. (2004).
Sample Information File Format (.txt)
The sample information file is a tab-delimited text file that describes a set of SNP arrays. The column labels in the first row define the information provided for each array; each subsequent row describes one SNP array. The sample information file is organized as follows:
The first line contains the column labels. A sample information file can contain any number of columns and the column labels are arbitrary. However, SNP modules may require specific labels, as discussed below.
Label-1 (tab) Label-2 (tab) ... Label-n
Array (tab) Sample (tab) Type (tab) Ploidy(numeric) (tab) Gender (tab) Paired (tab) Platform
The remainder of the sample information file contains a line of information for each SNP sample. Where data is unavailable, columns may be empty.
Col-1-data (tab) Col-2-data (tab) ... Col-n-data
S004274N_250S_123005 (tab) S004274N (tab) Normal (tab) 2 (tab) (tab) (tab) 250K_Sty
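The layout above can be read with a few lines of Python. The following is a minimal sketch, not part of GenePattern: the function name `read_sample_info` is hypothetical, and padding short rows with empty strings is an assumption about how to treat missing trailing columns.

```python
import csv
import io


def read_sample_info(handle):
    """Return (labels, rows) from a tab-delimited sample information file.

    Hypothetical helper for illustration; not a GenePattern API.
    """
    reader = csv.reader(handle, delimiter="\t")
    labels = next(reader)  # first line: column labels
    rows = []
    for fields in reader:
        if not fields:  # skip completely blank lines
            continue
        # Assumption: missing trailing columns are treated as empty values.
        fields = fields + [""] * (len(labels) - len(fields))
        rows.append(dict(zip(labels, fields)))
    return labels, rows


# The header and data line shown above, with real tab characters.
EXAMPLE = (
    "Array\tSample\tType\tPloidy(numeric)\tGender\tPaired\tPlatform\n"
    "S004274N_250S_123005\tS004274N\tNormal\t2\t\t\t250K_Sty\n"
)

labels, rows = read_sample_info(io.StringIO(EXAMPLE))
print(labels)
print(rows[0]["Sample"], rows[0]["Ploidy(numeric)"], repr(rows[0]["Gender"]))
```

The empty Gender and Paired columns come back as empty strings, which matches the rule that columns may be empty where data is unavailable.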
As noted above, column labels are otherwise arbitrary, but a SNP analysis module may require the sample information file to include specific column labels. For example, the SNP module CopyNumberDivideByNormals requires a sample information file that includes two columns, Sample and Ploidy(numeric). The following column labels are commonly used:
Array: Identifier for the SNP array.
Sample: Identifier for the biological sample used to generate the SNP array data.
Type: Brief description of the biological sample.
Ploidy(numeric): Integer value, where ploidy=2 indicates a normal sample.
Gender: Identifier that indicates the gender of the biological sample donor. For a sample from a male donor, Gender=M; from a female donor, Gender=F.
Paired: Value that identifies normal-target pairs. For the normal sample, Paired=Yes; for the target sample, Paired is set to the sample name of the paired normal sample.
Platform: SNP chip used to generate the array.
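To illustrate how the Paired column links samples, here is a small sketch that maps each target sample to its paired normal. The function name `find_pairs` and the sample names other than S004274N are hypothetical, and two behaviors are assumptions rather than documented rules: the value Yes is matched exactly, and a target that names an unknown normal raises an error.

```python
def find_pairs(rows):
    """Map each target sample name to the name of its paired normal sample.

    Hypothetical helper for illustration; not a GenePattern API.
    """
    # Normal samples are the rows whose Paired value is exactly "Yes" (assumption).
    normals = {row["Sample"] for row in rows if row.get("Paired") == "Yes"}
    pairs = {}
    for row in rows:
        paired = row.get("Paired", "")
        if not paired or paired == "Yes":
            continue  # unpaired sample, or the normal half of a pair
        if paired not in normals:
            # Assumption: a dangling reference is treated as an error.
            raise ValueError(f"{row['Sample']}: paired normal {paired!r} not found")
        pairs[row["Sample"]] = paired
    return pairs


rows = [
    {"Sample": "S004274N", "Type": "Normal", "Paired": "Yes"},
    {"Sample": "S004274T", "Type": "Tumor", "Paired": "S004274N"},  # hypothetical target
    {"Sample": "S009999N", "Type": "Normal", "Paired": ""},         # hypothetical, unpaired
]
pairs = find_pairs(rows)
print(pairs)
```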
Note: When a SNP module requires a sample information file to include specific column labels, the module documentation lists the required column labels. Specify required column labels exactly: they are case-sensitive and space-sensitive.
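Because required labels must match exactly, it can help to check a header line before running a module. The sketch below assumes a hypothetical helper, `missing_labels`, and uses the two labels required by CopyNumberDivideByNormals as the example; the comparison is a plain string match, so differences in case or spacing count as missing.

```python
def missing_labels(header_line, required):
    """Return the required labels that do not appear exactly in the header line.

    Hypothetical helper for illustration; not a GenePattern API.
    """
    labels = header_line.rstrip("\r\n").split("\t")
    # Exact comparison: "sample" and "Ploidy (numeric)" do not match.
    return [label for label in required if label not in labels]


required = ["Sample", "Ploidy(numeric)"]

good = "Array\tSample\tType\tPloidy(numeric)\tGender\tPaired\tPlatform\n"
bad = "Array\tsample\tPloidy (numeric)\n"

print(missing_labels(good, required))  # []
print(missing_labels(bad, required))   # ['Sample', 'Ploidy(numeric)']
```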
The following steps outline how to copy sample identifiers exactly from Excel data and transpose them from horizontal to vertical.
In Excel, select the entire row containing the sample names and choose Copy. Open a new workbook and choose Paste Special>Transpose.
If starting from a RES file, remove the blank rows: select the relevant column(s), then click Edit>Go To>Special button>Blanks option and click OK. The blank rows are selected. Choose Edit>Delete>Entire row option and click OK.
Label the cells in Row 1 exactly as specified for the module, fill in the remaining cells, and save the file as tab-delimited text (.txt). For example, the ComBat module expects the first three cells of Row 1 to be labeled “Array”, “Sample”, and “Batch”.
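The same transposition can be done outside Excel, which avoids its auto-formatting. The following is a minimal sketch under stated assumptions: the function name `sample_row_to_info` is hypothetical, the identifiers are placed in the first column, the remaining columns are left empty to be filled in by hand, and blank cells in the input row are dropped in the same way the blank rows are deleted in the steps above. The sample names in the example are made up.

```python
import csv
import io


def sample_row_to_info(sample_row, labels=("Array", "Sample", "Batch")):
    """Turn one tab-delimited row of sample names into sample-information text.

    Hypothetical helper for illustration; not a GenePattern API.
    """
    names = [name for name in sample_row.rstrip("\r\n").split("\t") if name.strip()]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(labels)  # Row 1: column labels, exactly as the module requires
    for name in names:
        # Assumption: the identifier goes in the first column; the rest stay empty.
        writer.writerow([name] + [""] * (len(labels) - 1))
    return out.getvalue()


# Hypothetical row with blank cells between sample names.
row = "tumor_01\t\ttumor_02\t\tnormal_01\n"
text = sample_row_to_info(row)
print(text, end="")
```

Writing `text` to a file with a .txt extension produces a tab-delimited file whose remaining cells can then be filled in.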