Tutorial
The GenePattern Tutorial introduces you to GenePattern by providing step-by-step instructions for analyzing gene expression data. It takes approximately 40 minutes to complete.
All of the information you need to successfully complete this tutorial is contained in the tutorial. For users who like additional discussion along the way, the tutorial includes pointers to more information in other GenePattern guides. Feel free to follow these links or to ignore them, depending on your learning style.
Introduction
Prerequisites
To follow the hands-on instructions in this tutorial, you must have access to the following:
Scientific Scenario
The gene expression dataset used in the tutorial is from Golub, Slonim, et al. (1999), who used clustering and
prediction algorithms to find genes that distinguish between two subtypes of
leukemia, acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). The dataset consists of 38 bone marrow samples (27 ALL,
11 AML) obtained from acute leukemia patients.
For the purposes of this tutorial, your goal is to identify marker
genes for the two subtypes of leukemia:
- You use the ComparativeMarkerSelection (version 5) analysis to find the
genes in the dataset file that are most closely correlated with the two
subtypes of leukemia.
- You use the ComparativeMarkerSelectionViewer (version 4) to examine the
results of that analysis.
- You use the ExtractComparativeMarkerResults (version 3) analysis to create a
new dataset that contains the marker genes that you have identified.
- You use the HeatMapViewer (version 8) to examine the expression levels
of the marker genes and to confirm that the marker genes are clearly
differentially expressed in the two subtypes of leukemia.
Note: If you are using a different version of any analysis, follow the instructions as closely as
possible, but be aware that your results might not match those shown in the
tutorial.
Starting GenePattern
To start GenePattern:
- Open a web browser, such as Mozilla Firefox, Internet Explorer, or
Safari.
- Enter the URL of the GenePattern server that you want to use. For
example, to use the public GenePattern server hosted by the Broad Institute,
enter http://genepattern.broadinstitute.org/gp/.
The login page appears.
- Enter your user name and password. If you do not have a GenePattern
account, select Click to register.
Whether a GenePattern server requires passwords
depends on how it is configured. The public GenePattern server requires
passwords. By default, a local GenePattern server does not.
- Click Sign In. The GenePattern home page
appears.
Note: The pictures in the tutorial show GenePattern
running in a Firefox web browser on a Windows PC. If you are using a
different web browser or a different operating system, your screen may look
slightly different.
- Click the GenePattern icon to return to this home page at any time.
- The upper right corner shows your user name.
- The navigation bar provides access to other pages.
- The Modules & Pipelines panel lists the analyses that you can run. Click the all radio button to list them alphabetically.
- The center pane is the main display pane, which GenePattern uses to display information and to prompt you for input.
- The Recent Jobs panel lists the most recent analyses that you have run and their results files. When you start GenePattern for the first time, this panel is empty.
For more information: see Exploring the User
Interface in the GenePattern User Guide.
Running Analyses
Now that you have started GenePattern, you
are ready to analyze your data. In this section, you learn how to:
- Run the ComparativeMarkerSelection analysis to find marker genes:
the genes in the dataset that are most closely correlated with the two phenotypes
(ALL and AML) in the dataset.
- Run the ComparativeMarkerSelectionViewer to examine the ComparativeMarkerSelection
analysis results.
- Run the ExtractComparativeMarkerResults analysis to create a smaller
dataset that contains only the marker genes.
- Run the HeatMapViewer to examine the expression levels of the
marker genes.
Running ComparativeMarkerSelection
Run the ComparativeMarkerSelection analysis to find the
genes in the dataset that are most closely correlated with the two phenotypes
(ALL and AML) in the dataset. To run the ComparativeMarkerSelection analysis:
- In the Modules & Pipelines panel, under Gene List Selection, select the
ComparativeMarkerSelection analysis. GenePattern displays the ComparativeMarkerSelection parameters.
- For information about the module and its parameters, click help.
- For the input file parameter, select all_aml_train.res (ftp://ftp.broadinstitute.org/pub/genepattern/datasets/all_aml/all_aml_train.res).
- To use a downloaded copy of the file, click the Browse button and select the local file. Leave the Upload File radio button selected to indicate that you are selecting a file from the file system.
- To use the file from the FTP site, select the Specify
URL radio button and enter the URL of the file.
- For the cls file parameter, select all_aml_train.cls (ftp://ftp.broadinstitute.org/pub/genepattern/datasets/all_aml/all_aml_train.cls).
- Click Run to start the analysis. GenePattern sends the analysis job to the server and displays the Job
Status page. After a minute or two, the status icon changes from running
to complete.
Analysis results are stored on the GenePattern server.
The text shown in red tells you when the analysis results will be deleted. To save your analysis
results, copy the files from the GenePattern server to your own directories, as
described later in this tutorial.
- Click Return to Modules & Pipelines Start to return to the home page.
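The cls file tells ComparativeMarkerSelection which phenotype each sample belongs to. A categorical cls file is commonly laid out as three lines: a header with the number of samples, the number of classes, and the number 1; a line beginning with # that names the classes; and one label per sample. The Python sketch below parses that simple layout and checks it against the header counts. It is an illustration under those assumptions, not GenePattern's own parser, and the function name parse_cls is hypothetical.

```python
from typing import List, Tuple


def parse_cls(text: str) -> Tuple[List[str], List[str]]:
    """Parse a simple categorical cls file (hypothetical helper, not GenePattern's parser).

    Assumed layout:
      line 1: <num samples> <num classes> 1
      line 2: # <class name> <class name> ...
      line 3: one label per sample (0-based class indices or class names)
    Returns (class names, one class name per sample).
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    n_samples, n_classes, _ = (int(x) for x in lines[0].split())
    if not lines[1].startswith("#"):
        raise ValueError("second line should list class names after '#'")
    names = lines[1][1:].split()
    # Map numeric labels (0, 1, ...) to class names; keep named labels as-is.
    labels = [names[int(x)] if x.isdigit() else x for x in lines[2].split()]
    if len(labels) != n_samples or len(names) != n_classes:
        raise ValueError("header counts do not match the file contents")
    return names, labels


# Synthetic file with the 27 ALL / 11 AML split described for the training data.
example = "38 2 1\n# ALL AML\n" + " ".join(["0"] * 27 + ["1"] * 11) + "\n"
names, labels = parse_cls(example)
print(names, labels.count("ALL"), labels.count("AML"))
```

Real cls files can vary in detail (for example, named labels instead of indices); the sketch handles only the layout described above and raises ValueError when the header and contents disagree.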
Reviewing Result Files
When you return to the GenePattern home page, the Recent
Jobs pane shows the analysis job that you ran and the
associated analysis results files:

- Click the job number. GenePattern redisplays the status page. On the Job Status page, click Return to Modules & Pipelines Start to return to the home page.
- Click the icon next to the analysis job to display commands that you use
to work with the job:
| Command | Description |
|---|---|
| Download | Downloads a zip file containing all analysis results files for this job. |
| Reload | Displays the analysis and its parameters, with the parameters set to the values used for this analysis job. |
| Delete | Deletes the analysis job and its analysis results files from the GenePattern server. |
| Info | Displays the parameter values and the analysis results files for this job. |
| View Java Code, View MATLAB Code, View R Code | Displays the command line that you would use to run this job in the Java, MATLAB, or R programming environments. These commands are useful for programmers who want to access GenePattern from one of these programming environments or from their own applications. |
- Click the icon next to an analysis results file to display commands that
you use to work with the file:
| Command | Description |
|---|---|
| Delete | Deletes the file from the GenePattern server. |
| Save | Downloads the file from the GenePattern server. |
| Create Pipeline | Creates a GenePattern pipeline that reproduces this analysis results file. Pipelines are discussed later in this tutorial. |
| List of analyses | Lists analyses that commonly use this type of file as an input parameter. Select an analysis to display its parameters with this results file specified as the first input parameter. |
- Click the analysis results file, all_aml_train.comp.marker.odf, to
display it in a text viewer. The amount of information it contains makes the
file difficult to understand. Like most analysis results files, it is
not intended to be read as text but to be used as
input to subsequent analyses.
- Click the Back button in your web browser to return to the home page.
Running the ComparativeMarkerSelectionViewer
After running the ComparativeMarkerSelection analysis, run the
ComparativeMarkerSelectionViewer to examine the analysis results. To run the ComparativeMarkerSelectionViewer:
- In the Recent Jobs pane, click the icon next to your all_aml_train.comp.marker.odf
results file.
- Select ComparativeMarkerSelectionViewer. GenePattern displays the ComparativeMarkerSelectionViewer parameters.
The comparative marker selection filename parameter is automatically set
to the all_aml_train.comp.marker.odf results file.

- For the dataset filename parameter, click Browse
and select the all_aml_train.res
file.
- Click Run.
GenePattern displays the Job Status page.
Unlike most analyses, viewers run on your own computer
rather than on the GenePattern server. The first time you run a viewer,
the following security message may appear.

- If the security message appears, click Run to continue.
The ComparativeMarkerSelectionViewer appears:

Tip: To change the width of a column, click and
drag the edge of the column heading.
- In the ComparativeMarkerSelectionViewer:
- The Score column shows the value of the metric used to
correlate gene expression and phenotype. A high score indicates correlation
with the first phenotype (upregulated in ALL) and a low score indicates
correlation with the second phenotype (upregulated in AML).
- The middle columns, FDR through FWER, provide different ways to
measure the significance of the score. The lower the value, the more significant
the result. For example, you might choose to measure significance using the
false discovery rate (FDR) with a cutoff of FDR < 0.05. You would then
focus on the genes with the highest and lowest scores whose FDR is below 0.05.
- To close the ComparativeMarkerSelectionViewer, select File>Exit.
- In GenePattern, click Return to Modules & Pipelines Start
to return to the home page.
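The FDR cutoff described above amounts to a simple filter. The sketch below assumes the viewer's rows have already been read into (feature, score, FDR) records, and that the score is signed, as with a t statistic, so that positive scores point to the first phenotype (ALL) and negative scores to the second (AML). The MarkerRow structure, the significant_markers function, and the gene names in the example are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class MarkerRow(NamedTuple):
    """One row of marker results (hypothetical record, not a GenePattern type)."""
    feature: str
    score: float
    fdr: float


def significant_markers(
    rows: List[MarkerRow], fdr_cutoff: float = 0.05
) -> Tuple[List[MarkerRow], List[MarkerRow]]:
    """Return (first-phenotype markers, second-phenotype markers) with FDR below the cutoff.

    Assumes a signed score: positive = higher in the first phenotype,
    negative = higher in the second. Each list is ordered strongest first.
    """
    hits = [r for r in rows if r.fdr < fdr_cutoff]
    first = sorted((r for r in hits if r.score > 0), key=lambda r: r.score, reverse=True)
    second = sorted((r for r in hits if r.score < 0), key=lambda r: r.score)
    return first, second


# Made-up rows for illustration only.
rows = [
    MarkerRow("gene_a", 5.2, 0.001),
    MarkerRow("gene_b", -4.8, 0.002),
    MarkerRow("gene_c", 0.3, 0.60),
    MarkerRow("gene_d", 3.1, 0.04),
]
first, second = significant_markers(rows)
print([r.feature for r in first], [r.feature for r in second])
```

Here gene_c is dropped because its FDR is far above the cutoff, gene_a and gene_d are reported as markers for the first phenotype, and gene_b as a marker for the second.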
Running ExtractComparativeMarkerResults
Now that you have examined the ComparativeMarkerSelection
analysis results, you want to create a new dataset that contains only the most
promising marker genes from the results file for further analysis. To run the ExtractComparativeMarkerResults
analysis:
- In the Recent Jobs pane, click the icon next to the ComparativeMarkerSelection
results file, all_aml_train.comp.marker.odf.
- Select ExtractComparativeMarkerResults. GenePattern displays the ExtractComparativeMarkerResults
parameters. The comparative
marker selection filename parameter is automatically set to the all_aml_train.comp.marker.odf
results file.
- For the dataset filename parameter, click Browse
and select the all_aml_train.res
file.
- Enter the following parameter values to extract the top 100 features
(genes) in the analysis results file:
- For the field parameter, select Rank.
- For the max field, enter 100.

- Click Run. GenePattern displays the Job Status page. After a few seconds, the status icon changes from running
to complete and GenePattern displays the analysis results files.

- Leave the Job Status page displayed.
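Conceptually, setting field to Rank and max to 100 keeps the marker-result rows whose rank is at most 100 and then pulls those features' rows out of the original dataset. The sketch below shows that two-step idea on plain dictionaries. It assumes rank 1 is the best-ranked feature and that features are matched by name; extract_top_features is a hypothetical helper, not the module's implementation.

```python
from typing import Dict, List


def extract_top_features(
    ranks: Dict[str, int],
    expression: Dict[str, List[float]],
    max_rank: int = 100,
) -> Dict[str, List[float]]:
    """Subset an expression dataset to features ranked 1..max_rank (hypothetical helper).

    ranks: feature name -> rank from the marker results (1 = best).
    expression: feature name -> expression values across samples.
    Returns a new dict in rank order; features missing from the dataset are skipped.
    """
    selected = sorted((f for f, r in ranks.items() if r <= max_rank), key=lambda f: ranks[f])
    return {f: expression[f] for f in selected if f in expression}


# Made-up example with three features and two samples.
ranks = {"gene_a": 1, "gene_b": 2, "gene_c": 3}
expression = {"gene_a": [1.0, 2.0], "gene_b": [0.5, 0.1], "gene_c": [3.0, 3.0]}
print(list(extract_top_features(ranks, expression, max_rank=2)))
```

Returning the subset in rank order keeps the strongest markers at the top, which is convenient when the result is opened in a viewer.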
Running HeatMapViewer
The HeatMapViewer displays expression values in a color-coded
heat map. The largest expression values are displayed in red (hot) and the smallest
values are displayed in blue (cool). Intermediate values are displayed in different
shades of red and blue. The color-coding provides a quick coherent view of gene
expression levels.
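One way to implement the color coding described above is a diverging scale: normalize each value to the range between a minimum and a maximum, then blend from blue through white to red. The sketch below is an illustrative assumption, not HeatMapViewer's actual color scheme; in particular, a viewer may take the minimum and maximum across the whole dataset or separately for each gene (row), and this sketch leaves that choice to the caller. heat_color is a hypothetical name.

```python
from typing import Tuple


def heat_color(value: float, vmin: float, vmax: float) -> Tuple[int, int, int]:
    """Map a value to an (r, g, b) color: vmin -> blue, midpoint -> white, vmax -> red."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp to [vmin, vmax], then normalize to t in [0, 1].
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    if t < 0.5:
        # Lower half: blue (0, 0, 255) fading toward white.
        level = round(255 * (t / 0.5))
        return (level, level, 255)
    # Upper half: white fading toward red (255, 0, 0).
    level = round(255 * (1 - (t - 0.5) / 0.5))
    return (255, level, level)


# Row-relative coloring of one made-up gene's values.
row = [2.0, 5.0, 8.0]
print([heat_color(v, min(row), max(row)) for v in row])
```

A row whose values are all equal has no range, so the sketch raises an error rather than guess a color.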
To display your new dataset in the HeatMapViewer:
- On the Job Status page, click the icon next to the results file from
the ExtractComparativeMarkerResults analysis, all_aml_train.comp.marker.filt.res.
- Select HeatMapViewer. GenePattern displays the HeatMapViewer parameters. The dataset filename parameter is automatically set
to the all_aml_train.comp.marker.filt.res results file.

- Click Run. The HeatMapViewer appears:

- In the HeatMapViewer, adjust the grid size for a clearer view of the
sample and feature names:
- Click View>View Options.
- Click-and-drag the Row Size or Column Size
slider to modify the grid size.
- Close the View Options window.
- To close the HeatMapViewer, select File>Exit.
- In GenePattern, click Return to Modules & Pipelines Start
to return to the home page.
Using Pipelines
As you have seen, GenePattern makes it easy
to run individual analyses and to review analysis results. Pipelines take this one step further: they make it easy to run multiple
analyses. You can define a pipeline to run multiple analyses against a single
dataset or to run a sequence of analyses, where the output from one analysis
becomes the input for a subsequent analysis. Modules run from a pipeline work
exactly the same as those run directly from GenePattern.
In this tutorial, you have run two analyses: Comparative
Marker Selection and Extract Comparative Marker Results. The analysis results
file from the first analysis became the input file for the second analysis.
Running these two analyses produced a new dataset that contains the 100 genes
in your dataset (all_aml_train.res)
that are most closely correlated with phenotypes in your class file (all_aml_train.cls).
In this section, you will:
- Create a pipeline that duplicates your analysis results.
- Edit the pipeline so that it operates on any set of data (res and
cls) files.
For more information: see Working with
Pipelines in the GenePattern User Guide.
Creating a Pipeline Based on Previous Results
You can create a pipeline in one of two ways:
- You can create an empty pipeline and then add to it the analysis
modules that you want the pipeline to run.
- You can create a pipeline based on an analysis results file. In
this case, GenePattern adds to the pipeline the analysis modules used to create
that results file:
1. It adds the module that created the results file.
2. It checks the module's input file parameter.
3. If the input file for the module was the output file of a previous
module, GenePattern adds the previous module and returns to step 2;
otherwise, there are no more modules to be added to the pipeline.
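The three numbered steps above describe a walk backward through the jobs that produced a file. The sketch below models each job as a record that remembers which earlier job, if any, produced its input file, and it collects the modules in run order. The Job class and modules_for_pipeline function are hypothetical, and the sketch follows only one upstream link per job, whereas real modules can take several input files.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    """A finished analysis job (hypothetical model)."""
    module: str
    # Job whose output file was used as this job's input, or None if the
    # input came from outside GenePattern (an uploaded file or a URL).
    input_from: Optional["Job"] = None


def modules_for_pipeline(result_job: Job) -> List[str]:
    """Collect the modules that produced a result file, earliest first."""
    modules: List[str] = []
    job: Optional[Job] = result_job
    while job is not None:
        modules.append(job.module)  # Step 1: add the module that created the file.
        job = job.input_from        # Steps 2 and 3: follow the input back, if any.
    modules.reverse()               # Earliest module first, as the pipeline runs them.
    return modules


cms = Job("ComparativeMarkerSelection")
extract = Job("ExtractComparativeMarkerResults", input_from=cms)
print(modules_for_pipeline(extract))
```

For the tutorial's results file this yields ComparativeMarkerSelection followed by ExtractComparativeMarkerResults, matching the pipeline that GenePattern builds in the next step.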
To create a pipeline based on the ExtractComparativeMarkerResults
results file:
- In the Recent Jobs pane, click the icon next to the results file from
the ExtractComparativeMarkerResults analysis, all_aml_train.comp.marker.filt.res.
- Select Create Pipeline. GenePattern displays the new pipeline.
The new pipeline includes the module that created the
results file, ExtractComparativeMarkerResults. It also includes
the ComparativeMarkerSelection analysis because the input file for
ExtractComparativeMarkerResults was the output file from the
ComparativeMarkerSelection analysis. The parameters for both analyses are set
to the values used to generate the results file. (You will need to scroll to
see the entire pipeline.)

- By default, the pipeline name is the job number. Change the pipeline
name to MyComparativeMarkerSelection by editing the Pipeline name field.
Adding Modules to a Pipeline
The pipeline contains the two analysis modules used to
create the analysis results file: Comparative Marker Selection and Extract
Comparative Marker Results. In your original analysis, after creating the
analysis results file, you used the Heat Map Viewer to review the results.
To add the Heat Map Viewer module to your pipeline:
- Scroll to the end of the pipeline definition.
- Click Add Another Module to add a module after the
ExtractComparativeMarkerResults analysis. GenePattern prompts you for the
module to add:

- Select Visualizer as the category and HeatMapViewer as the
module:

GenePattern adds the HeatMapViewer module to the
pipeline:

- For the dataset filename parameter,
select the analysis results file from the ExtractComparativeMarkerResults
module. To do so, next to use output from, choose ExtractComparativeMarkerResults
as the module and res as the output file:

- Click Save to save the pipeline. GenePattern displays a status
page.
- Click Continue to Modules & Pipelines Start to return to the
home page. GenePattern displays the pipeline and its parameters (if any).
Running the Pipeline
On the home page in the
Modules & Pipelines pane, notice that the pipeline names are color-coded;
the pipelines you create are displayed in red.
To run the pipeline:
- Click Run. GenePattern displays the Job Status page, runs each module, displays the analysis
results, and opens the HeatMapViewer.
- Close the HeatMapViewer.
- In GenePattern, click Return to Modules & Pipelines Start
to return to the home page.
The Recent Jobs pane shows the pipeline job, which
lists each analysis run and its analysis result files.
To review the pipeline results in the HeatMapViewer:
- Click the job number to redisplay the Job Status page.

- Click the Open Visualizer icon. GenePattern opens the HeatMapViewer.
- Close the HeatMapViewer.
- In GenePattern, click Return to Modules & Pipelines Start to return to the home page.
Adding Parameters to a Pipeline
You have created a pipeline that duplicates your original
analysis: it runs the Comparative Marker Selection analysis on the all_aml_train data
(res and cls) files, uses the analysis results as input to the Extract
Comparative Marker Results analysis, and then displays the analysis results
using the Heat Map Viewer.
You can make the pipeline more generally useful by having it
prompt you for the data (res and cls) files to be analyzed, rather than simply
analyzing the all_aml_train
data files.
To edit the pipeline:
- In the Modules & Pipelines pane, select your pipeline.
GenePattern displays the pipeline:

- Click Edit. GenePattern displays the
pipeline definition for editing.
- The data (res and cls) files to be analyzed are named in the input file and cls file parameters
of the Comparative Marker Selection analysis. Modify the pipeline to have it
prompt you for these parameter values:
- Click on the Prompt when run check box next to
the input file parameter of the Comparative Marker
Selection analysis. GenePattern removes the parameter value from the
pipeline definition.
- Click on the Prompt when run check box next to
the cls file parameter of the Comparative Marker
Selection analysis. GenePattern removes the parameter value from the
pipeline definition.

- Optionally, click set prompt when run display settings to modify
the default text of the prompt.
- The data (res) file to be analyzed is also named in the dataset filename parameter of the Extract Comparative
Marker Results analysis. Modify the pipeline to have it prompt you for this
parameter value as well:
- Click on the Prompt when run check box next to
the dataset filename parameter of the Extract
Comparative Marker Results analysis:

- Optionally, click set prompt when run display settings to modify
the default text of the prompt.
- Scroll to the end of the pipeline definition.
- Click Save. GenePattern displays a message confirming that
the pipeline has been saved.
- Click Continue to Modules & Pipelines Start to return to the
home page.
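After these edits, each pipeline parameter gets its value in one of three ways: a fixed value saved with the pipeline, the output of an earlier step (use output from), or a value you supply at run time (Prompt when run). The sketch below models those three bindings and resolves them when the pipeline runs. The data model, the run_pipeline function, and the fake_run_step callback are hypothetical illustrations, not GenePattern's pipeline format; only the parameter names come from this tutorial.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Fixed:
    value: str  # value saved in the pipeline definition


@dataclass(frozen=True)
class Prompt:
    label: str  # asked for when the pipeline runs


@dataclass(frozen=True)
class OutputOf:
    step: int  # index of an earlier step in the pipeline
    kind: str  # which of that step's output files to use, e.g. "res"


Binding = Union[Fixed, Prompt, OutputOf]
Step = Tuple[str, Dict[str, Binding]]


def run_pipeline(
    steps: List[Step],
    answers: Dict[str, str],
    run_step: Callable[[str, Dict[str, str]], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Resolve every binding, run each step in order, and return each step's outputs."""
    outputs: List[Dict[str, str]] = []
    for module, bindings in steps:
        params: Dict[str, str] = {}
        for name, binding in bindings.items():
            if isinstance(binding, Fixed):
                params[name] = binding.value
            elif isinstance(binding, Prompt):
                params[name] = answers[binding.label]
            else:  # OutputOf: only earlier steps have outputs available
                params[name] = outputs[binding.step][binding.kind]
        outputs.append(run_step(module, params))
    return outputs


calls = []


def fake_run_step(module: str, params: Dict[str, str]) -> Dict[str, str]:
    """Stand-in for running a module: record the call and name its outputs."""
    calls.append((module, params))
    return {"odf": module + ".odf", "res": module + ".res"}


pipeline = [
    ("ComparativeMarkerSelection",
     {"input file": Prompt("input file"), "cls file": Prompt("cls file")}),
    ("ExtractComparativeMarkerResults",
     {"comparative marker selection filename": OutputOf(0, "odf"),
      "dataset filename": Prompt("dataset filename"),
      "field": Fixed("Rank"), "max": Fixed("100")}),
    ("HeatMapViewer", {"dataset filename": OutputOf(1, "res")}),
]
run_pipeline(pipeline,
             {"input file": "all_aml_test.res", "cls file": "all_aml_test.cls",
              "dataset filename": "all_aml_test.res"},
             fake_run_step)
for module, params in calls:
    print(module, params)
```

Because the two res prompts are separate, nothing stops you from supplying different files to them; in the next section you supply all_aml_test.res to both.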
Running the Edited Pipeline
When you edit a pipeline and return to the home page, GenePattern displays the pipeline and its parameters:

To run the edited pipeline:
- For the input file parameter, select all_aml_test.res.
- For the cls file parameter, select all_aml_test.cls.
- For the dataset filename parameter, select all_aml_test.res.
- Click Run. The Job Status page appears and GenePattern runs each
module. After a few moments, GenePattern displays the HeatMapViewer.
- Close the HeatMapViewer.
- In GenePattern, click Return to Modules & Pipelines Start
to return to the home page.
Saving and Deleting Result Files
As described earlier in the tutorial, analyses are run on
the GenePattern server and analysis results files are stored on the server.
Server storage is temporary: analysis results files are deleted after they
have been on the server for a certain length of time (by default, one week).
To save your analysis results files, you must copy each file
from the server to a more permanent location. If you do not need your analysis
results, you can delete them at any time.
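The retention rule above is simple date arithmetic. The sketch below computes when a result file would be deleted, assuming the default one-week retention; a server can be configured differently, and the date shown in red on the Job Status page is the one that applies. deletion_time and is_expired are hypothetical helpers.

```python
from datetime import datetime, timedelta

# Default retention described in this tutorial; a server may be configured differently.
DEFAULT_RETENTION = timedelta(weeks=1)


def deletion_time(completed_at: datetime, retention: timedelta = DEFAULT_RETENTION) -> datetime:
    """When a result file completed at completed_at becomes eligible for deletion."""
    return completed_at + retention


def is_expired(completed_at: datetime, now: datetime,
               retention: timedelta = DEFAULT_RETENTION) -> bool:
    """True once the retention period has fully elapsed."""
    return now >= deletion_time(completed_at, retention)


done = datetime(2024, 3, 1, 14, 30)
print(deletion_time(done))
```

Passing a different retention value lets the same arithmetic describe a server with a non-default setting.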
To save an analysis results file:
- In the Recent Jobs pane, click the icon next to a results file.
- Click Save.
To delete an analysis results file:
- In the Recent Jobs pane, click the icon next to a results file.
- Click Delete.
To save or delete a job and all of its analysis results files,
click the icon next to the job and click Download or Delete.
Exiting from GenePattern
To exit from GenePattern, click the Sign out link in the top right corner of the title bar and
then close the web browser window.
Learning More About GenePattern
Thank you for taking this time to learn about GenePattern!
As you continue to work with GenePattern, please visit http://genepattern.org, which provides an overview of the
GenePattern analysis modules, as well as links to the GenePattern software and
documentation.
We welcome your feedback. If you have suggestions, comments,
or questions that are not answered by the documentation, contact the
GenePattern help desk (gp-help (at) broadinstitute.org).