GenePattern provides access to a broad array of computational methods used to analyze genomic data. Its extensible architecture makes it easy for computational biologists to add analysis and visualization modules, so GenePattern users gain access to new methods on a regular basis.
This Concepts guide provides a brief introduction to GenePattern. All other GenePattern documentation assumes that you are familiar with the concepts covered here.
|Analysis and Visualization Modules||Analyze data using GenePattern modules.|
|Pipelines||Combine modules to form analysis pipelines.|
|Suites||Organize modules and pipelines into suites.|
|Servers||The GenePattern user interface communicates with a GenePattern server. The server runs the analyses and stores the results.|
|Jobs||Create a job on the server by running an analysis or pipeline.|
|Security and Permissions||A GenePattern administrator determines who has what access to the GenePattern server.|
|Version Numbers||GenePattern ensures reproducible analysis results by uniquely identifying every version of every module and pipeline.|
|Programming Environments||GenePattern can be accessed directly from Java, MATLAB, or R.|
Analysis and visualization modules are at the heart of GenePattern:
Each module includes its own documentation, which is supplied by the module developer. The Modules page of the GenePattern web site lists the modules available from the Broad Institute with links to their documentation. GParc is a repository and community where users can share and discuss their own GenePattern modules.
Pipelines combine analysis modules, visualization modules, and other pipelines into a single, reusable workflow. A pipeline can be defined to analyze a particular dataset; for example, you might create a pipeline to reproduce published analysis results. Alternatively, a pipeline can be parameterized, which allows the person running it to provide datasets and other analysis parameters. Often a pipeline runs a progressive series of analyses, where the output from one analysis is used as input for the next.
When you create a pipeline, you select the modules (and pipelines) to be executed by the pipeline. Most modules require one or more parameters. You can specify the parameter values when you create the pipeline, have the pipeline use the output file from one module as the input parameter value for a subsequent module, or prompt the user for parameter values when the pipeline is run.
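The three ways of supplying a parameter value described above (fixed when the pipeline is created, taken from an earlier module's output, or prompted for at run time) can be modeled in a few lines. The following Python sketch is purely illustrative: the classes `Fixed`, `FromStep`, `Prompt`, and `Step`, the `run_pipeline` function, and the parameter names are hypothetical and are not part of GenePattern or its programmatic interface. A caller-supplied `run_module` callback stands in for the server actually running a module.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Fixed:
    """Value specified when the pipeline is created (hypothetical)."""
    value: str


@dataclass
class FromStep:
    """Use the output file of an earlier step, identified by its index (hypothetical)."""
    step: int


@dataclass
class Prompt:
    """Ask the person running the pipeline for a value (hypothetical)."""
    name: str


Source = Union[Fixed, FromStep, Prompt]


@dataclass
class Step:
    module: str
    params: Dict[str, Source] = field(default_factory=dict)


def run_pipeline(steps: List[Step], prompts: Dict[str, str],
                 run_module: Callable[[str, Dict[str, str]], str]) -> List[str]:
    """Run the steps in order and return the output file produced by each one."""
    outputs: List[str] = []
    for step in steps:
        resolved: Dict[str, str] = {}
        for pname, src in step.params.items():
            if isinstance(src, Fixed):
                resolved[pname] = src.value
            elif isinstance(src, FromStep):
                # Only outputs of steps that have already run are available.
                if not 0 <= src.step < len(outputs):
                    raise ValueError(f"{step.module}: step {src.step} has not run yet")
                resolved[pname] = outputs[src.step]
            else:
                # Values prompted for at run time are supplied by the caller.
                resolved[pname] = prompts[src.name]
        outputs.append(run_module(step.module, resolved))
    return outputs
```

For example, a two-step pipeline could prompt for the dataset passed to PreprocessDataset and feed that module's output file to HierarchicalClustering as its input.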
Pipelines can be used to share analysis methods or to document research. By providing a way to create and distribute an entire computational analysis methodology in a single executable script, pipelines enable a form of in silico reproducible research. Colleagues with access to the same GenePattern server can easily share pipelines. Alternatively, a pipeline can be exported from one GenePattern server and imported into another.
The repository maintained by the Broad Institute includes a number of pipelines that document analysis methodologies published by Broad researchers. The Modules page of the GenePattern web site lists the pipelines available from the Broad Institute with links to their documentation.
Suites group modules and pipelines into convenient packages. For example, if you tend to analyze copy number data, you might find it helpful to create a suite that includes the SNPFileCreator, GISTIC, and other related modules. Suites provide easy access to frequently accessed modules. They also provide a convenient way of collecting a set of modules and pipelines to be shared with other GenePattern users. Colleagues with access to the same GenePattern server can easily share suites. Alternatively, a suite can be exported from one GenePattern server and imported into another.
The repository maintained by the Broad Institute includes a number of suites. The Suites page of the GenePattern web site lists them.
To use GenePattern, you open a web browser and enter a URL. The URL that you enter is the address of a GenePattern server. The web browser provides the user interface. The server runs the analyses and stores the results.
You can use the GenePattern server hosted at the Broad Institute or download and install the GenePattern software. The server hosted at the Broad Institute is called the public server or the Broad-hosted server. All other GenePattern servers are known as local servers.
The Broad Institute hosts a publicly available GenePattern server at http://genepattern.broadinstitute.org/gp/. You can use the Broad-hosted GenePattern server without installing any software.
Using the Broad-hosted server has several benefits:
When you download GenePattern, you install a local GenePattern server. You can install a local server on a standalone machine for your personal use or on a networked machine for use by several people or an entire organization. A local GenePattern server shared by several users is sometimes called a networked GenePattern server. Instructions for installing a local GenePattern server are provided on the Download GenePattern page.
Using a local server has several benefits:
When you run a module or pipeline in GenePattern, the web browser sends your request to the GenePattern server. The server starts a job to run the analysis. Job results (analysis result files and execution logs) are stored on the GenePattern server for a period of time (by default, one week) and then deleted. The GenePattern home page displays your most recent jobs and the Job Result Summary page displays all of your jobs.
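As a rough illustration of the retention rule, the following Python sketch computes when a job's results become eligible for deletion and which jobs are still kept at a given time. It assumes, for simplicity, that the retention period is counted from the moment the job completes; an actual server may measure it differently and may delete expired results on a periodic schedule rather than at the exact moment they expire. The function names and the job-ID dictionary are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Default retention period stated in the documentation; a server may be configured differently.
DEFAULT_RETENTION = timedelta(weeks=1)


def purge_time(completed: datetime, retention: timedelta = DEFAULT_RETENTION) -> datetime:
    """Earliest time a job's results become eligible for deletion (assumes the clock starts at completion)."""
    return completed + retention


def retained_jobs(completed_at: Dict[int, datetime], now: datetime,
                  retention: timedelta = DEFAULT_RETENTION) -> List[int]:
    """Return the IDs of jobs whose results are still kept at time `now`, in ascending order."""
    return sorted(job_id for job_id, done in completed_at.items()
                  if now < purge_time(done, retention))
```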
Every job run on the GenePattern server is owned by the person who submitted it. Owners are identified by their GenePattern usernames. Every job is persistent, which means:
GenePattern provides a flexible architecture that allows a user with server administrator privileges to control access to the server in several ways:
GenePattern servers are generally configured to distinguish between users and administrators. The following table shows the permissions used on the Broad-hosted server and the default permissions for a local server. GenePattern adjusts its user interface based on the permissions assigned to the person logged in; for example, only administrators see the Administration menu. The GenePattern documentation describes all of the GenePattern features. Your permissions determine whether a particular feature is visible.
|Server||User Permissions||Administrator Permissions|
|Broad-hosted server||Run public modules/pipelines; Create and run your own pipelines; Edit/delete your jobs and pipelines||GenePattern team has all permissions; GenePattern team can view/delete all jobs, modules, and pipelines|
|Local server, standalone||Same as administrator permissions||All users have all permissions; All users can view/delete all jobs, modules, and pipelines|
|Local server, shared*||Run public modules/pipelines; Create and run your own pipelines; Create and run your own modules; Create public pipelines; Edit/delete your jobs, modules, and pipelines||View/delete all jobs, modules, and pipelines|
* When several users share a local server, the system administrator typically secures the server by assigning only a few users to the Administrators group. When a local server has designated administrators, users and administrators have the default permissions shown here.
For more information about security and permissions, see Securing the Server in the Administrators Guide.
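The user/administrator distinction in the table can be expressed as a simple role-based check. The Python sketch below is illustrative only: the `Perm` names, the `User` class, and the helper functions are hypothetical and do not reflect how GenePattern actually stores or checks permissions. It uses the user permissions listed for the Broad-hosted server, treats administrators as having every permission, gives every user every permission on a standalone local server, and combines this with job ownership: a job can be deleted by its owner or by anyone allowed to view/delete all jobs.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet


class Perm(Enum):
    RUN_PUBLIC = auto()        # run public modules/pipelines
    CREATE_PIPELINE = auto()   # create and run your own pipelines
    EDIT_OWN = auto()          # edit/delete your own jobs and pipelines
    CREATE_MODULE = auto()     # create and run your own modules
    VIEW_DELETE_ALL = auto()   # view/delete all jobs, modules, and pipelines
    ADMIN_MENU = auto()        # see the Administration menu


# User permissions as listed for the Broad-hosted server.
USER_PERMS: FrozenSet[Perm] = frozenset({Perm.RUN_PUBLIC, Perm.CREATE_PIPELINE, Perm.EDIT_OWN})
# Administrators are modeled as having every permission.
ADMIN_PERMS: FrozenSet[Perm] = frozenset(Perm)


@dataclass(frozen=True)
class User:
    name: str
    is_admin: bool = False


def permissions(user: User, standalone: bool = False) -> FrozenSet[Perm]:
    """On a standalone local server, every user has administrator permissions."""
    return ADMIN_PERMS if (standalone or user.is_admin) else USER_PERMS


def can_delete_job(user: User, owner: str, standalone: bool = False) -> bool:
    """A job can be deleted by its owner or by anyone who may delete all jobs."""
    perms = permissions(user, standalone)
    if Perm.VIEW_DELETE_ALL in perms:
        return True
    return Perm.EDIT_OWN in perms and owner == user.name
```

Because the interface adapts to the logged-in user's permissions, a check such as `Perm.ADMIN_MENU in permissions(user)` mirrors the rule that only administrators see the Administration menu.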
GenePattern uses version numbers to uniquely identify modules and pipelines. When you create an object, GenePattern automatically assigns the object a version number of one (1). When you update an object, GenePattern automatically updates the object's version number. By carefully versioning each object, GenePattern ensures that you can accurately reproduce analysis results.
For example, you might create a pipeline that runs two modules: PreprocessDataset version 4 and HierarchicalClustering version 5. If the HierarchicalClustering module is updated (creating HierarchicalClustering version 6), version 1 of your pipeline still runs HierarchicalClustering version 5, which ensures that the pipeline produces the same results each time it is run. However, depending on why you are using the pipeline, you might prefer to have it run the latest version of an analysis module rather than a specific version. You make that choice when you create (or edit) the pipeline. For example, you might update the pipeline (creating version 2) so that it always uses the latest version of the HierarchicalClustering module. Now, when you run version 2 of the pipeline, it uses HierarchicalClustering version 6; when you run version 1, it still uses HierarchicalClustering version 5.
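The choice between pinning a specific module version and always using the latest one can be sketched as a small resolution function. This Python example is hypothetical and does not call GenePattern: it represents versions as strings such as `5` or `1.1`, and it assumes that the "latest" version is the one with the highest version number, compared numerically part by part so that `10` sorts after `9`.

```python
from typing import List, Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn '1.1' into (1, 1) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def resolve_version(available: List[str], pinned: Optional[str] = None) -> str:
    """Return the module version a pipeline step runs.

    pinned=None models a step configured to always use the latest version.
    """
    if pinned is None:
        return max(available, key=parse_version)
    if pinned not in available:
        raise LookupError(f"version {pinned} is not installed")
    return pinned
```

With HierarchicalClustering versions 5 and 6 installed, a step pinned to version 5 (as in pipeline version 1) keeps running 5, while a step set to use the latest version (as in pipeline version 2) runs 6.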
When you view and edit modules or pipelines, GenePattern shows you their version numbers. Typically, you update the latest version of an object, which increments its version number. For example, editing version 1 creates version 2. At times, you may need to edit an older version, which creates a point version. For example, if you have versions 1 and 2, editing version 1 creates version 1.1.
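The numbering rule for edits can also be sketched. In this hypothetical Python function, editing the latest version increments it, and editing an older version creates the first unused point version beneath it. How repeated edits to the same older version are numbered (here 1.1, then 1.2) is an assumption; the text above only specifies the first point version.

```python
from typing import List, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn '1.1' into (1, 1) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def next_version(existing: List[str], edited: str) -> str:
    """Return the version number created by editing `edited` (hypothetical rule)."""
    if edited not in existing:
        raise LookupError(f"version {edited} does not exist")
    latest = max(existing, key=parse_version)
    if edited == latest:
        # Editing the latest version increments it: 1 -> 2.
        parts = parse_version(edited)
        return ".".join(str(p) for p in parts[:-1] + (parts[-1] + 1,))
    # Editing an older version creates a point version: 1 -> 1.1 (assumed: then 1.2, ...).
    n = 1
    while f"{edited}.{n}" in existing:
        n += 1
    return f"{edited}.{n}"
```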
GenePattern implements version numbers using Life Science Identifiers (LSIDs). Thus, object identifiers in GenePattern are sometimes called LSIDs.
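An LSID is a URN of the form `urn:lsid:<authority>:<namespace>:<object ID>:<revision>`, where the revision part is optional. The Python sketch below splits an identifier into those parts so that two versions of the same module can be recognized by their shared authority, namespace, and object ID. The identifiers used with it are made-up placeholders, not real GenePattern LSIDs, and the parser is deliberately minimal: it compares the `urn:lsid` prefix case-insensitively but does not validate the allowed character set.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def same_object(a: LSID, b: LSID) -> bool:
    """True when two LSIDs name the same object, regardless of revision."""
    return (a.authority, a.namespace, a.object_id) == (b.authority, b.namespace, b.object_id)
```

For instance, `urn:lsid:example.org:genepattern.module:00001:5` and `urn:lsid:example.org:genepattern.module:00001:6` would name versions 5 and 6 of the same placeholder object.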
A programmatic interface makes it easy for software programmers to call GenePattern modules from the Java, MATLAB, or R programming environments. For information about the programmatic interface, see the Programmers Guide.