Using GenePattern from R

Using R as a GenePattern client lets you run GenePattern modules and then manipulate and visualize the results in a powerful, free statistical package that runs on most major platforms. Using GenePattern, in turn, lets you invoke methods written in many other languages without worrying about how to launch them or whether you are passing incorrect parameters. This section describes how to use the GenePattern R library to run GenePattern analyses.

Getting Started in R

If you are not familiar with R, see the resources on the www.r-project.org website.

GenePattern R Library

The GenePattern R package allows you to invoke a GenePattern module as if it were a local R method running on your client and to get back from the module a list of result files. The package requires R version 2.4.1 or greater and the rJava package. You can download the package from your GenePattern server in Windows (.zip), source (.tar.gz), and Mac OS X (.tgz) formats.

To download the GenePattern R package to your computer:

  1. Start GenePattern.
  2. Within GenePattern, select Downloads>Programming Libraries.
  3. Click the appropriate link to download the GenePattern R package for your operating system.
  4. Install the package into your R environment by using the install.packages command:

    install.packages("full-path-to-GenePattern-R-package", type="source", repos=NULL)
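Before installing, it can help to confirm that your R session meets the requirements stated above. The following is a minimal sketch that uses only base R. The helper name check.gp.prereqs is hypothetical and is not part of the GenePattern package.

```r
# Hypothetical helper (not part of the GenePattern package): reports whether
# this R session meets the stated requirements (R >= 2.4.1 and rJava).
check.gp.prereqs <- function() {
  r.ok <- getRversion() >= "2.4.1"
  # rJava counts as available if it appears among the installed packages
  rjava.ok <- "rJava" %in% rownames(installed.packages())
  c(r.version = r.ok, rJava = rjava.ok)
}

prereqs <- check.gp.prereqs()
print(prereqs)
if (!all(prereqs)) {
  message("Missing prerequisites: ",
          paste(names(prereqs)[!prereqs], collapse = ", "))
}
```

If rJava is reported as missing, install it first and then install the GenePattern package.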

Note: If you are using a version of R that you cannot modify (because it is a publicly shared version and you do not have the appropriate privileges), you can have it load the GenePattern library by setting the environment variable R_LIBS=<GenePattern install directory>/R/library in your autoexec.bat, .cshrc, .bashrc, or other shell startup file. R will then load from its usual location, but will also search for and find the GenePattern library from your installation.
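To confirm that R sees the extra library location, you can inspect the library search path from within R. The following is a minimal sketch that uses only base R. The directory is a placeholder created for illustration, so substitute the R/library directory of your own GenePattern installation. The .libPaths call is a base R alternative that affects the current session only; it is not something the GenePattern documentation prescribes.

```r
# Placeholder directory standing in for <GenePattern install directory>/R/library
gp.lib <- file.path(tempdir(), "GenePattern", "R", "library")
dir.create(gp.lib, recursive = TRUE, showWarnings = FALSE)

# Value of R_LIBS as seen by this session (empty string if it is not set)
print(Sys.getenv("R_LIBS"))

# For the current session only, prepend the directory to the search path;
# .libPaths() silently ignores directories that do not exist
.libPaths(c(gp.lib, .libPaths()))
print(.libPaths())
```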

Running an R Program

This section explores a simple R program that runs a module, displays the resulting output, and loads it into an R matrix for further analysis. The included code can be copied and pasted into your R environment so that you can try it out, modify it, and create your own solutions.

The first statements in the application initialize various settings, which you must do once in every application that accesses GenePattern. In the code below, replace the GenePattern server URL, GenePattern user name (typically, your e-mail address), and password with values appropriate for your GenePattern server. The gp.login method returns a GPClient object that contains the information required for running modules on a GenePattern server.

# Load GenePattern package
library(GenePattern)
username <- "your email address"
password <- "your password"
servername <- "http://localhost:8080"
 
# Obtain a GPClient object which references a specific server and user
gp.client <- gp.login(servername, username, password)
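If you prefer not to keep a password in the script itself, one option is to read the settings from environment variables. The following is a minimal sketch that uses only base R. The helper read.gp.settings and the variable names GP_SERVER, GP_USERNAME, and GP_PASSWORD are assumptions made for this example and are not defined by GenePattern.

```r
# Hypothetical helper: collects connection settings from environment
# variables so that the password does not have to appear in the script.
# The variable names GP_SERVER, GP_USERNAME and GP_PASSWORD are assumptions.
read.gp.settings <- function() {
  settings <- list(
    servername = Sys.getenv("GP_SERVER", unset = "http://localhost:8080"),
    username   = Sys.getenv("GP_USERNAME"),
    password   = Sys.getenv("GP_PASSWORD")
  )
  # Stop early if any setting is empty
  absent <- names(settings)[!nzchar(unlist(settings))]
  if (length(absent) > 0) {
    stop("Missing GenePattern settings: ", paste(absent, collapse = ", "))
  }
  settings
}

# Illustration only: set placeholder values, then read them back
Sys.setenv(GP_USERNAME = "your email address", GP_PASSWORD = "your password")
settings <- read.gp.settings()

# With the GenePattern package loaded, the values could be passed to gp.login:
# gp.client <- gp.login(settings$servername, settings$username, settings$password)
```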

After initializing the required settings, the application runs the PreprocessDataset module to preprocess a dataset. This example references the dataset using a publicly-accessible URL, but a filename would be equally valid. When you call an R method, such as run.analysis, the GenePattern package invokes the appropriate module on the server, passing all of the input parameters and input files. Control returns to your application when the module completes. (To run a module asynchronously, use the method runAnalysisNoWait.)

# input dataset for preprocess operation
input.ds <- "ftp://ftp.broadinstitute.org/pub/genepattern/all_aml/all_aml_train.res"
 
# preprocess the dataset
preprocess.jobresult <- run.analysis(gp.client, "PreprocessDataset", input.filename=input.ds)

When the module completes, it returns a JobResult object with which you can execute various methods. For example, you can call a method using a JobResult object to get an R list of the filenames that are the output of the module. Afterwards, you can download the files or leave them on the server and refer to them by URL. In this example, we view the results in a heat map:

# Obtain the url location of the result and run the visualizer
preprocess.out.file.url <- job.result.get.url(preprocess.jobresult, 0)
run.visualizer(gp.client, "HeatMapViewer", dataset=preprocess.out.file.url)

In this example, the application downloads the result file and displays the results in a file viewer window, then also loads the data into a matrix so that further manipulation can be performed in R:

# download result files
download.directory <- job.result.get.job.number(preprocess.jobresult)
download.directory <- as.character(download.directory)
preprocess.out.files <- job.result.download.files(preprocess.jobresult, download.directory)
 
# display the preprocessed result
preprocessed.out.file <- as.character(preprocess.out.files[1])
file.show(preprocessed.out.file)
 
# now read the output into a matrix
# so we can do further manipulation in R
data <- read.dataset(preprocessed.out.file)

You can combine GenePattern analyses with all of the rich statistical functionality of R. For example, you can use R's plot and legend methods to create graphic output, output JPEGs of your visualized data using savePlot, save modified matrices to files using save, or summarize and report on the data using your own code. Just remember: GenePattern modules create JobResult objects and those objects are available to the R client for processing.
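As a rough illustration of this kind of follow-up work, the sketch below summarizes a small matrix by class, plots the summary with plot and legend, and saves the derived matrix with save. It uses only base R. The matrix is synthetic stand-in data, not GenePattern output, and the sample names and class labels are invented for the example. It writes the image through the jpeg device instead of savePlot, because savePlot needs an on-screen graphics device.

```r
# Synthetic stand-in for expression data (rows = genes, columns = samples);
# in practice the values would come from a module result file.
set.seed(1)
expr <- matrix(rnorm(40, mean = 8), nrow = 10,
               dimnames = list(paste("gene", 1:10, sep = ""),
                               c("ALL_1", "ALL_2", "AML_1", "AML_2")))
classes <- c("ALL", "ALL", "AML", "AML")

# Mean expression of each gene within each class (10 x 2 matrix)
class.means <- sapply(unique(classes), function(cl) {
  rowMeans(expr[, classes == cl, drop = FALSE])
})

# Write the plot to a JPEG file if this R build supports the jpeg device
if (capabilities("jpeg")) {
  plot.file <- file.path(tempdir(), "class_means.jpg")
  jpeg(plot.file)
  plot(class.means[, "ALL"], type = "b", col = "blue",
       ylim = range(class.means),
       xlab = "Gene index", ylab = "Mean expression")
  lines(class.means[, "AML"], type = "b", col = "red")
  legend("topright", legend = colnames(class.means),
         col = c("blue", "red"), lty = 1)
  dev.off()
}

# Save the derived matrix so that it can be reloaded in a later session
save(class.means, file = file.path(tempdir(), "class_means.RData"))
```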

The GenePattern R package also has methods to read and write GenePattern files (such as res, gct, and cls files), to enable running of multiple modules in parallel, to run modules with input from files that were output from previous modules without moving them from the server, and other utilities. Even if you choose not to look in the library, you can extend the techniques shown above to implement your own analyses.

Using LSIDs from R

You can use Life Science Identifiers (LSIDs) instead of module names to identify modules for GenePattern to run. For R, this is primarily useful when you want to specify a particular version of a module for GenePattern to run. The easiest way to specify a particular version of a module is to pass the LSID to an R method such as run.analysis in place of the GenePattern module name. For example, the following statement invokes version 1 rather than the latest version of the PreprocessDataset module:

preprocess.jobresult <- run.analysis(gp.client, "urn:lsid:broadinstitute.org:cancer.software.genepattern.module.analysis:00020:1", input.filename=input.ds)
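The LSID in the statement above consists of six colon-separated fields, the last of which is the module version. The following is a minimal sketch, using only base R, of how you might take such a string apart and build the LSID for another version. The field labels are descriptive names chosen for this example, and version 2 is hypothetical; check your server for the versions that are actually installed.

```r
# LSID copied from the statement above
lsid <- "urn:lsid:broadinstitute.org:cancer.software.genepattern.module.analysis:00020:1"

# Split on ":" and label the fields (the labels are chosen for this sketch)
fields <- strsplit(lsid, ":", fixed = TRUE)[[1]]
names(fields) <- c("urn", "scheme", "authority", "namespace", "id", "version")

print(fields[["version"]])   # "1"

# Build the LSID for a different (hypothetical) version of the same module
fields[["version"]] <- "2"
lsid.v2 <- paste(fields, collapse = ":")
print(lsid.v2)
```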

If you are unfamiliar with LSIDs and GenePattern versioning, see Concepts.

Updated on March 28, 2014 04:43