GenePattern provides access to a broad array of computational methods used to analyze genomic data. Its extendable architecture makes it easy for computational biologists to add analysis and visualization modules, which ensures that GenePattern users have access to new computational methods on a regular basis.
If you are new to GenePattern, begin with the basics:
Creating a GenePattern module is a two-step process:
When writing a program that will be run as a GenePattern analysis module, keep in mind the following:
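myfunction <- function(...)
{
  args <- list(...)
  for (i in seq_along(args)) {
    # each argument is a two-character flag followed by its value, e.g. "-imydata.gct"
    flag <- substring(args[[i]], 1, 2)
    value <- substring(args[[i]], 3, nchar(args[[i]]))
    if (flag == '-i') {
      # ... (use value, the input file name)
    }
  }
}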
<R2.5> <libdir>myscript.R myfunction -i<input.file>
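For a Perl program, call the interpreter explicitly on the command line, for example "perl myscript", rather than including "#!/usr/bin/perl" as the first line of the myscript.pl file.
For an R program, the following command line calls the myfunc() function in the R script named myscript.R, passing a single parameter, input.filename: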
<R> <libdir>myscript.R myfunc <input.filename>
Calling an R script from a command line is possible, but generally not useful because you cannot pass arguments to the script. To pass arguments to your R code, create a function. For example:
myfunc <- function(input.filename)
{
  # ... (your R code here)
}
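The R example above treats each argument as a two-character flag followed by its value (for example, -i<input.file>). If you write a module in Java instead, a minimal sketch of the same parsing convention might look like the following. The class FlagArgs is hypothetical, not part of GenePattern, and rejecting arguments that do not start with a flag is an assumed design choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: parses arguments of the form "-i<value>", where the
 * first two characters are the flag and the rest of the argument is the value.
 */
final class FlagArgs {
    private FlagArgs() {}

    static Map<String, String> parse(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String arg : args) {
            if (arg.length() < 2 || arg.charAt(0) != '-') {
                throw new IllegalArgumentException("expected -<flag><value>, got: " + arg);
            }
            // "-i" is the flag; everything after it (possibly empty) is the value
            values.put(arg.substring(0, 2), arg.substring(2));
        }
        return values;
    }
}
```

A module's main method could then read the input file name with FlagArgs.parse(args).get("-i"), using a command line such as <java> -cp <libdir>mymodule.jar MyModule -i<input.file>, where mymodule.jar and MyModule are placeholder names.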
Visualization modules are similar to analysis modules. The only difference between analysis and visualization modules is that analysis modules run on the server machine and visualization modules run on the client machine. Each module is launched in a separate process. An applet is used to launch the visualization module.
If you are writing MATLAB code to be invoked as a GenePattern module, follow the guidelines in Writing Modules for GenePattern. In addition, for MATLAB code, you must address licensing and distribution issues, as described in this section:
You can invoke a MATLAB executable from a GenePattern module using one of two approaches: the direct approach or the compiled approach. Following are brief descriptions of each approach, including its advantages and disadvantages:
If you are simply using your M-code on your standalone GenePattern server, the direct approach is simpler; however, if you want to give copies of your M-code to other people or deploy your M-code on a shared GenePattern server, the compiled approach is preferred. The compiled approach may provide slightly better performance for fast-running modules because the startup delay is shorter, but the actual execution time is approximately the same for either approach.
The instructions in this section are based on the following MATLAB versions:
When you create a module in GenePattern, you specify the command line that invokes the program that performs the desired function. Generally, the command line includes arguments, such as the parameters for the algorithm and the data file to analyze.
Calling script M-code from a command line is possible, but generally not useful because you cannot pass arguments to the script. To pass arguments to your M-code, create a no-return entry function to serve as the top level call into MATLAB. The following example defines a no-return entry function that accepts two parameters:
function analyzeThis(filename, whatToWrite)
% ... (your M-code here)
Writing Modules for GenePattern provides additional guidelines for writing code that will run as a GenePattern module.
If you do not plan to use the compiled M-code approach, skip this section and continue with Distributing Your MATLAB Code.
Compiling your MATLAB M-code into a standalone executable is described in the MATLAB Compiler Documentation. Please refer to this documentation to understand all of the options available to you. To summarize the simplest case, from within MATLAB, at the MATLAB prompt, execute the following command:
mcc -m analyzeThis
where analyzeThis is the name
of your entry function. This command generates the following files in your $MATLAB_ROOT/work directory:
| File | Description |
|---|---|
| analyzeThis (Linux, Mac OS X) or analyzeThis.exe (Windows) | Executable file |
| analyzeThis.ctf | Component Technology File (CTF) archive |
| analyzeThis.c (Linux, Windows) | C language source code |
| analyzeThis.h (Linux, Windows) | C language header file |
| analyzeThis_main.c | C language source code |
| analyzeThis_mcc_component_data.c | C language source code |
Note: To use the MATLAB compiler on Mac OS X, you must have Xcode 2.2 installed; minimally, the Developer Tools, gcc 4.0, gcc 3.3, Mac OS X SDK, and BSD SDK. These instructions were tested using Xcode 2.2.1.
After writing your MATLAB code, create a GenePattern module that invokes the code that you have written. Creating Modules in the GenePattern User Guide describes how to create a GenePattern module. This section provides supplemental information for MATLAB:
Creating Modules describes how to create a GenePattern module that invokes the code that you have written. This section provides additional information that applies when you are directly calling the MATLAB executable from the GenePattern module:
On Windows, your GenePattern module definition form can contain a simple command line that calls MATLAB with the -r flag to execute your function; for example:
matlab -nosplash -r "analyzeThis <p1> <p2>"
This example invokes MATLAB
without the splash screen (-nosplash)
and directs it to execute the quoted command, where p1 and p2 are parameters that you specify in the GenePattern
module definition form and that are passed to the MATLAB command line as
Strings. MATLAB looks for the function analyzeThis
on the MATLAB path; therefore, it is not necessary to upload the function as a
support file, although it is recommended.
To ensure that the GenePattern server can call the MATLAB executable, you typically add the MATLAB directory to your PATH system environment variable. (Alternatively, you can enter the full path to the MATLAB executable on the command line, but this makes it more difficult to deploy the module on other GenePattern servers.)
To check that MATLAB is on your path:
Open a DOS window, type matlab, and press Enter. If the MATLAB application starts, MATLAB is on your path.
If MATLAB is not on your path, add it:
Edit the PATH system environment variable to add the $MATLAB_ROOT/bin directory to the path.
Open a new DOS window and check again that MATLAB is on your path.
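If you prefer to script this check, the following minimal Java sketch searches a PATH-style string for an executable. The class PathCheck is hypothetical, and trying the .exe and .bat extensions on Windows is an assumption about how the MATLAB launcher is named. Keep in mind that the environment of the GenePattern server process may differ from that of your interactive DOS window.

```java
import java.io.File;
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical helper: looks for an executable (such as "matlab") in each
 * directory of a PATH-style string and returns the first match.
 */
final class PathCheck {
    private PathCheck() {}

    static Optional<File> findOnPath(String pathVariable, String executable) {
        if (pathVariable == null) {
            return Optional.empty();
        }
        boolean windows = System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT).startsWith("windows");
        // On Windows, also try the usual executable extensions (an assumption)
        String[] names = windows
                ? new String[] {executable + ".exe", executable + ".bat", executable}
                : new String[] {executable};
        for (String dir : pathVariable.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : names) {
                File candidate = new File(dir, name);
                if (candidate.isFile() && candidate.canExecute()) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }
}
```

For example, PathCheck.findOnPath(System.getenv("PATH"), "matlab") returns the matching file, or an empty Optional if MATLAB is not on the path.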
On platforms other than Windows, the command line is executed slightly differently because of variations in the Java Virtual Machines (VMs) on which GenePattern runs. If you use the simple matlab command described for Windows, the Java VMs on these platforms attempt to parse and quote the command line, resulting in errors in MATLAB's eval function.
On these platforms, you must use a wrapper Java class to launch MATLAB. This wrapper class also works on Windows and does not rely on the PATH variable, which makes it the preferred method for implementing the direct approach on any platform.
To use the wrapper Java class:
Add the runmatlab.jar file to your module as a support file. To request a copy of this file, send e-mail to gp-help (at) broadinstitute.org; alternatively, the Java source code for the RunMatlab wrapper class is included here: RunMatlab.java.
Use a command line such as the following:
<java> -cp <libdir>runmatlab.jar RunMatlab <libdir> analyzeThis <p1> <p2>
where analyzeThis is the name of your MATLAB entry function and <p1> and <p2> are the arguments to the function. The RunMatlab class ensures that the arguments are correctly written out and calls MATLAB with the -nosplash and -nodisplay arguments.
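The RunMatlab.java source is not reproduced in this section. As a rough illustration of the idea only, the following hypothetical Java sketch builds a MATLAB argument list that adds <libdir> to the MATLAB path, calls the entry function with string arguments, and exits. The addpath/exit script is an assumption, not necessarily what RunMatlab does. Passing each argument as a separate list element to ProcessBuilder avoids having a shell re-parse the quoting.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a RunMatlab-style wrapper; NOT the RunMatlab.java
 * distributed with GenePattern. Builds the arguments for
 * "matlab -nosplash -nodisplay -r <script>".
 */
final class MatlabCommand {
    private MatlabCommand() {}

    /** Quotes s as a MATLAB string literal; embedded single quotes are doubled. */
    static String matlabString(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    static List<String> build(String matlabExe, String libdir, String function, String... args) {
        // e.g. analyzeThis('in.txt', 'hello') : every argument is passed as a string
        StringBuilder call = new StringBuilder(function).append('(');
        for (int i = 0; i < args.length; i++) {
            if (i > 0) {
                call.append(", ");
            }
            call.append(matlabString(args[i]));
        }
        call.append(')');
        String script = "addpath(" + matlabString(libdir) + "); " + call + "; exit";

        // Each element becomes one process argument; no shell re-parses it on Unix.
        List<String> command = new ArrayList<>();
        command.add(matlabExe);
        command.add("-nosplash");
        command.add("-nodisplay");
        command.add("-r");
        command.add(script);
        return command;
    }
}
```

A wrapper could then start MATLAB with new ProcessBuilder(MatlabCommand.build("matlab", libdir, "analyzeThis", p1, p2)).inheritIO().start(). Note that if the entry function raises an error, MATLAB never reaches exit, so a production wrapper would need to handle that case.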
Creating Modules describes how to create a GenePattern module that invokes the code that you have written. This section provides additional information that applies when you are compiling your M-code into a standalone executable and invoking that executable from the GenePattern module:
To run a standalone executable generated by the MATLAB Compiler, the GenePattern server must have the MATLAB Component Runtime (MCR) installed. The MCR is a collection of shared libraries that contains the MATLAB runtime code used by standalone applications. If MATLAB is installed on the GenePattern server, the MCR is already installed and you do not need to install it separately.
Full details for installing the MCR can be found in the MATLAB Compiler documentation, in the section titled "Deploying Components to Other Machines". To summarize this documentation, on the GenePattern server machine, you need to run the MCRInstaller:
On Windows, to run the MCRInstaller:
Copy <matlabroot>\toolbox\compiler\deploy\win32\MCRInstaller.exe to the server machine.
Run MCRInstaller.exe.
On Linux, to run the MCRInstaller:
At the MATLAB prompt, run buildmcr.
Copy <matlabroot>/toolbox/compiler/deploy/MCRInstaller.zip to the server machine.
Unzip MCRInstaller.zip into a directory (<mcr_root>).
Set the LD_LIBRARY_PATH environment variable (csh syntax):
setenv LD_LIBRARY_PATH <mcr_root>/runtime/glnx86:<mcr_root>/sys/os/glnx86:<mcr_root>/sys/java/jre/glnx86/jre1.4.2/lib/i386/client:<mcr_root>/sys/java/jre/glnx86/jre1.4.2/lib/i386:<mcr_root>/sys/opengl/lib/glnx86:${LD_LIBRARY_PATH}
On Mac OS X, to run the MCRInstaller:
At the MATLAB prompt, run buildmcr.
Copy <matlabroot>/toolbox/compiler/deploy/MCRInstaller.zip to the server machine.
Unzip MCRInstaller.zip into a directory (<mcr_root>).
Set the DYLD_LIBRARY_PATH and XAPPLRESDIR environment variables (csh syntax):
setenv DYLD_LIBRARY_PATH <mcr_root>/<ver>/runtime/mac:<mcr_root>/<ver>/sys/os/mac:<mcr_root>/<ver>/bin/mac:/System/Library/Frameworks/JavaVM.framework/JavaVM:/System/Library/Frameworks/JavaEmbedding.framework/JavaEmbedding:/System/Library/Frameworks/JavaVM.framework/Libraries
setenv XAPPLRESDIR <mcr_root>/<ver>/X11/app-defaults
When the MATLAB Compiler generates a standalone executable, it also generates a Component Technology File (.ctf). The directory that contains the .ctf file must be on the path when you run the standalone executable. The easiest way to meet this requirement is to create a launcher script (.bat or .sh file) that adds this directory to the PATH (and, on Linux and Mac OS X, to the library path) and then runs the standalone executable.
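The same launcher steps can also be expressed in Java, which may be convenient since <java> is available on the module command line. The following hypothetical sketch (the class CompiledLauncher is not part of GenePattern) prepends <libdir>, where the executable and .ctf file are uploaded, to PATH, sets the executable permission (the equivalent of the chmod step in the Linux and Mac OS X scripts), and prepares the process. On Linux and Mac OS X you would also need to set LD_LIBRARY_PATH or DYLD_LIBRARY_PATH, as the scripts do.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical Java equivalent of the mllaunch scripts for a compiled MATLAB
 * application: libdir holds the executable and its .ctf file.
 */
final class CompiledLauncher {
    private CompiledLauncher() {}

    static ProcessBuilder prepare(File libdir, String executable, String... args) {
        File exe = new File(libdir, executable);
        // Equivalent of "chmod a+x"; returns false (ignored here) if not permitted
        exe.setExecutable(true, false);

        List<String> command = new ArrayList<>();
        command.add(exe.getPath());
        command.addAll(Arrays.asList(args));

        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> env = pb.environment();
        // Prepend libdir, as "set PATH=%LIBDIR%;%PATH%" and "export PATH=$1:$PATH" do
        String oldPath = env.get("PATH");
        env.put("PATH", oldPath == null || oldPath.isEmpty()
                ? libdir.getPath()
                : libdir.getPath() + File.pathSeparator + oldPath);
        return pb;
    }
}
```

For example, CompiledLauncher.prepare(new File(libdir), "analyzeThis", p1, p2).inheritIO().start() would run the executable with the modified environment.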
On Windows, for example, to launch the MATLAB
executable analyzeThis.exe,
create a launcher script, mllaunch.bat,
that contains the following lines:
set LIBDIR=%1
set PATH=%LIBDIR%;%PATH%
analyzeThis %2 %3
On Linux, for example, to launch the MATLAB executable analyzeThis, create a launcher script, mllaunch.sh, that contains the following lines:
#!/bin/sh
export MCR_ROOT=<path where you installed the files from MCRInstaller.zip>
export LD_LIBRARY_PATH=$1:$MCR_ROOT/runtime/glnx86:$MCR_ROOT/sys/os/glnx86:\
$MCR_ROOT/sys/java/jre/glnx86/jre1.4.2/lib/i386/client:\
$MCR_ROOT/sys/java/jre/glnx86/jre1.4.2/lib/i386:\
$MCR_ROOT/sys/opengl/lib/glnx86
export PATH=$1:$PATH
chmod a+x $1/analyzeThis
analyzeThis "$2" "$3"
The chmod line sets the executable
permission on the executable file; by default, the GenePattern server does not
set this permission for uploaded files.
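The long LD_LIBRARY_PATH value in the Linux script is easy to mistype. The following hypothetical Java helper assembles the same list of glnx86 directories from the MCR root: <libdir> first, then the MCR directories, then any existing value. The jre1.4.2 directory name is copied from the script above and may differ for other MCR releases.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that builds the Linux (glnx86) LD_LIBRARY_PATH value
 * used by the mllaunch.sh example.
 */
final class McrLibraryPath {
    // Relative MCR directories copied from the launcher script; the jre1.4.2
    // component may be different for other MCR releases.
    private static final String[] MCR_DIRS = {
        "runtime/glnx86",
        "sys/os/glnx86",
        "sys/java/jre/glnx86/jre1.4.2/lib/i386/client",
        "sys/java/jre/glnx86/jre1.4.2/lib/i386",
        "sys/opengl/lib/glnx86"
    };

    private McrLibraryPath() {}

    static String linux(String libdir, String mcrRoot, String existing) {
        List<String> parts = new ArrayList<>();
        parts.add(libdir);
        for (String dir : MCR_DIRS) {
            parts.add(mcrRoot + "/" + dir);
        }
        // Keep any value that was already set, after the MCR entries
        if (existing != null && !existing.isEmpty()) {
            parts.add(existing);
        }
        return String.join(":", parts);
    }
}
```

A Java launcher could apply it with pb.environment().put("LD_LIBRARY_PATH", McrLibraryPath.linux(libdir, mcrRoot, System.getenv("LD_LIBRARY_PATH"))).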
On Mac OS X, for example, to launch the MATLAB executable analyzeThis, create a launcher script, mllaunch.sh, that contains the following lines:
#!/bin/sh
export MCR_ROOT=/Volumes/os9/gpserv
export LD_LIBRARY_PATH=$1:/Volumes/os9/matlab7.2/sys/os/mac:/Volumes/os9/matlab7.2/bin/mac/
export DYLD_LIBRARY_PATH=$LD_LIBRARY_PATH
export PATH=$1:$PATH
chmod a+x $1/analyzeThis
analyzeThis "$2" "$3"
The chmod line sets the executable
permission on the executable file; by default, the GenePattern server does not
set this permission for uploaded files.
On the GenePattern module definition form, write a command line that calls the launcher script, passing the <libdir> parameter as the first argument (so that the script can add it to the path).
On Windows, the following command line calls the launcher
script, mllaunch.bat:
<libdir>mllaunch.bat <libdir> <param1> <param2>
On Linux or Mac OS X, the following command
line calls the launcher script, mllaunch.sh:
sh <libdir>mllaunch.sh <libdir> <param1> <param2>
In both command lines, the
first <libdir> sets the path to the mllaunch
script. The second <libdir> is passed as the first argument to the script
so that the script can add this directory to the appropriate environment
variables. The <param1> and <param2> variables are parameters to
the MATLAB application, which you define in the module definition form and
specify in the command line as usual.
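To make the role of the two <libdir> tokens concrete, here is an illustrative Java sketch of token substitution. It is not GenePattern's implementation. It assumes, as the <libdir>mllaunch.sh form suggests, that the value substituted for <libdir> already ends with a path separator, and it leaves tokens without a value unchanged.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch: replaces <name> tokens in a command line with values. */
final class CommandLineTemplate {
    private static final Pattern TOKEN = Pattern.compile("<([^<>\\s]+)>");

    private CommandLineTemplate() {}

    static String substitute(String template, Map<String, String> values) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown tokens are kept as written
            String replacement = value != null ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With libdir set to /gp/taskLib/MyModule.1.1/ (a made-up directory), the first <libdir> becomes part of the script path and the second is passed to the script as $1 or %1.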
For the compiled approach, you must specify at least two support files for the MATLAB application: the executable file and .ctf file. If your application requires additional files for its execution, also add those files as support files.
Should you choose to distribute your MATLAB based module to others, you must ensure you are in compliance with the MATLAB licensing agreement:
http://www.mathworks.com/company/aboutus/policies_statements/agreement.pdf
Following are a few key points for GenePattern developers:
Please refer to the MATLAB licensing agreement for exact details. You are responsible for reviewing and complying with the MATLAB software license. The above summary does not exempt you from this responsibility.
This section provides a step-by-step example of deploying a simple M-file application as a GenePattern module on a GenePattern server. Where the instructions are platform specific, the example shows instructions for Windows, Linux, and Mac OS X.
The first step is writing the MATLAB M-file that you want to share. For this example, write a simple application that takes a filename and a String and writes the String out to a file with the given name. This application consists of the following lines:
% write the variable whatToWrite to a file called
% filename in the current directory
fid = fopen(filename,'w');
fprintf(fid,'#writing to a file\n\n');
fprintf(fid,'%s',whatToWrite);
fclose(fid);
To call the M-file from the command line and pass it parameters, you must turn this script into a no-return function. To do this, add a function definition line at the start of the M-file and save the file using the name of the function (for example, writeToFile.m).
function writeToFile(filename, whatToWrite)
% write the parameter whatToWrite to a file
% called filename in the current directory
fid = fopen(filename,'w');
fprintf(fid,'#writing to a file\n\n');
fprintf(fid,'%s',whatToWrite);
fclose(fid);
Within the MATLAB environment, call the MATLAB Compiler to convert this function into an application:
>> mcc -m writeToFile
Within the current working directory, this creates a number of files, including the following:
Note: To use the MATLAB compiler on Mac OS X, you must have Xcode 2.2 installed; minimally, the Developer Tools, gcc 4.0, gcc 3.3, Mac OS X SDK, and BSD SDK. These instructions were tested using Xcode 2.2.1.
Install the MATLAB Component Runtime (MCR) on the GenePattern server, if you have not done so already. If the GenePattern server has MATLAB installed, it also has the MCR installed.
On Windows, to install the MCR:
Copy <matlabroot>\toolbox\compiler\deploy\win32\MCRInstaller.exe to the GenePattern server machine.
Run MCRInstaller.exe.
To install the MCR:
At the MATLAB prompt, run buildmcr, specifying an output directory:
>> buildmcr mcrdir
This creates a directory, mcrdir, beneath the current working directory and creates a file within that directory called MCRInstaller.zip.
Create a directory, matlab, under the GenePattern server directory and install the library files in MCRInstaller.zip into that directory:
cd GenePatternServer
mkdir matlab
cd matlab
cp <path to mcrinstaller.zip>MCRInstaller.zip .
unzip MCRInstaller.zip
Create the launcher script that sets the environment variables and then calls the MATLAB application.
Create the launcher script as a
batch file that sets the PATH variable for the environment and then calls the
MATLAB application. To do so, in a text editor, create the following mllaunch.bat file:
set LIBDIR=%1
set PATH=%LIBDIR%;%PATH%
writeToFile %2 %3
Create the launcher script as
an .sh file that sets the PATH and LD_LIBRARY_PATH variables for the
environment, ensures that the application is executable, and then calls the
MATLAB application. To do so, in a text editor, create the following mllaunch.sh file:
#!/bin/sh
export MCRROOT=/home/username/GenePatternServer/matlab/v70
export LD_LIBRARY_PATH=$1:$MCRROOT/runtime/glnx86:$MCRROOT/sys/os/glnx86:$MCRROOT/sys/java/jre/glnx86/jre1.4.2/lib/i386/client:$MCRROOT/sys/java/jre/glnx86/jre1.4.2/lib/i386:$MCRROOT/sys/opengl/lib/glnx86
export PATH=$1:$PATH
chmod a+x $1/writeToFile
writeToFile "$2" "$3"
Note that the MCRROOT variable is set to the v70 directory, which you created by unzipping MCRInstaller.zip.
Create the launcher script as
an .sh file that sets the LD_LIBRARY_PATH and DYLD_LIBRARY_PATH variables for
the environment, ensures that the application is executable, and then calls the
MATLAB application. To do so, in a text editor, create the following mllaunch.sh file:
#!/bin/sh
export MCR_ROOT=/Volumes/os9/gpserv
export LD_LIBRARY_PATH=$1:/Volumes/os9/matlab7.2/sys/os/mac:/Volumes/os9/matlab7.2/bin/mac/
export DYLD_LIBRARY_PATH=$LD_LIBRARY_PATH
export PATH=$1:$PATH
chmod a+x $1/writeToFile
writeToFile $2 "$3"
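Use GenePattern to create a module that executes the launcher script. On the module definition form, use the command line for your platform, where <fname> and <txt> are the module parameters that supply the filename and the text to write:
On Windows:
<libdir>mllaunch.bat <libdir> <fname> <txt>
On Linux or Mac OS X:
sh <libdir>mllaunch.sh <libdir> <fname> <txt>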
Save the module and execute it. The module should create two files:
If the following error appears in the stdout file, you have
not correctly set the path to the libraries that you installed from MCRInstaller.zip:
error while loading shared libraries: libmwmclmcrrt.so.7.0: cannot open shared object file: No such file or directory
Double-check the path. If it is
correct, you may be using a different Unix shell than the one used in this
example. Check that the mllaunch.sh
file uses the correct command (export
in this example) to set PATH and LD_LIBRARY_PATH.
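To find which entry is wrong, a small diagnostic can list the library-path entries that do not exist. The following Java sketch is hypothetical. It only checks that each entry is an existing directory, so entries that point to files (as some of the Mac OS X framework entries do) would also be reported.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical diagnostic: returns each entry of a library path (such as the
 * LD_LIBRARY_PATH exported by mllaunch.sh) that is not an existing directory.
 */
final class LibraryPathCheck {
    private LibraryPathCheck() {}

    static List<String> missingEntries(String libraryPath) {
        List<String> missing = new ArrayList<>();
        if (libraryPath == null) {
            return missing;
        }
        for (String entry : libraryPath.split(File.pathSeparator)) {
            if (!entry.isEmpty() && !new File(entry).isDirectory()) {
                missing.add(entry);
            }
        }
        return missing;
    }
}
```

For example, a Java program started with the same environment as the launcher script could print LibraryPathCheck.missingEntries(System.getenv("LD_LIBRARY_PATH")).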
Using Java as a GenePattern client allows you to run GenePattern modules and visualizers from within a Java application. This section describes how you can use the GenePattern Java library to run GenePattern analyses as easily as calling a routine. It contains the following topics:
If you are not familiar with Java, see the http://java.sun.com website, which provides downloadable programs, samples, tutorials, and book suggestions.
The GenePattern Java library allows you to invoke a GenePattern module as if it were a local Java method running on your client and to get back from the module a list of result files. A zip file containing the Java library (and Javadoc that describes the API for accessing the server and running modules) is available on your GenePattern server.
To download the GenePattern Java library to your computer:
This section explores a simple Java application that preprocesses a dataset and displays it using the HeatMapViewer. The included code can be copied and pasted into your Java program so that you can try it out, modify it, and create your own solutions. The full source code of the sample application is available here.
The first statements in the application initialize various settings, which you must do once in every application that accesses GenePattern. You will need to customize the italicized GenePattern server URL and GenePattern user name (typically, your e-mail address) with values appropriate for your GenePattern server.
import org.genepattern.matrix.Dataset;
import org.genepattern.client.GPClient;
import org.genepattern.webservice.JobResult;
import org.genepattern.webservice.Parameter;
import org.genepattern.io.IOUtil;
import java.io.File;

public class MyProgram {
    public static void main(String[] args) throws Exception {
        GPClient gpClient = new GPClient("http://localhost:8080",
            "your email address");
After initializing the required settings, the application
runs the PreprocessDataset module to preprocess a dataset. This example
references the dataset using a publicly-accessible URL, but a filename would be
equally valid. When you invoke the runAnalysis
method, the GenePattern library invokes the appropriate module on the server,
passing all of the input parameters and input files. Control returns to your
application when the module completes. (To run a module asynchronously, invoke
the runAnalysisNoWait method
or use the runAnalysis
method in a separate thread.)
        String inputDataset = "ftp://ftp.broadinstitute.org/pub/genepattern/all_aml/all_aml_train.res";
        JobResult preprocess = gpClient.runAnalysis("PreprocessDataset",
            new Parameter[] {
                new Parameter("input.filename", inputDataset)
            });
When the module completes, you can query the JobResult object for an array of
filenames that are the output from the module. You can download the result
files or leave them on the server and refer to them by URL. Referring to result
files by URL is especially useful for intermediate results. In this example,
the JobResult object named preprocess contains a list of filenames
(of length 1, in this case), which the application displays in a heat map:
        // view results in a HeatMapViewer visualizer
        gpClient.runVisualizer("HeatMapViewer",
            new Parameter[] {
                new Parameter("dataset", preprocess.getURL(0).toString())
            });
The last statements in the application download the preprocessed data and load it into a matrix for further analysis:
        String downloadDirName = String.valueOf(preprocess.getJobNumber());
        // download result files
        File[] outputFiles = preprocess.downloadFiles(downloadDirName);
        // load data into matrix for further manipulation
        Dataset dataset = IOUtil.readDataset(outputFiles[0].getPath());
    }
}
You can combine GenePattern analyses with any capabilities that the Java environment has to offer. Use Java's 2-D and 3-D graphics libraries to create graphic output, or summarize and report on the data using your own code. The basic idea to remember is that GenePattern modules create result files and those files are available to the Java application for processing.
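The text above notes that runAnalysis blocks until the module completes and suggests calling it in a separate thread. A minimal sketch of that idea, using only standard java.util.concurrent classes, is shown below. The class Background is hypothetical, and the Callable stands in for the GenePattern call, so no GenePattern classes are needed to compile it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;

/**
 * Runs a blocking call (for example, a GenePattern runAnalysis call) on a
 * background thread and returns a future for its result.
 */
final class Background {
    private Background() {}

    static <T> CompletableFuture<T> run(Callable<T> blockingCall, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return blockingCall.call();
            } catch (Exception e) {
                // Surface checked exceptions (e.g. from the server call) through the future
                throw new CompletionException(e);
            }
        }, pool);
    }
}
```

For example, Background.run(() -> gpClient.runAnalysis("PreprocessDataset", params), pool) returns immediately, where params is your Parameter[] array and pool is an ExecutorService you create; call join() on the future, or attach thenAccept, to process the JobResult when the job finishes. The runAnalysisNoWait method mentioned above is the library's own alternative.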
For more information:
1. Select a module (or pipeline). GenePattern displays the parameters for the module (pipeline).
2. Optionally, enter the parameter values that you want to use.
3. Use the View Code or Generate Code field (at the bottom of the form) to display the Java code required to execute this module/pipeline with these parameters.
Life Science Identifiers (LSIDs) can be used instead of
module names to identify modules for GenePattern to run. An LSID may be
submitted in place of the module name in the methods runAnalysis and runVisualizer. When an LSID is provided
that does not include a version, the latest available version of the module
identified by the LSID will be used. If a module name is supplied, the latest
version of the module with the nearest authority is selected. The nearest
authority is the first match in the sequence: local authority, Broad authority,
other authority.
If you are unfamiliar with LSIDs and GenePattern versioning, see the Concepts Guide.
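To illustrate the parts of an LSID that the version rule above refers to, the following hypothetical Java helper splits an LSID of the general form urn:lsid:authority:namespace:objectId[:version]. It does not contact the server and is not part of the GenePattern library; a missing version is reported as null, which is the case in which GenePattern uses the latest available version.

```java
/** Hypothetical helper that splits an LSID into its components. */
final class Lsid {
    final String authority;
    final String namespace;
    final String objectId;
    final String version; // null when the LSID does not include a version

    private Lsid(String authority, String namespace, String objectId, String version) {
        this.authority = authority;
        this.namespace = namespace;
        this.objectId = objectId;
        this.version = version;
    }

    static Lsid parse(String text) {
        // -1 keeps a trailing empty field, so "...:00026:" is not silently accepted as 5 parts
        String[] parts = text.split(":", -1);
        if (parts.length < 5 || parts.length > 6
                || !parts[0].equalsIgnoreCase("urn")
                || !parts[1].equalsIgnoreCase("lsid")
                || parts[2].isEmpty() || parts[3].isEmpty() || parts[4].isEmpty()) {
            throw new IllegalArgumentException("not an LSID: " + text);
        }
        String version = (parts.length == 6 && !parts[5].isEmpty()) ? parts[5] : null;
        return new Lsid(parts[2], parts[3], parts[4], version);
    }

    boolean hasVersion() {
        return version != null;
    }
}
```

A plain module name such as TransposeDataset is rejected by parse, which is one way a client could tell whether it was given a name or an LSID.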
Using MATLAB as a GenePattern client allows you to run GenePattern modules and to manipulate and visualize the results in a powerful, commercial technical computing application that works on most major platforms. Using GenePattern allows you to invoke methods written in many other languages without having to worry about how to launch them. This section describes how you can use the GenePattern MATLAB library to run GenePattern analyses:
Resources and documentation for MATLAB are available at http://www.mathworks.com/.
The GenePattern MATLAB library allows you to invoke a GenePattern module as if it were a local MATLAB function running on your client and to get back from the module a list of result files. A zip file containing the MATLAB library is available on your GenePattern server.
To download the GenePattern MATLAB library to your computer:
Unzip the library into the MATLAB7/toolboxes directory. If you do not have permission to put files in that directory, unzip into any other directory.
At the MATLAB prompt, open the path tool:
>> pathtool
Add the GenePatternServer and GenePatternFileSupport directories, with subfolders, to the MATLAB search path.
Note: MATLAB 7.0.4 (R14SP2) and later use Java Virtual Machine (JVM) 1.5. If you are using an earlier version of MATLAB, you must change the JVM that MATLAB is using to JVM 1.5. For instructions, see http://www.mathworks.com/support/solutions/data/1-1812J.html?solution=1-1812J.
This section explores a simple MATLAB program that runs a module, displays the resulting output, and loads it into a MATLAB matrix for further analysis. The included code can be copied and pasted into your MATLAB client so that you can try it out, modify it, and create your own solutions.
The first statements in the application initialize various settings, which you must do once in every application that accesses GenePattern. You will need to customize the italicized GenePattern server URL, GenePattern user name (typically, your e-mail address) and password (if required) with values appropriate for your GenePattern server.
% Create a GenePattern server proxy instance
gp = GenePatternServer('http://localhost:8080', 'my.email@my.domain', 'mypassword');
After initializing the required
settings, the application runs the TransposeDataset module to transpose a
dataset. This example references the dataset using a publicly-accessible URL,
but a filename would be equally valid. As shown below, you can call the GenePattern
methods directly or by calling the runAnalysis
method. When you call a GenePattern method, such as TransposeDataset, the GenePattern
library invokes the module on the server, passing all of the input parameters
and input files. Control returns to your application when the module completes.
(To run a module asynchronously, invoke the method in a separate thread.)
% input dataset for transpose operation
params.output_file_name = 'transposed.out'
params.input_filename = 'http://www.broadinstitute.org/mpr/publications/projects/Leukemia/ALL_vs_AML_train_set_38_sorted.res'
% transpose the dataset
transposeResult = gp.TransposeDataset(params)
% alternate call to transpose the dataset
transposeResult = runAnalysis(gp, 'TransposeDataset', params)
When the module completes, it returns a MATLAB structure
that contains a list of filenames that are the output from the module. In this
example, transposeResult is
a structure with a list of filenames (of length 1, in this case). The
application displays the results in a file viewer window and also loads them
into a matrix so that further manipulation can be performed:
% display the transposed results
edit 'transposed.out.gct'
% now read the output into a matrix
% so we can do further manipulation in MATLAB
myData = loadGenePatternExpressionFile('transposed.out.gct')
You can combine GenePattern analyses with all of the rich functionality of MATLAB. For example, you can use MATLAB's plotting methods to create graphic output, save modified matrices to files using save, or summarize and report on the data using your own code. The basic idea to remember is that GenePattern modules create result files and those files are available to the MATLAB client for processing.
For a list of the GenePattern modules available on your server,
run the listMethods function
on your GenePatternServer
object. To view the names of the input parameters for a module, use the describeMethod function on your GenePatternServer object, passing it the
module name.
% display the available GenePattern modules
listMethods(gp)
% now look at the parameters for the TransposeDataset module
describeMethod(gp, 'TransposeDataset')
Alternatively, to get the parameters with their default
values filled in, use the getMethodParameters
function of the GenePatternServer
object. This returns a MATLAB structure with named elements for each parameter,
filled in with the default value if one exists. After filling in the missing
parameters and overriding defaults if desired, this structure can then be
passed on to the runAnalysis
method.
% get the TransposeDataset parameters, with default values filled in
params2 = getMethodParameters(gp, 'TransposeDataset')
params2.input_filename = 'http://www.broadinstitute.org/mpr/publications/projects/Leukemia/ALL_vs_AML_train_set_38_sorted.res'
% transpose the dataset
transposeResult = gp.TransposeDataset(params2)
The GenePattern MATLAB library also has convenience methods to read and write GenePattern files (such as res, gct, and odf files). Even if you choose not to look in the library, you can extend the techniques shown above to implement your own analyses.
For more information:
1. Select a module (or pipeline). GenePattern displays the parameters for the module (pipeline).
2. Optionally, enter the parameter values that you want to use.
3. Use the View Code or Generate Code field (at the bottom of the form) to display the MATLAB code required to execute this module/pipeline with these parameters.
You can use Life Science Identifiers (LSIDs) to identify a
module when executing GenePattern code in MATLAB. An LSID may be submitted in
place of the module name to getMethodParameters
or runAnalysis. When
providing an LSID to a method in addition to a module name, the LSID alone is
used to determine what module to run. When an LSID is provided that does not
include a version, the latest available version of the module identified by the
LSID will be used. If you are unfamiliar with LSIDs and GenePattern versioning,
see the Concepts Guide.
% Example using LSIDs from MATLAB
params = getMethodParameters(gp, ...
    'urn:lsid:broadinstitute.org:cancer.software.genepattern.module.analysis:00026:0');
params.output_file_name = 'transposed.out'
params.input_filename = 'http://www.broadinstitute.org/mpr/publications/projects/Leukemia/ALL_vs_AML_train_set_38_sorted.res'
% transpose the dataset
transposeResult = runAnalysis(gp, ...
    'urn:lsid:broadinstitute.org:cancer.software.genepattern.module.analysis:00026:0', ...
    params)
Using R as a GenePattern client allows you to run GenePattern modules and to manipulate and visualize the results in a powerful, free statistical desktop package that works on most major platforms. Using GenePattern allows you to invoke methods written in many other languages without having to worry about how to launch them or whether you are passing incorrect parameters. This section describes how you can use the GenePattern R library to run GenePattern analyses:
If you are not familiar with R, see the following resources on the www.r-project.org website:
The GenePattern R package allows you to invoke a GenePattern module as if it were a local R method running on your client and to get back from the module a list of result files. The package requires R version 2.4.1 or greater and the rJava package. The package can be downloaded from your GenePattern server in Windows (.zip), source (.tar.gz), and Mac OS X (.tgz) formats.
To download the GenePattern R package to your computer:
install.packages("full-path-to-GenePattern-R-package", repos=NULL)
Note: If you are using a version of R which you cannot modify (because it is a publicly-shared version and you do not have appropriate privilege), you can have it load the GenePattern library by setting the environment variable R_LIBS=<GenePattern install directory>/R/library in your autoexec.bat, .cshrc, .bashrc or other shell startup file. R will then load from its usual location, but will also search for and find the GenePattern library from your installation.
This section explores a simple R program that runs a module, displays the resulting output, and loads it into an R matrix for further analysis. The included code can be copied and pasted into your R environment so that you can try it out, modify it, and create your own solutions.
The first statements in the application initialize various settings, which you must do once in every application that accesses GenePattern. You will need to customize the italicized GenePattern server URL, GenePattern user name (typically, your e-mail address), and password with values appropriate for your GenePattern server. The gp.login method returns a GPClient object that contains the information required for running modules on a GenePattern server.
# Load GenePattern package
library(GenePattern)
username <- "your email address"
password <- "your password"
servername <- "http://localhost:8080"
# Obtain a GPClient object which references a specific server and user
gp.client <- gp.login(servername, username, password)
After initializing the required settings, the application runs the PreprocessDataset module to preprocess a dataset. This example references the dataset by a publicly accessible URL, but a filename would be equally valid. When you call an R method such as run.analysis, the GenePattern package invokes the appropriate module on the server, passing all of the input parameters and input files. Control returns to your application when the module completes. (To run a module asynchronously, use the method runAnalysisNoWait.)
# input dataset for preprocess operation
input.ds <- "ftp://ftp.broadinstitute.org/pub/genepattern/all_aml/all_aml_train.res"
# preprocess the dataset
preprocess.jobresult <- run.analysis(gp.client,
"PreprocessDataset",
input.filename=input.ds)
When the module completes, run.analysis returns a JobResult object, which you can pass to various methods. For example, you can use the JobResult object to get an R list of the module's output filenames. You can then download the files, or leave them on the server and refer to them by URL. In this example, the application views the results in a heat map:
# Obtain the url location of the result and run the visualizer
preprocess.out.file.url <- job.result.get.url(preprocess.jobresult, 0)
run.visualizer(gp.client, "HeatMapViewer",
dataset=preprocess.out.file.url)
Next, the application downloads the result files, displays the preprocessed result in a file viewer window, and loads the data into a matrix so that you can manipulate it further in R:
# download result files into a local directory named after the job number
download.directory <- as.character(job.result.get.job.number(preprocess.jobresult))
preprocess.out.files <- job.result.download.files(preprocess.jobresult, download.directory)
# display the preprocessed result
preprocessed.out.file <- as.character(preprocess.out.files[1])
file.show(preprocessed.out.file)
# now read the output into a matrix
# so we can do further manipulation in R
data <- read.dataset(preprocessed.out.file)
You can combine GenePattern analyses with R's rich statistical functionality. For example, you can use R's plot and legend methods to create graphic output, save JPEGs of your visualized data using savePlot, save modified matrices to files using save, or summarize and report on the data using your own code. Because every module run returns a JobResult object, the module's results are always available to your R client for further processing.
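The sketch below shows this kind of post-processing in base R. Because this section does not describe the exact structure that read.dataset returns, the sketch works on a small hypothetical matrix, expr, standing in for the loaded data (genes in rows, samples in columns). It writes the plot with the pdf() device instead of savePlot, which works only with certain on-screen devices, and puts all output in a temporary directory.

```r
# Hypothetical stand-in for the data loaded by read.dataset():
# a numeric matrix with genes in rows and samples in columns.
set.seed(1)
expr <- matrix(rnorm(40, mean = 8), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("sample", 1:4)))

out.dir <- tempdir()

# Per-gene summary statistics, written as a tab-delimited report
gene.summary <- data.frame(gene = rownames(expr),
                           mean = rowMeans(expr),
                           sd = apply(expr, 1, sd))
write.table(gene.summary, file.path(out.dir, "gene_summary.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# One line per sample plus a legend, written to a PDF file
pdf(file.path(out.dir, "samples.pdf"))
matplot(expr, type = "l", lty = 1, col = 1:4,
        xlab = "Gene index", ylab = "Expression")
legend("topright", legend = colnames(expr), col = 1:4, lty = 1)
invisible(dev.off())

# Save the matrix in R's binary format; reload it later with load()
save(expr, file = file.path(out.dir, "expr.RData"))
```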
The GenePattern R package also has methods to read and write GenePattern files (such as res, gct, and cls files), to run multiple modules in parallel, to run modules on input files that were output by previous modules without moving those files off the server, and other utilities. Even without using these library methods, you can extend the techniques shown above to implement your own analyses.
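This section does not list the signatures of the package's file readers and writers, so the following is an independent base-R sketch of what a GCT (version 1.2) file contains, not the package's own writer: a #1.2 version line, a line giving the number of data rows and sample columns, a header line of Name, Description, and the sample names, then one tab-delimited line per gene. The helper write.gct.sketch and its descriptions argument are hypothetical.

```r
# Hypothetical helper: writes a numeric matrix as a GCT 1.2 file.
# Illustrates the format only; it is not the GenePattern package's writer.
write.gct.sketch <- function(m, file, descriptions = rownames(m)) {
  con <- file(file, "w")
  on.exit(close(con))
  writeLines("#1.2", con)                                # format version
  writeLines(paste(nrow(m), ncol(m), sep = "\t"), con)   # rows, sample columns
  writeLines(paste(c("Name", "Description", colnames(m)), collapse = "\t"), con)
  vals <- m
  storage.mode(vals) <- "character"                      # keeps dim and dimnames
  body <- cbind(rownames(m), descriptions, vals)
  writeLines(apply(body, 1, paste, collapse = "\t"), con)
  invisible(file)
}

# Toy 2 x 2 matrix (filled column by column)
m <- matrix(c(1.5, 2, 3.25, 4), nrow = 2,
            dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
gct.file <- write.gct.sketch(m, file.path(tempdir(), "example.gct"))
cat(readLines(gct.file), sep = "\n")
```

For the toy matrix, the printed file starts with #1.2 and a 2<tab>2 dimension line, followed by the header and one line each for geneA and geneB.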
For more information:
To display the R code that runs a particular module or pipeline:
1. Select a module (or pipeline). GenePattern displays the parameters for the module or pipeline.
2. Optionally, enter the parameter values that you want to use.
3. Use the View Code or Generate Code field (at the bottom of the form) to display the R code required to execute the module or pipeline with these parameters.
You can use Life Science Identifiers (LSIDs) instead of module names to identify the modules that GenePattern runs. In R, this is primarily useful when you want to run a particular version of a module. The easiest way to do this is to pass the module's LSID, in place of its name, to an R method such as run.analysis. For example, the following statement invokes version 1, rather than the latest version, of the PreprocessDataset module:
preprocess.jobresult <- run.analysis(gp.client, "urn:lsid:broadinstitute.org:cancer.software.genepattern.module.analysis:00020:1", input.filename=input.ds)
If you are unfamiliar with LSIDs and GenePattern versioning, see the Concepts Guide.
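The version is the last colon-separated field of a module LSID; in the run.analysis example above, the trailing :1 selects version 1. The following minimal sketch, assuming the urn:lsid:authority:namespace:object:version layout shown in that example, splits an LSID into its fields and builds the LSID for another version. Both helper functions are hypothetical and not part of the GenePattern package.

```r
# Hypothetical helpers for module LSIDs of the form
#   urn:lsid:<authority>:<namespace>:<object id>:<version>
lsid.fields <- function(lsid) {
  parts <- strsplit(lsid, ":", fixed = TRUE)[[1]]
  stopifnot(length(parts) >= 5, parts[1] == "urn", parts[2] == "lsid")
  list(authority = parts[3], namespace = parts[4], object = parts[5],
       version = if (length(parts) >= 6) parts[6] else NA_character_)
}

# Replace (or add) the version field. The result could be passed to
# run.analysis in place of a module name, if that version is on the server.
lsid.with.version <- function(lsid, version) {
  f <- lsid.fields(lsid)
  paste("urn", "lsid", f$authority, f$namespace, f$object, version, sep = ":")
}

base.lsid <- "urn:lsid:broadinstitute.org:cancer.software.genepattern.module.analysis:00020:1"
lsid.fields(base.lsid)$version     # "1"
lsid.with.version(base.lsid, 2)    # same LSID with the version field set to 2
```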
| Version | Release date | Comments |
|---|---|---|
| 3.2.1 | November 2009 | Minor updates to Running a MATLAB Program and Accessing GenePattern from R. |
| 3.1 | December 2007 | Updated Using GenePattern from Java and Using GenePattern from R. |
| 3.0 | May 11, 2007 | Updated Using GenePattern from R. |
| 3.0 | April 2007 | GenePattern 3.0 Release |


