Administrators Guide

About the Administrators Guide

You can use the GenePattern public server hosted at the Broad Institute, install a local GenePattern server for your own use, or install a networked GenePattern server to be used by several people. Concepts explains the benefits of each approach.

GenePattern can be run standalone on a small machine or separated into its client and server components to take advantage of a more powerful compute server. When you install a GenePattern server, you set basic server configuration options. If you are installing a local GenePattern server for your own use, you generally do not need to modify the server configuration. If you are the server administrator for a networked GenePattern server, you generally want to modify several of the GenePattern configuration options described in this guide.


Creating Groups and Administrators

Note: Only the GenePattern team can create groups on the GenePattern public server. To create a group, you must have installed a local GenePattern server (see Starting Your Own GenePattern Server).

The GenePattern configuration file GenePatternServer/resources/userGroups.xml defines groups and group membership. The Users and Groups server settings page lists all registered users and the groups to which they belong.

To create or modify groups, edit the userGroups.xml file. The XML syntax is simple but must be followed carefully. The rules are as follows:

Creating an Administrators Group

As shown below, the default userGroups.xml file defines one group, administrators, which includes all GenePattern users. Members of the administrators group have full access to the GenePattern server and all jobs run on the server. Because all users are administrators, the default GenePattern installation has no concept of “private” data.

<!-- map of users to groups -->
<userGroups>
    <group name="administrators">
        <user name="*"/>
    </group>
</userGroups>

To maximize data privacy, minimize the number of users in the administrators group. For example, if the administrators group contains exactly one person, only that administrator can view all jobs run on the server. Other users can view only their own jobs and jobs that have been explicitly shared.

<!-- map of users to groups -->
<userGroups>
    <group name="administrators">
        <user name="jsmith"/>
    </group>
</userGroups>

Creating Other Groups

To create a new group, add a <group> element to the userGroups.xml file. The following edited userGroups.xml file adds a second user to the administrators group and creates a new group, mjones_lab:

<!-- map of users to groups -->
<userGroups>
    <group name="administrators">
        <user name="jsmith"/>
        <user name="mjones"/>
    </group>
    <group name="mjones_lab">
        <user name="mjones"/>
        <user name="jdoe"/>
        <user name="sfederan"/>
    </group>
</userGroups>
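Because the XML syntax must be followed carefully, you may want to check an edited userGroups.xml before using it. The following is a minimal Python sketch, not part of GenePattern: it parses the file with the standard library and lists the groups a user belongs to, treating `<user name="*"/>` as matching every user, as in the default file. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_user_groups(xml_text):
    """Map each group name in a userGroups.xml document to its set of users.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    root = ET.fromstring(xml_text)
    return {
        group.get("name"): {user.get("name") for user in group.findall("user")}
        for group in root.findall("group")
    }


def groups_for_user(groups, username):
    """Return the sorted names of the groups that include username.

    A user entry of "*" matches every user, as in the default file.
    """
    return sorted(
        name for name, members in groups.items()
        if username in members or "*" in members
    )


EXAMPLE = """<!-- map of users to groups -->
<userGroups>
    <group name="administrators">
        <user name="jsmith"/>
        <user name="mjones"/>
    </group>
    <group name="mjones_lab">
        <user name="mjones"/>
        <user name="jdoe"/>
        <user name="sfederan"/>
    </group>
</userGroups>
"""

groups = load_user_groups(EXAMPLE)
print(groups_for_user(groups, "mjones"))  # ['administrators', 'mjones_lab']
print(groups_for_user(groups, "jdoe"))    # ['mjones_lab']
```

To check your own file, pass the contents of GenePatternServer/resources/userGroups.xml to load_user_groups; a ParseError points to a syntax problem in the file.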

Members of a group can share analysis results with that group, but renaming a group does not update results that were already shared. If you rename a group from old_name to new_name, for example, the users in old_name become members of new_name, but the results they shared are still shared with old_name. Each user who shared job results with old_name should edit the share options for the job and share the results with new_name.


Modifying Server Settings

To modify the configuration of your GenePattern server, use the Server Settings page:

  1. Click Administration>Server Settings to display the Server Settings page.
  2. From the Server Settings pane, select the server setting that you want to modify. GenePattern displays a page of related server configuration options.
  3. Modify and save the server configuration options.
  4. Optionally, return to step 2 to change additional settings.

The following list summarizes the server settings. Each setting is described in more detail in the section of the same name below.

  • Access: Specify which clients have access to the server.
  • Advanced: Specify software source directories and other low-level configuration options.
  • Command Line Prefix: Specify commands and qualifiers to be prepended to the command line used to invoke a module or pipeline.
  • Custom: Create new server configuration options.
  • Database: Specify configuration options for the GenePattern database.
  • File Purge: Specify how long files remain on the server before being deleted.
  • GenePattern Log: Display the log file for the GenePattern server.
  • Job Configuration: Work with a job queue that you have configured for use with a queuing system, such as the Load Sharing Facility (LSF) or the Sun Grid Engine (SGE).
  • Programming Languages: Specify the root directories for the programming languages used by GenePattern and the Java flags to be added to Java command lines executed by the server.
  • Proxy: If your organization has a web proxy between the GenePattern server and the internet, specify the proxy information required to access the internet.
  • Repositories: Specify the URL used to access the module repository and the suite repository.
  • Shut Down Server: Shut down the GenePattern server.
  • System Message: Broadcast a message to all users logged into the GenePattern server.
  • Task Info: Display the LSID of each module and pipeline installed on the GenePattern server.
  • Uploaded Files: Display the account information and uploaded files of a selected user.
  • Users and Groups: Display account information for all users, including the groups to which they belong.
  • Web Server Log: Display the log file for the web server used by the GenePattern server.

Access

Use the Access page to define which GenePattern clients have access to the GenePattern server. The localhost (127.0.0.1) computer cannot be denied access to the locally installed GenePattern server. This prevents you from inadvertently denying yourself access to the server.

Using the Access page to control which computers can reach the GenePattern server is the simplest way to secure your server. You can also control access based on user authentication and user permissions, as described in Securing the Server. The Access page filters are applied before any user authentication or permission checks: if your computer is denied access, you cannot reach the server regardless of your username, password, or permissions.

[Figure: Access page]

Click Save to save your changes. Click Restore to return to the value set at installation.
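The ordering described above (host filter first, then user authentication) can be modeled as follows. This is an illustrative Python sketch, not GenePattern's implementation: the wildcard patterns such as `192.168.1.*` are an assumption about how allowed clients might be listed, and `authenticate` is a hypothetical stand-in for whatever login check the server performs.

```python
from fnmatch import fnmatchcase

LOCALHOST = "127.0.0.1"


def client_allowed(client_ip, allowed_patterns):
    """Host filter: the local machine is always allowed, so you cannot lock
    yourself out; any other client must match at least one pattern."""
    if client_ip == LOCALHOST:
        return True
    return any(fnmatchcase(client_ip, pattern) for pattern in allowed_patterns)


def handle_request(client_ip, allowed_patterns, authenticate):
    """Apply the host filter before any username/password check."""
    if not client_allowed(client_ip, allowed_patterns):
        return "denied: host not allowed"
    return "ok" if authenticate() else "denied: bad credentials"


# Hypothetical allow-list for a lab subnet.
allowed = ["192.168.1.*"]
print(handle_request("192.168.1.20", allowed, lambda: True))  # ok
print(handle_request("10.0.0.5", allowed, lambda: True))      # denied: host not allowed
print(handle_request("127.0.0.1", [], lambda: False))         # denied: bad credentials
```

The point of the model is the order of the two checks: a client that fails the host filter never reaches the credential check.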

Advanced

The Advanced page contains directory specifications for the GenePattern source files and other low-level configuration options. You rarely need to modify these options.

[Figure: Advanced Configurations]

Click Save to save your changes. Click Restore to return to the values set at installation.

Command Line Prefix

The Command Line Prefix page allows you to prepend text to the command line used to execute a module. You can prepend the same text to all module command lines or prepend text for a specific module.

Note: Prior to GenePattern 3.2.3 (June 2010), administrators used the command line prefix for connecting to an external queuing system. GenePattern now provides the CommandExecutor interface for that purpose. For more information, see Using a Queuing System.

[Figure: Command Line Prefix page]

To prepend text to all (or most) command lines executed by the GenePattern server:

  1. Enter the desired commands and qualifiers in the Default Command Prefix field.
  2. Click save default. GenePattern displays the updated content of the default prefix. The name/content table in the middle of the form lists the default prefix and its content. The previous illustration shows the default prefix with no content.
    When GenePattern executes a module or pipeline, it constructs the appropriate command line, prepends the default prefix to that command line, and then executes the command line.

To prepend text only to command lines that invoke specific modules or pipelines:

  1. In the Add New Prefix field, enter a name for the prefix and the commands and qualifiers to prepend to the command line.
  2. Click add prefix. GenePattern creates the new prefix, updates its content, and adds the prefix to the name/content table in the middle of the form.
  3. At the bottom of the form, select one or more modules or pipelines, select your new prefix, and click add mapping. GenePattern adds the prefix information to the module/command prefix name table.
    When GenePattern executes a module or pipeline listed in the module/command prefix name table, it constructs the appropriate command line, prepends the specified prefix to that command line, and then executes the command line. (When GenePattern executes a module or pipeline not listed in that table, it constructs the appropriate command line, prepends the default prefix to that command line, and then executes the command line.)
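The prefix selection rule described above can be summarized in a few lines. This Python sketch only models the rule (a mapped module gets its own prefix, everything else gets the default prefix); it is not GenePattern's code, and the module names and the `lowpriority` prefix are hypothetical.

```python
def build_command(module, command_line, default_prefix, prefixes, mappings):
    """Prepend the prefix that applies to `module`.

    prefixes: prefix name -> prefix text (the name/content table).
    mappings: module or pipeline name -> prefix name (the mapping table).
    Unmapped modules get default_prefix, which may be empty.
    """
    prefix_name = mappings.get(module)
    prefix = prefixes[prefix_name] if prefix_name is not None else default_prefix
    return " ".join(part for part in (prefix, command_line) if part)


prefixes = {"lowpriority": "nice -n 10"}
mappings = {"ExampleModule": "lowpriority"}
print(build_command("ExampleModule", "java -jar example.jar", "", prefixes, mappings))
# nice -n 10 java -jar example.jar
print(build_command("OtherModule", "java -jar other.jar", "", prefixes, mappings))
# java -jar other.jar
```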

Custom

Use the Custom page to define your own configuration options.

When you create a module, the custom configuration options are available as substitution variables in the module command line. For example, if you define a custom property "foo", you can use <foo> in the command line to pass the value of the custom configuration option to your module. In the Broad repository, for example, the LandmarkMatching and PeakMatch modules use the custom configuration option pepperPrefix. For more information, see Creating Modules in GenePattern.

[Figure: Custom Settings]

  1. In the name field, enter a name for the configuration option.
  2. In the content field, enter a value for the configuration option.
  3. Click add setting. GenePattern adds the option to the table at the bottom of the form.
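Substituting custom settings into a module command line can be modeled as below. This is a simplified Python sketch, not GenePattern's own code: it replaces each `<name>` token with the matching setting, repeats so that settings which refer to other settings are fully expanded (as the R2.7 setting described under Adding More Recent Versions of R refers to `<R2.7_HOME>`), and leaves tokens it has no value for unchanged. The setting names and values are illustrative.

```python
import re

TOKEN = re.compile(r"<([^<>\s]+)>")


def substitute(command_line, settings, max_passes=10):
    """Expand <name> tokens using the settings dict.

    Repeats until nothing changes (or max_passes is reached) so nested
    references are expanded; tokens not in settings are left as they are.
    """
    for _ in range(max_passes):
        expanded = TOKEN.sub(
            lambda m: settings.get(m.group(1), m.group(0)), command_line
        )
        if expanded == command_line:
            break
        command_line = expanded
    return command_line


settings = {"foo": "/opt/tools/foo", "foo_bin": "<foo>/bin"}
print(substitute("<foo> -i <input.filename>", settings))
# /opt/tools/foo -i <input.filename>
print(substitute("<foo_bin>/run -i <input.filename>", settings))
# /opt/tools/foo/bin/run -i <input.filename>
```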

Database

Use the Database Parameters page to set configuration options for the GenePattern database. The following figure shows the HSQL options. You rarely need to change these options.

For information about modifying the database, see Changing the GenePattern Database (HSQL to Oracle).

[Figure: Database Parameters]

Click Save to save your changes. Click Restore to return to the value set at installation.

File Purge

Use the File Purge page to specify when analysis result files are deleted from the server:

[Figure: File Purge Settings]

Click Save to save your changes. Click Restore to return to the values set at installation.

GenePattern Log

Use the GenePattern Log page to view warnings and messages generated by the GenePattern server. (Use the Web Server Log page to view messages generated by the web server that GenePattern uses.)

[Figure: GenePattern Log page]

Job Configuration

If you have configured your GenePattern system to work with a queuing system, such as the Load Sharing Facility (LSF) or the Sun Grid Engine (SGE), the Job Configuration page helps you control the queue and reload your configuration files. For more information, see Using a Queuing System.

Use the Job Configuration section to control the GenePattern internal job queue:

Use the Command Executors section to identify each of the command executors currently installed on the GenePattern server.

Use the Configuration File section to identify and review the .yaml configuration file currently active on the GenePattern server.

Programming Languages

The Programming Languages page contains two sections. After making changes, click Save to save them or Restore to return to the values set at installation.

Use Programming Language Configurations to specify the root directories for the programming languages used by GenePattern:

[Figure: Programming Language Configurations]

When you install GenePattern, you install the programming languages used by GenePattern. If you have alternate programming language installations that you prefer to use, use this page to point to those installations. If you would like to use more recent versions of R, see Using Different Versions of R.

Use Programming Language Options to increase the memory allocated to modules written in Java and R:

[Figure: Programming Language Options]

You can also increase the amount of memory allocated to the GenePattern server or client. For more information, see Increasing Memory Allocation.

Proxy

If your server is behind a firewall, use the Proxy page to set the HTTP and FTP proxy information. Without it, the server cannot download modules, pipelines, or suites from the repository maintained by the Broad Institute. If you do not know the proxy information, contact your systems administrator.

[Figure: Proxy Settings]

Click Save to save your changes. Click Restore to return to the values set at installation.

Repositories

Use the Repositories page to identify the location of the repository that the GenePattern server accesses when you install modules, pipelines, or suites from the repository. By default, it points to the module repository maintained by the Broad Institute. For information about implementing a module repository at your site, see the In-Depth Article Setting Up a Module Repository.

Click Save to save your changes. Click Restore to return to the values set at installation. Click Remove to delete the selected URL from the list.

Shut Down Server

You can shut down the GenePattern server by clicking the link on this page. Alternatively, double-click the Stop GenePattern Server icon on your desktop.

[Figure: Shut Down Server page]

System Message

Use the System Message page to broadcast a message to all users logged into the GenePattern server. The message text can include simple HTML formatting tags, such as <b> and <em>.

Task Info

The Task Info page lists every module and pipeline installed on the GenePattern server, together with its LSID. It is useful for telling apart modules and pipelines that share the same name.

The Clear TaskInfo Cache link clears an internal GenePattern cache, which can be useful for GenePattern development. Clicking the link has no visible impact on GenePattern operations.
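The LSID, not the name, is what distinguishes same-named tasks. The Python sketch below assumes the general LSID layout `urn:lsid:<authority>:<namespace>:<object>:<revision>` and reports names that map to more than one distinct task, ignoring differences in revision alone. The task names and LSIDs in the example are made up for illustration.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def ambiguous_names(tasks):
    """Return task names shared by tasks whose LSIDs differ apart from revision."""
    identities = {}
    for name, lsid in tasks:
        parsed = parse_lsid(lsid)
        identities.setdefault(name, set()).add(parsed[:3])
    return sorted(name for name, ids in identities.items() if len(ids) > 1)


tasks = [
    ("ExampleFilter", "urn:lsid:example.org:module.analysis:00001:2"),
    ("ExampleFilter", "urn:lsid:example.org:module.analysis:00001:3"),
    ("Normalize", "urn:lsid:example.org:module.analysis:00007:1"),
    ("Normalize", "urn:lsid:lab.example.edu:module.analysis:00042:1"),
]
print(ambiguous_names(tasks))  # ['Normalize']
```

Here the two ExampleFilter entries are revisions of the same task, while the two Normalize entries come from different authorities and are genuinely different tasks.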

Uploaded Files

The Uploaded Files page displays basic information about a user and their uploaded files (see Uploading Files). By default, the page displays information about the user logged into the GenePattern web client. To view information for another user, enter their username and click Select User.

If a user manually adds or removes files from the Uploads directory on the file server:

  1. Enter their username and click Select User to display their information.
  2. Click Resync Files to update (resynchronize) their uploaded files based on their Uploads directory on the file server.
  3. Enter their username and click Select User to display the updated information.

This allows you to synchronize the user interface with the modified uploads directory on the file server without restarting the GenePattern server.
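Conceptually, a resync compares the files the server has on record for a user with what is actually in that user's Uploads directory. The Python sketch below shows only that comparison; it does not touch GenePattern's database, and the `recorded` list is a hypothetical stand-in for the server's records.

```python
from pathlib import Path


def diff_uploads(recorded, uploads_dir):
    """Compare recorded upload paths with the files actually on disk.

    Returns (added, removed): files present on disk but not recorded,
    and recorded files that no longer exist. Paths are relative to
    uploads_dir and use forward slashes.
    """
    root = Path(uploads_dir)
    on_disk = {
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    }
    recorded = set(recorded)
    return sorted(on_disk - recorded), sorted(recorded - on_disk)
```

For example, if a user copied a new file into their Uploads directory and deleted another by hand, the first list holds the new file and the second holds the deleted one; Resync Files brings the user interface in line with both changes.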

Users and Groups

Use the Users and Groups page to view user account information, including the groups to which a user belongs. This page shows only registered users. An administrator can add users to a group (see Creating Groups and Administrators) before they register, but the users are not listed on this page until they have created a GenePattern account by clicking the Registration link on the GenePattern login page. If you update the userGroups.xml file, click Reload Users and Groups to update (resynchronize) the GenePattern web interface. This allows you to update users and groups without restarting the GenePattern server.

When you start the GenePattern server, the server populates the Uploads tab for each user by reviewing the Uploads directory on the file server. Typically, users add and remove uploaded files from the GenePattern web client interface. If a user adds or removes files from the Uploads directory on the file server, enter their username and click Resync Uploads to update (resynchronize) their Uploads tab based on their Uploads directory on the file server. This allows you to synchronize the user interface with the modified uploads directory on the file server without restarting the GenePattern server.

Web Server Log

Use the Web Server Log page to view messages generated by the web server that GenePattern uses. (Use the GenePattern Log page to view warnings and messages generated by the GenePattern server.)

[Figure: Web Server Log page]


Setting the Java Version

As of Release 3.2.1, the GenePattern server can be configured to run under Java 5 or Java 6.

Mac OS X 10.6 (Snow Leopard) or Mac OS X 10.7 (Lion)

When installed on Mac OS X 10.6 (Snow Leopard) or Mac OS X 10.7 (Lion), the GenePattern server is automatically configured for Java 6.

Mac OS X 10.5 (Leopard)

When installed on Mac OS X 10.5 (Leopard), the GenePattern server is configured for Java 5 by default.

To configure the GenePattern server for Java 6:

  1. Confirm that Java 6 is installed.
    Tip: Use the 'java -version' command.
  2. Stop the GenePattern server.
  3. Set your Java Preferences to use Java 6.
    Tip: Java Preferences is usually found in Applications > Utilities.
  4. Right-click StartGenePatternServer.app and choose Show Package Contents.
  5. Edit the Info.plist file and set the JVMVersion to 1.6.
  6. Restart the GenePattern server.
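Step 5 can also be scripted. The Python 3 sketch below uses the standard plistlib module to set JVMVersion. It assumes the key lives either at the top level of Info.plist or in the `Java` dictionary used by Java application bundles of that era, so check your file first; it also assumes the usual bundle layout, with the file at StartGenePatternServer.app/Contents/Info.plist. Stop the server and keep a backup copy before running it, and note that plistlib writes the file back in XML format. Editing the file by hand, as described in step 5, works just as well.

```python
import plistlib


def set_jvm_version(plist_path, version="1.6"):
    """Set JVMVersion in an app bundle's Info.plist; return the old value."""
    with open(plist_path, "rb") as fh:
        info = plistlib.load(fh)
    # Assumption: JVMVersion is either a top-level key or inside "Java".
    target = info if "JVMVersion" in info else info.setdefault("Java", {})
    previous = target.get("JVMVersion")
    target["JVMVersion"] = version
    with open(plist_path, "wb") as fh:
        plistlib.dump(info, fh)
    return previous
```

For example: set_jvm_version("StartGenePatternServer.app/Contents/Info.plist").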

Windows

When installed on Windows, the GenePattern server is configured for Java 5 by default.

To configure the GenePattern server for Java 6:

  1. Confirm that Java 6 is installed.
    Tip: Use the 'java -version' command.
  2. Stop the GenePattern server.
  3. Edit the StartGenePatternServer.lax file and update the location of the Java executable:
    # LAX.NL.CURRENT.VM
    # -----------------
    # the VM to use for the next launch
    lax.nl.current.vm=\jre\bin\java.exe
  4. Edit the StopGenePatternServer.lax file and update the location of the Java executable.
  5. Restart the GenePattern server.
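Step 1 in each procedure asks you to confirm the Java version with 'java -version'. The Python sketch below parses that output, which Java prints to standard error rather than standard output. Java 6 reports a version string such as "1.6.0_12", so for "1."-style strings the major version is the number after "1.". The function names are hypothetical; pass the full path of the Java executable you intend to put in the .lax file.

```python
import re
import subprocess

VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def java_major_version(version_output):
    """Return the major Java version from `java -version` output.

    Old releases report "1.6.0_12" (major version 6); newer ones
    report "11.0.2" (major version 11).
    """
    match = VERSION_RE.search(version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = int(match.group(1)), match.group(2)
    return int(second) if first == 1 and second is not None else first


def installed_java_major(java="java"):
    """Run `<java> -version` and parse its output (written to stderr)."""
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, check=True
    )
    return java_major_version(result.stderr)
```

For example, installed_java_major("/tools/pkgs/jdk_1.6.0_12/bin/java") should return 6 for a Java 6 installation.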

Linux

When installed on Linux, the GenePattern server is configured for Java 5 by default.

To configure the GenePattern server for Java 6:

  1. Confirm that Java 6 is installed.
    Tip: Use the 'java -version' command.
  2. Stop the GenePattern server.
  3. Edit the StartGenePatternServer.lax file and update the location of the Java executable:
    # LAX.NL.CURRENT.VM
    # -----------------
    # the VM to use for the next launch
    lax.nl.current.vm=/tools/pkgs/jdk_1.6.0_12/bin/java
  4. Edit the StopGenePatternServer.lax file and update the location of the Java executable.
  5. Restart the GenePattern server.
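Steps 3 and 4 make the same change to two files, so they are easy to script. The Python sketch below rewrites the `lax.nl.current.vm` line and leaves the comment lines alone; it refuses to guess if the setting is missing. It is a convenience sketch, not a GenePattern tool, and the function name is hypothetical. Stop the server and back up both files first.

```python
from pathlib import Path

KEY = "lax.nl.current.vm"


def set_lax_vm(lax_path, java_path):
    """Point lax.nl.current.vm in a .lax file at java_path.

    Comment lines (starting with #) never match the key, so they are
    left untouched. Raises KeyError if the setting is not present.
    """
    path = Path(lax_path)
    lines = path.read_text().splitlines(keepends=True)
    found = False
    for i, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == KEY:
            ending = "\n" if line.endswith("\n") else ""
            lines[i] = f"{KEY}={java_path}{ending}"
            found = True
    if not found:
        raise KeyError(f"{KEY} not found in {lax_path}")
    path.write_text("".join(lines))
```

Run it once for each launcher, for example with StartGenePatternServer.lax and StopGenePatternServer.lax and the Java path /tools/pkgs/jdk_1.6.0_12/bin/java, then restart the server.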

Using Different Versions of R

Installing GenePattern (version 3.1 and later) installs R 2.5.

Most of the GenePattern modules available in the Broad Institute repository (Modules & Pipelines>Install from repository) work with R 2.5. However, some GenePattern modules require other versions of R; for example, ComBat v2 requires R 2.7. Because R is not backward compatible, simply installing and running the latest version of R may cause modules to fail or, worse, to produce invalid results without failing. To run all of these modules on the same server, you must install multiple versions of R.

Defining R in GenePattern

In GenePattern, each module definition includes a command line that runs the analysis program. For an R module, the R version is defined by a command line substitution parameter. For example, the <R> parameter is substituted with the full path to the R 2.0.1 executable. The <R2.5> parameter is substituted with the full path to the R 2.5 executable. Similar parameters are used for other versions of R.

GenePattern version 3.1 and later installs R 2.5 and sets the <R2.5> parameter. If you upgraded from GenePattern 3.0, your GenePattern installation also includes R 2.0.1 and sets the <R> parameter.

Newer versions of GenePattern also set the <R2.5_HOME> parameter, pointing to the location of the R 2.5 installation such that Rscript is found at <R2.5_HOME>/bin/Rscript.  The GenePattern team is phasing out the use of <R2.5> in favor of <R2.5_HOME> in future module revisions.

Adding More Recent Versions of R to GenePattern

To add a different version of R to your GenePattern installation (for example R 2.7 on Mac OS X, for ComBat v. 2):

  1. Install the required version of R, if necessary. This is covered in detail on the R Project home page.  For Mac users, please read the section on 'Using Multiple Versions of R on Mac OS X' below.
    1. Go to http://www.r-project.org/ and select a CRAN mirror.
    2. Locate either the source code or the binary for the desired version of R. For a binary installation, look for the subdirectory link labeled 'old', which is towards the bottom of the page.  The next section lists the locations of various versions of R used by GenePattern.
    3. Follow the installation instructions.
  2. After you install the required version of R, configure GenePattern to use it. This is as simple as adding two new substitution parameters to the server settings.
    1. Click Administration>Server Settings and go to the Custom page.
    2. Add a setting for R*_HOME and another for R*, replacing the '*' with an actual version number.
      • For R 2.7 on Mac OS X the parameters are:
        R2.7_HOME=/Library/Frameworks/R.framework/Versions/2.7/Resources
        R2.7=<java> -DR_suppress=<R.suppress.messages.file> -DR_HOME=<R2.7_HOME> -Dr_flags=<r_flags> -cp <run_r_path> RunR
      • For other platforms, set R2.7_HOME to the full path of the installation directory. This must be a directory that contains a 'bin' subdirectory, which in turn contains the 'R' executable.
    3. For modules requiring R 2.15, you only need to set the <R2.15_HOME> parameter. Setting the <R2.15> parameter is not required, and its use is discouraged.
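Before saving an R*_HOME setting, you can check that the directory has the layout described above. The Python sketch below is a hypothetical helper, not part of GenePattern: it reports problems rather than raising, treats 'R.exe' as the Windows name of the executable (an assumption), and also looks for Rscript, since modules that use an <R..._HOME> parameter expect to find Rscript in its bin directory.

```python
from pathlib import Path


def check_r_home(r_home):
    """List problems with a candidate R*_HOME directory (empty list = OK).

    The directory must contain a 'bin' subdirectory holding the R
    executable ('R', or 'R.exe' on Windows). Rscript is also expected
    in bin for modules that use <R..._HOME>/bin/Rscript.
    """
    home = Path(r_home)
    if not home.is_dir():
        return [f"{home} is not a directory"]
    bin_dir = home / "bin"
    problems = []
    if not any((bin_dir / name).is_file() for name in ("R", "R.exe")):
        problems.append(f"no R executable in {bin_dir}")
    if not any((bin_dir / name).is_file() for name in ("Rscript", "Rscript.exe")):
        problems.append(f"no Rscript in {bin_dir}")
    return problems
```

For example, check_r_home("/Library/Frameworks/R.framework/Versions/2.7/Resources") should return an empty list on a Mac where R 2.7 is installed correctly.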

Where to Find Older Versions of R

CRAN makes older versions of R available through its archives, which include binary releases for Mac and Windows. Older binary releases are not available for Linux and other platforms; on those platforms, you must build R from the archived source bundles. In particular, here are direct links to the versions of R required by modules provided in the GenePattern public and beta repositories:

There are a number of CRAN mirrors as well.

Using Multiple Versions of R on Mac OS X

While it is possible to run multiple versions of R on a Mac, doing so requires more care and effort than on Windows or Linux.   In particular, you need to pay careful attention when choosing the R versions that are required and use caution during installation.  The order of operations is important and performing steps out of sequence can adversely affect versions already installed on the machine or require backtracking in the process; please read the following before proceeding.  The GenePattern team has not fully evaluated all of the possible issues and combinations involved.  What follows is our best understanding at present (this document last updated Dec. 9, 2013).
 

The Simplest Path

If you do not plan to use R outside of GenePattern, the following steps should cover setting up your Mac for multiple versions of R. Note that the 'sudo' commands below prompt you for your password to grant administrative access.
  1. Install R 2.5.1 using the CRAN installer.
    Download this patch file bundle to modify R 2.5.1 to allow use of other versions of R.  After downloading, execute the following from Terminal:
    cd /Library/Frameworks/R.framework/Versions
    sudo tar -xzvmpf ~/Downloads/R_2.5.1_mac_patch.tar.gz
  2. (Optional) If you will be using the ComBat module, you will also need to install R 2.7.2.  
    Before using the CRAN installer, execute the following from Terminal:
    sudo pkgutil --forget org.r-project.R.framework
    If you do not, the CRAN installer will remove R 2.5.1.  It should give you a message similar to "Forgot package 'org.r-project.R.framework' on '/'."  After that, use the R 2.7.2 CRAN installer as usual.
    Download this patch file bundle to modify R 2.7.2 to allow use of other versions of R.  After downloading, execute the following from Terminal:
    cd /Library/Frameworks/R.framework/Versions
    sudo tar -xzvmpf ~/Downloads/R_2.7.2_mac_patch.tar.gz
    Then set the R2.7_HOME and R2.7 substitution parameters as described above.
  3. If you are planning to use ExpressionFileCreator v11.14+ or another module that requires R 2.15.2, we recommend the R Installer Plug-in, which automatically installs and configures R 2.15.2 for use with GenePattern. As opportunity permits, we will update our R 2.15.2 modules to use this plug-in; if you encounter one that does not, you can install RankNormalize v1.3+ to trigger the installation. Both of these modules are available from our Beta repository.
For most users, this set of instructions should be sufficient.  If you skip Step 2 but later decide that you need ComBat, it should be fine to go back and do it later provided you first use the pkgutil command.  Users who will be working with R outside of GenePattern will need to be aware of a number of additional details.  This includes users doing R development, running R scripts, or updating the R installations.
 

Further Details

When you install R from CRAN (for example, R 2.7) it becomes the default version of R on your system.  This is true even if you had previously installed a newer version of R such as R 2.15. Executing R from a Terminal command line or the R GUI will run the last installed version.  Unless other steps are taken, this will affect GenePattern as well.
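To see which versions are installed and which one is currently the default, you can inspect the framework's Versions directory. The Python sketch below assumes the standard framework layout, with one subdirectory per version under /Library/Frameworks/R.framework/Versions and a `Current` symbolic link pointing at the default version; the function name is hypothetical.

```python
import os
import re
from pathlib import Path

VERSIONS_DIR = Path("/Library/Frameworks/R.framework/Versions")


def _version_key(name):
    # Sort "2.15" after "2.7" numerically, not alphabetically.
    return [int(n) for n in re.findall(r"\d+", name)]


def installed_r_versions(versions_dir=VERSIONS_DIR):
    """Return (installed versions, version that Current points to or None)."""
    versions_dir = Path(versions_dir)
    versions = sorted(
        (p.name for p in versions_dir.iterdir()
         if p.is_dir() and not p.is_symlink()),
        key=_version_key,
    )
    current = versions_dir / "Current"
    target = Path(os.readlink(current)).name if current.is_symlink() else None
    return versions, target
```

If installing an older R has silently become the default, the second value shows it.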
 
By default, the CRAN installer will also modify other installed versions, resulting in these possibly being removed or otherwise adversely affected. This is spelled out in a README displayed within the installer (on the second step) and provides instructions to avoid these effects.  If you want to run multiple versions of R on the Mac, these instructions apply to you.  Before installing an additional version, you will need to execute a "pkgutil --forget" command from the Terminal command line so that the next installer will not touch the existing version of R.  Versions of R are bundled as a DMG file (built for OS X Tiger and later) or as a PKG file (for OS X Leopard and later).  The correct command depends on the type of bundle that you are installing:
Since they are registered differently, installing a Leopard build doesn't affect an existing Tiger build other than taking over as the default version of R.  Likewise, a Tiger install won't affect a Leopard build other than becoming the default.  Note that R 2.10 was transitional, so the correct instructions depend on whether it is installed from the DMG (Tiger) or the PKG (Leopard) file.  The command given above for Tiger builds differs from the instructions given in the installer itself.  Those instructions applied to earlier versions of Mac OS X (e.g. Tiger) while the above command is the equivalent in GenePattern-supported versions of Mac OS X.
 
R 3.0 is the first version that requires OS X Snow Leopard. It comes as a PKG file, gives the same 'pkgutil' instructions as the Leopard builds, and appears to behave the same way. We have not tested any GenePattern components with R 3.0 and therefore do not recommend it for use with GenePattern at this time, because of the compatibility and validity issues covered earlier.
 
This posting on the R-SIG-Mac mailing list indicates that the above instructions assume that you are upgrading R (for example, from R 2.12 to R 2.15) and may not work for installing earlier versions (for example, installing 2.12 after 2.15). We have not fully tested the various combinations, but we have seen problems that depend on installation order. We recommend first carefully choosing the versions you need and then installing them in order from oldest to most recent.
 
If you run R outside of GenePattern and need to use a different version, you can use the RSwitch program to change which version is current. For example, after you install R 2.7 you can use RSwitch to switch back to R 2.15. This is documented in the R for Mac OS X FAQ. There are some additional instructions if you intend to use RSwitch. We don't have specific recommendations for this kind of setup, but you can find further details in this post on the R-SIG-Mac mailing list.
 
RSwitch does not help with GenePattern, because GenePattern must be able to run multiple R versions at the same time. As a workaround, you must edit the installed executable shell scripts. The patch bundles described above do this for R 2.5.1 and 2.7.2, and the R Installer plug-in does it for R 2.15. If you have other versions of R, you will need to make these modifications yourself. In the affected files, replace all hard-coded paths to /Library/Frameworks/R.framework/Resources with the actual path to the correct installed version of R, as described by this posting from the R-SIG-Mac mailing list. These are the important scripts to modify (not all are present in every version):
Use a plain text editor like vi or emacs to make these changes.  It is a good idea to make a backup copy beforehand and to use 'sudo' when editing so that the files retain their original permission and ownership settings.  Be aware that, according to this R-SIG-Mac posting, there may be some issues with this set-up.  If you use R outside of GenePattern, we recommend that you use RSwitch to set your favored version as Current when all these modifications are finished as this seems to be the best way to mitigate those problems. 
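The path replacement described above can also be scripted. The Python sketch below is a hypothetical helper, not one of the GenePattern patch bundles: it assumes each version's files live under /Library/Frameworks/R.framework/Versions/<version>/Resources, saves a .bak copy, and rewrites the script in place so that its permissions and ownership are kept. Run it with sudo, once per script that you need to modify, and only for versions that the patch bundles and the R Installer plug-in do not already cover.

```python
import shutil
from pathlib import Path

SHARED = "/Library/Frameworks/R.framework/Resources"


def pin_script_to_version(script_path, version):
    """Replace the shared R.framework/Resources path with a versioned one.

    Saves a .bak copy first and rewrites the file in place, so its
    permissions and ownership are kept. Returns the number of
    replacements; running it again returns 0.
    """
    path = Path(script_path)
    pinned = f"/Library/Frameworks/R.framework/Versions/{version}/Resources"
    text = path.read_text()
    count = text.count(SHARED)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(text.replace(SHARED, pinned))
    return count
```

Because the versioned path does not contain the shared path as a substring, a second run finds nothing to replace and leaves the earlier backup intact.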
 

Adding R Version 2.0.1 to GenePattern

There are some GenePattern modules which rely on R version 2.0.1:

To use these modules on your server you need to add R version 2.0.1.  Note that R 2.0.1 may not be available or may not work properly on newer versions of Windows and Mac OS X.  We are in the process of evaluating how to address this.

To add R2.0.1 to your GenePattern installation:

  1. Install R2.0.1.
  2. In GenePattern, click Administration>Server Settings and go to the Programming Languages page.
  3. Set the R 2.0.1 Home parameter to the full path of the R2.0.1 installation. This defines the <R> variable.
  4. Click Save to update the GenePattern server configuration.

GenePattern can now run modules written for R2.0.1.


Using the R Installer Plug-in

Because setting up a GenePattern server to use multiple versions of R can be difficult, we have created a plug-in to assist in the process. This plug-in only handles R 2.15 at present; we may expand it to cover other versions of R in the future. Installing a module that declares that it requires R 2.15 triggers installation of the plug-in. At the moment, this is limited to the beta releases of ExpressionFileCreator v11.14+ and RankNormalize v1.3+, both available from our Beta repository. We will update other modules as opportunity permits (this document last updated Dec. 9, 2013).
 
The Mac platform is the trickiest in terms of support for multiple versions of R, so it is the main focus of this guide. Because the story is much simpler on Windows and Linux, those platforms are covered more briefly.
 

Use of the R Installer on Mac OS X

Because of the way R is installed on Mac OS X, supporting multiple versions of R there is tricky and has a number of issues. These issues are the main reason the GenePattern team created the R Installer Plug-in to simplify the process. The plug-in goes through several possible scenarios to detect and/or install R in a way that works for GenePattern. For the most part, you should not need to worry about those details.
 

The Simplest Path

If you already have a version of R installed in the default location on your Mac, the plug-in will be able to set up R 2.15 for use with GenePattern.  This can be any version of R: it could be 2.5.1 or 2.15.2 or any other version.  The reasons for this will be discussed below if you are interested; for most users the reasons are not important.
 
Many users will have already installed R 2.5.1, as it is required for several core GenePattern modules (CART, ComparativeMarkerSelection, ConsensusClustering, GEOImporter, NMFConsensus and others).  If you think you will be using these modules, you should go ahead and install this version of R first.  If you do so, then when you later install ExpressionFileCreator, the plug-in will automatically download and install R 2.15.2 into the correct location and set it up for use with GenePattern.
 
Alternatively, if you have already installed R 2.15 (at any patch level) to the default location, the plug-in will detect it and set it up for use with GenePattern.  Note that this makes some minor changes to the installed version of R.  The reasons for these changes, and their effects, are discussed below; again, for most users the reasons are not important.
 
The bottom line for most users is that if you've already installed either R 2.5.1 or R 2.15 as usual from a CRAN installer, the plug-in will configure things correctly for you.  Some possible complications will be discussed in the next section, but you should not need to worry about them unless you use R outside of the context of GenePattern or if you decide to update your R installation.
 

Possible Complications

There are some important considerations if you work with R outside of GenePattern or if you decide to update your version of R:
  1. On a Mac, installing a newer version of R may result in older versions being removed unless you take specific steps to avoid this.  As a practical matter, the R 2.15 installers will not remove R 2.5.1.
  2. Certain files must be changed in the R installation to allow multiple versions to work at the same time.  Those changes are undone if you update R to a different patch level, so you will need to re-apply them by hand.  Patch bundles for each R 2.15 patch level are available from the GenePattern FTP site.  After installing R 2.15.3, for example, you would download R_2.15.3_mac_patch.tar.gz and then open Terminal.app to unpack it.  Execute these commands:
    cd /Library/Frameworks/R.framework/Versions
    sudo tar -xzvmpf ~/Downloads/R_2.15.3_mac_patch.tar.gz
  3. If you switch between multiple versions of R outside of the context of GenePattern, you should modify all of your versions of R in a similar way.  At this time, we are only providing patch level bundles for the versions of R used by GenePattern.  For any other version, you will need to do this on your own.  Please note that the available bundles are version-specific down to the patch level, so don't try to use them with other versions.  Also, the use of a utility like RSwitch may confuse GenePattern unless all versions of R have been so modified.
  4. The plug-in backs up the modified files beforehand, so if you want to revert your R installation to its original state for any reason (such as uninstalling GenePattern), you can delete the modifications and put the originals back in place.  Note that doing so will very likely affect the behavior of R modules in GenePattern.  The original files are stored alongside the modified files in the same location, with the ".orig" extension added (e.g. R64.orig).
  5. If you do not install it yourself, the version of R 2.15.2 delivered by the plug-in is not a full installation.  In particular, it is missing the R.app and R64.app GUI applications.  Again, this affects only users who want to use R outside of GenePattern.  These users should instead install R 2.15.2 on their own and allow the plug-in to detect it.
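Item 4 above says the originals are kept next to the modified files with ".orig" appended. The following is a minimal Java sketch, assuming that naming convention, of how an administrator might restore them in bulk. The class OrigRestorer is hypothetical and not part of GenePattern; run it with dryRun set to true first, and remember that writing under /Library/Frameworks requires administrator privileges.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of GenePattern): restores files that were
// backed up with a ".orig" suffix, e.g. R64.orig -> R64.
class OrigRestorer {
    static final String SUFFIX = ".orig";

    /**
     * Finds every regular file under root whose name ends in ".orig" and,
     * unless dryRun is true, moves it back over the modified file.
     * Returns the files that were (or would be) restored.
     */
    static List<Path> restore(Path root, boolean dryRun) throws IOException {
        List<Path> backups;
        // Files.walk does not follow symbolic links (such as Versions/Current) by default.
        try (Stream<Path> walk = Files.walk(root)) {
            backups = walk.filter(Files::isRegularFile)
                          .filter(p -> p.getFileName().toString().endsWith(SUFFIX))
                          .collect(Collectors.toList());
        }
        List<Path> restored = new ArrayList<>();
        for (Path backup : backups) {
            String name = backup.getFileName().toString();
            Path target = backup.resolveSibling(name.substring(0, name.length() - SUFFIX.length()));
            if (!dryRun) {
                // Overwrite the modified file with the original copy.
                Files.move(backup, target, StandardCopyOption.REPLACE_EXISTING);
            }
            restored.add(target);
        }
        return restored;
    }
}
```

Collecting the backups before moving anything avoids modifying the directory tree while it is still being walked.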
 

Using the R Installer on Windows

Support for multiple versions of R on a Windows machine is straightforward.  R comes in standard click-through installers available from CRAN.  Because of the way R is installed on Windows, each version is completely isolated, so different major and minor R releases, and even different patch levels, can be installed on the same machine at the same time: R 3.0.1, R 2.15.2, R 2.15.3, R 2.5.1, etc. can all be present with no issue.  Installing R is therefore left up to the user, and the plug-in performs only a few final configuration steps.
 
On a Windows machine, the R Installer Plug-in will attempt to detect R in the standard location.  If it does, it will go ahead and configure GenePattern to use it.  All you need to do is to obtain and install R 2.15.2 using the default settings, and then install a module like ExpressionFileCreator v11.14+ that needs it.
 
If you will be installing R to a different location, you will need to take the extra step of setting a custom property for R2.15_HOME manually on your GenePattern server.  This is also required if you decide to run a different patch level of R such as 2.15.3.
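The decision described above (use the standard location unless R2.15_HOME is set) can be sketched as follows. This is an illustration only, not the plug-in's code: the class name WindowsRHomeResolver is hypothetical, and C:\Program Files\R\R-2.15.2 is assumed to be the CRAN installer's default directory for R 2.15.2.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of resolving R2.15_HOME on Windows; NOT the
// R Installer Plug-in's actual implementation.
class WindowsRHomeResolver {
    // Assumed default location used by the CRAN installer for R 2.15.2.
    static final Path DEFAULT_HOME = Paths.get("C:\\Program Files\\R\\R-2.15.2");

    /**
     * An explicit R2.15_HOME custom property wins (required for non-default
     * locations or other patch levels such as 2.15.3); otherwise the default
     * directory is used if it contains bin\Rscript.exe.
     */
    static Optional<Path> resolve(Map<String, String> customProperties, Path defaultHome) {
        String configured = customProperties.get("R2.15_HOME");
        if (configured != null && !configured.isBlank()) {
            return Optional.of(Paths.get(configured));
        }
        if (Files.isRegularFile(defaultHome.resolve("bin").resolve("Rscript.exe"))) {
            return Optional.of(defaultHome);
        }
        return Optional.empty(); // nothing found: set R2.15_HOME manually
    }
}
```

A call such as `resolve(serverProperties, WindowsRHomeResolver.DEFAULT_HOME)` returns an empty Optional when neither source yields an R installation.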
 

Using the R Installer on Linux

Support for use of multiple versions of R on Linux is good, though it requires platform-specific steps by the system administrator.  Setting up the installations of R varies by Linux distribution; we can't give general instructions due to the large number of distributions available.  You are advised to look to the specific instructions for your platform at either CRAN or your distribution's support site (or both).
 
Note that obtaining R through a package management system like apt-get may result in an installation that will be auto-updated to a different version in the future.  This can lead to compatibility problems in running your modules and affect reproducibility of your past results.  To avoid this you can instead install R from the archived source bundles to keep multiple versions available.  Please refer to CRAN and your distribution's support site for more information.
 
After obtaining and installing the required version of R, you will need to manually set a custom property for R2.15_HOME on your GenePattern server before installing modules that require it.  When the R Installer Plug-in runs, it checks whether this property has been set and verifies that it points to an installation of R with the correct major and minor version.
 

Possible Plug-in Errors

Substitution Variable

Depending on your version of GenePattern, you may receive an error message similar to:
     no substitution available R2.15_HOME in command line R2.15_HOME/bin/Rscript
Certain versions of GenePattern are not able to load all required settings in one pass but may succeed on a second attempt.  Try re-installing the module or pipeline as that may resolve the issue.  If this error persists, the information in this Admin Guide section and the plug-in output may help you resolve the issue.  There are additional details in the Using Different Versions of R section of the Admin Guide as well.
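The error above means the server could not find a value for a variable in the module's command line. A minimal Java sketch of that kind of substitution follows, assuming the angle-bracket variable syntax used elsewhere in this guide (for example <R> and <java>). The class CommandLineSubstitution and its error text are hypothetical; GenePattern's real implementation differs.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of <variable> substitution in a module command
// line; GenePattern's real implementation and error text may differ.
class CommandLineSubstitution {
    private static final Pattern VAR = Pattern.compile("<([^<>]+)>");

    static String substitute(String commandLine, Map<String, String> props) {
        Matcher m = VAR.matcher(commandLine);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = props.get(name);
            if (value == null) {
                // The kind of failure described above: the property was never set
                // (or was not yet loaded when the command line was built).
                throw new IllegalStateException(
                    "no substitution available for " + name + " in command line " + commandLine);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With R2.15_HOME set to /opt/R/2.15.3, the command line `<R2.15_HOME>/bin/Rscript --version` becomes `/opt/R/2.15.3/bin/Rscript --version`; without it, the call fails.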
 

Installation location error (Mac only)

On a Mac, if you have never installed any version of R, you will probably get a message saying that the /Library/Frameworks/R.framework/Versions directory is not present.  In Mac OS X Lion (10.7) and above, creating this location requires administrator privileges and can't be done within GenePattern.  This is why the plug-in works if you've already installed some version of R: that previous install will have created the correct location for you.  Without that location, the installation attempt will very likely fail and require some manual steps.  There are two options; you only need to do one or the other:
  • Install R 2.15 manually in the default location so that the plug-in can detect it.  R 2.15 installers are available from CRAN.
  • Execute the following commands from a Terminal window:
    sudo mkdir -p /Library/Frameworks/R.framework/Versions
    sudo chgrp -R admin /Library/Frameworks/R.framework/
    sudo chmod -R g+w /Library/Frameworks/R.framework/
Administrative access will be required in either case, and afterward it will be necessary to retry the plug-in installation.
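Before retrying, you may want to confirm that the location now exists and is writable by the account that runs GenePattern. The following is a hypothetical pre-flight check (the class MacRFrameworkCheck is not part of GenePattern) that reports which of the two fixes above still applies.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check (not part of GenePattern) for the Mac install
// location that the R Installer Plug-in needs.
class MacRFrameworkCheck {
    static final Path VERSIONS_DIR = Paths.get("/Library/Frameworks/R.framework/Versions");

    enum Status { MISSING, NOT_WRITABLE, OK }

    /** Reports whether dir exists and is writable by the user running this JVM. */
    static Status check(Path dir) {
        if (!Files.isDirectory(dir)) {
            return Status.MISSING;      // install R from CRAN, or run the sudo mkdir command
        }
        if (!Files.isWritable(dir)) {
            return Status.NOT_WRITABLE; // see the chgrp/chmod commands (admin group, g+w)
        }
        return Status.OK;
    }

    public static void main(String[] args) {
        System.out.println(VERSIONS_DIR + ": " + check(VERSIONS_DIR));
    }
}
```

Run it as the same user that starts the GenePattern server; a result of NOT_WRITABLE for that user usually means it is not in the admin group.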
 

Version Check

The R Installer Plug-in will check the configured version of R to make sure it is compatible.  There are several problems or conditions that it can detect at this point:
  • The installed R version has a different patch level than expected.  This is just a warning and will not stop the plug-in installation.
  • The installed R version has a different major or minor number than expected.  This is considered an error; the most likely cause is that you have set up R2.15_HOME to use a version of R other than 2.15.
  • The installed version of R can't execute, results in an error, or returns no output.  You may need to re-install R 2.15, but first check to make sure that R2.15_HOME contains no typos.
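The three outcomes listed above map naturally onto a small classification function. The following Java sketch is an illustration under stated assumptions, not the plug-in's code: it parses the text printed by `R --version`, whose first line has the form "R version 2.15.3 (2013-03-01)", and the class RVersionCheck and its result names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not the plug-in's code) of the version check,
// applied to the output of "R --version".
class RVersionCheck {
    enum Result { OK, WARN_PATCH_LEVEL, ERROR_MAJOR_MINOR, ERROR_NO_OUTPUT }

    private static final Pattern VERSION = Pattern.compile("R version (\\d+)\\.(\\d+)\\.(\\d+)");

    static Result check(String versionOutput, int major, int minor, int patch) {
        if (versionOutput == null) {
            return Result.ERROR_NO_OUTPUT;
        }
        Matcher m = VERSION.matcher(versionOutput);
        if (!m.find()) {
            // R did not run, printed an error, or printed nothing recognizable.
            return Result.ERROR_NO_OUTPUT;
        }
        int foundMajor = Integer.parseInt(m.group(1));
        int foundMinor = Integer.parseInt(m.group(2));
        int foundPatch = Integer.parseInt(m.group(3));
        if (foundMajor != major || foundMinor != minor) {
            // e.g. R2.15_HOME points at R 3.0.1 rather than an R 2.15.x install
            return Result.ERROR_MAJOR_MINOR;
        }
        // Same major/minor: a different patch level is only a warning.
        return foundPatch == patch ? Result.OK : Result.WARN_PATCH_LEVEL;
    }
}
```

Only ERROR_MAJOR_MINOR and ERROR_NO_OUTPUT would stop an installation under this scheme, matching the list above.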

 


Increasing Memory Allocation

GenePattern allocates memory to the server, to the "client" (the computer you are using to access GenePattern), and to individual modules. When a module fails with an out of memory error, you can try increasing the amount of memory allocated to the server, the client, or the module.

Increasing Memory for Modules

To increase the amount of memory allocated to a module written in Java or R, click Administration>Server Settings. The Programming Languages page (see Programming Language Options) provides several settings for increasing the memory available to Java and R.

You can customize memory preferences on a per-module basis. This is useful when some of your modules require more memory than others. If you haven't already done so, copy the 'config_default.yaml' file to 'config_custom.yaml', then set 'config.file=config_custom.yaml' in the genepattern.properties file. These files are in the resources directory of your installation. Keeping your changes in a separate custom file ensures that they are not inadvertently overwritten when you install an updated version of GenePattern. You must restart your server for this change to take effect.

Set custom memory settings in the config_custom.yaml file. This is a text file in YAML format. The 'job.memory' property defines the memory requirements for your module, for example job.memory: 512m, job.memory: 2g. This defines the '-Xmx' flag passed to the java command line. It also defines the memory requirements passed along to the queuing system such as LSF or SGE. In the rare case when you want to use a different java memory flag from the queuing system flag, set both 'job.memory' and 'job.javaXmx'.
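To make the relationship between 'job.memory', 'job.javaXmx' and the '-Xmx' flag concrete, here is a minimal Java sketch. It assumes binary units (1g = 1024m) and accepts forms like 512m, 2g and 2Gb; the class JobMemory is hypothetical, and GenePattern's own parser may accept other formats.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn a job.memory value such as "512m", "2g" or "2Gb"
// into a byte count and a java -Xmx flag. GenePattern's parser may differ.
class JobMemory {
    private static final Pattern SPEC =
        Pattern.compile("\\s*(\\d+)\\s*([kmgt]?)b?\\s*", Pattern.CASE_INSENSITIVE);

    static long toBytes(String spec) {
        Matcher m = SPEC.matcher(spec);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized memory value: " + spec);
        }
        long n = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase(Locale.ROOT)) {
            case "k": return n << 10;
            case "m": return n << 20;
            case "g": return n << 30;
            case "t": return n << 40;
            default:  return n; // no unit: plain bytes
        }
    }

    /** job.javaXmx, when present, overrides job.memory for the java flag only. */
    static String xmxFlag(Map<String, String> props) {
        String spec = props.getOrDefault("job.javaXmx", props.get("job.memory"));
        if (spec == null) {
            return null; // no memory setting for this module
        }
        // Express the value in megabytes, a unit the JVM always accepts.
        return "-Xmx" + (toBytes(spec) >> 20) + "m";
    }
}
```

In this sketch, the queuing-system request would keep using job.memory, so setting job.javaXmx: 1g and job.memory: 2g yields -Xmx1024m for java while asking the queue for 2 GB.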

You can also customize error handling for completed jobs. When the 'job.error_status.stderr' flag is set to 'true' the server will interpret a non-empty stderr stream as a failed job. When the 'job.error_status.exit_value' flag is set to 'true' the server will interpret a non-zero exit code as a failed job. 

# example config_custom.yaml entry
module.properties:
    # custom memory flags for the ConvertLineEndings module
    ConvertLineEndings:
        # single parameter sets both the java flag and the queuing system memory requirements
        job.memory: 2Gb

    #
    # advanced flags
    MyModule:
        # it is possible to pass one value to the java command line, e.g. java -Xmx1g
        job.javaXmx: 1g
        # and a different value to the queuing system
        job.memory: 2g
        #
        # ignore stderr output
        job.error_status.stderr: false
        # don't ignore the exit code
        job.error_status.exit_value: true
 
A module author can define job preferences in the manifest file for the module. Settings in the configuration (.yaml) file take precedence over declarations in the module manifest. The parameters are the same, but the manifest uses the '=' sign instead of the ':' sign:
job.memory=2g
job.error_status.stderr=false
job.error_status.exit_value=true
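The precedence rule and the two error-status flags can be summarized in a few lines of Java. This is a hypothetical sketch, not GenePattern's code: the class JobErrorStatus is illustrative, and it treats a missing flag as false, which may not match the server's actual defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not GenePattern's code: combine module-manifest and
// configuration-file properties, then decide whether a finished job failed.
class JobErrorStatus {
    /** Configuration-file values override manifest values with the same key. */
    static Map<String, String> effective(Map<String, String> manifest, Map<String, String> config) {
        Map<String, String> merged = new HashMap<>(manifest);
        merged.putAll(config);
        return merged;
    }

    /** A missing flag is treated as false here; the server's defaults may differ. */
    static boolean isFailed(Map<String, String> props, int exitCode, long stderrBytes) {
        boolean checkStderr = Boolean.parseBoolean(props.get("job.error_status.stderr"));
        boolean checkExit = Boolean.parseBoolean(props.get("job.error_status.exit_value"));
        return (checkStderr && stderrBytes > 0) || (checkExit && exitCode != 0);
    }
}
```

For example, if the manifest sets job.error_status.stderr=true but config_custom.yaml sets it to false, a job that writes to stderr and exits with code 0 is treated as successful.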

Increasing Memory for Visualizers

Many GenePattern modules are run on the server. However, visualizers are applications that run on your computer, rather than on the GenePattern server. This means that you must have Java installed on your computer. For easy debugging, set your Java preferences so that the Java console displays.

The default visualizer memory limit is 512 MB. However, if you find that your visualizer repeatedly runs out of memory, you can try a few things to eliminate that error:

Increasing Memory for the Server and/or Client

To increase the amount of memory allocated to the server and/or the client, follow the instructions for your platform:

Mac OS X

  1. Right-click on the file GenePattern/Tomcat/StartGenePatternServer (server) or the GenePatternClient/GenePattern Client (client).
  2. Select Show Package Contents from the pop-up menu. The Contents directory should open in the Finder.
  3. In the Contents directory, double-click the Info.plist file. This should open the Property List Editor program.
  4. Add the child VMOptions under the Java node.
  5. Change the Class of the added VMOptions node to ‘Array’.
  6. Add a child with Class 'String' and the value -Xmx512M. You can replace 512 with the maximum amount of memory, in MB, that you want the GenePattern server or client to use.

Windows and Linux

  1. Edit the configuration file GenePatternServer/StartGenePatternServer.lax (server) or GenePatternClient/GenePattern Client.lax (client).
  2. In either file, look for the entries noted below and increase these values (for example, double the value) up to the maximum memory size of the machine you are using. (Note: Windows limits the total space available to a process to 2 GB. Some of that is used for overhead, so slightly less is really available to the JRE.)

Using a Queuing System

Queuing systems such as the Load Sharing Facility (LSF) and the Sun Grid Engine (SGE) allow computational resources to be used effectively. If you have installed a queuing system, you can configure the GenePattern server to use it. On a heavily used server, using a queuing system to execute analysis jobs generally improves performance overall, especially for compute-intensive and long-running jobs; however, short jobs might take slightly longer because they must be dispatched to the queuing system.

The GenePattern server includes support for Sun Grid Engine (SGE) and LSF. To configure your server, you need to edit the configuration file and restart the server. Detailed documentation is in the 'config_example.yaml' file which is in the resources directory of your local GenePattern installation.

There are three additional ways to configure GenePattern's interaction with your queuing system: programmatically, through the JobRunner API or the CommandExecutor interface, or with a command line prefix.

JobRunner API

To integrate your queuing system with GenePattern:

  1. Implement the JobRunner API
  2. Deploy the jar file to the GenePattern server.
  3. Configure your server.
  4. Reload the new configuration.

1. Implement the JobRunner API

A source code snippet from the API is included here. Contact us for the full source code.

interface JobRunner {

    /**
     * The GenePattern Server calls this when it is ready to submit the job to the queue.
     * Submit the job to the queue and return immediately.
     * The drm jobId returned by this method is used as the key into a 
     * lookup table mapping the gp jobId to the drm jobId.
     * 
     * @return the drm jobId resulting from adding the job to the queue.
     */
    String startJob(DrmJobSubmission drmJobSubmission) throws CommandExecutorException;


    /**
     * Get the status of the job.
     * @param drmJobRecord, contains a record of the job
     * @return the current status of the job on the queuing system
     */
    DrmJobStatus getStatus(DrmJobRecord drmJobRecord);


    /**
     * This method is called when the GP server wants to cancel a job before it 
     * has completed on the queuing system. 
     * For example when a user terminates a job from the web ui.
     * 
     * @param drmJobRecord, contains a record of the job
     * @return true if the job was successfully cancelled, false otherwise.
     * @throws Exception
     */
    boolean cancelJob(DrmJobRecord drmJobRecord) throws Exception;
}
 
The required Java libraries come with your local install of GenePattern and can be found in the <GenePatternServer>/Tomcat/webapps/gp/WEB-INF/lib directory.
Follow steps 2-4 in the CommandExecutor Interface section below to configure your server.
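The javadoc above describes startJob returning a queue (drm) job id that is used as the key of a gp-to-drm lookup table. A minimal, self-contained Java sketch of that bookkeeping follows. It is hypothetical throughout: it uses plain ints and strings instead of GenePattern's DrmJobSubmission, DrmJobRecord and DrmJobStatus, and an in-memory map stands in for a real queue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Simplified, hypothetical stand-in for a job's state on the queuing system.
enum SketchDrmStatus { PENDING, RUNNING, DONE, CANCELLED }

// Hypothetical sketch of the JobRunner contract, not a real implementation.
class SketchJobRunner {
    private final AtomicLong nextDrmId = new AtomicLong(1000);
    // gp jobId -> drm jobId: the lookup table described in the startJob javadoc
    private final Map<Integer, String> gpToDrm = new ConcurrentHashMap<>();
    // drm jobId -> status: stands in for the queuing system itself
    private final Map<String, SketchDrmStatus> queue = new ConcurrentHashMap<>();

    /** Submits the job and returns immediately with the queue's job id. */
    String startJob(int gpJobId, String[] commandLine) {
        // A real runner would hand commandLine to bsub, qsub, etc. here.
        String drmJobId = Long.toString(nextDrmId.getAndIncrement());
        queue.put(drmJobId, SketchDrmStatus.PENDING);
        gpToDrm.put(gpJobId, drmJobId);
        return drmJobId;
    }

    /** Looks up the drm jobId for a gp jobId, then asks the "queue" for its status. */
    SketchDrmStatus getStatus(int gpJobId) {
        String drmJobId = gpToDrm.get(gpJobId);
        return drmJobId == null ? null : queue.get(drmJobId);
    }

    /** Returns true only if a pending or running job was cancelled. */
    boolean cancelJob(int gpJobId) {
        String drmJobId = gpToDrm.get(gpJobId);
        if (drmJobId == null) {
            return false;
        }
        // Atomic compare-and-replace, so a job that already finished is left alone.
        return queue.replace(drmJobId, SketchDrmStatus.PENDING, SketchDrmStatus.CANCELLED)
            || queue.replace(drmJobId, SketchDrmStatus.RUNNING, SketchDrmStatus.CANCELLED);
    }
}
```

Keeping the mapping keyed by the GenePattern job id lets getStatus and cancelJob work from the information the server already has.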

CommandExecutor Interface

To use a queuing system with GenePattern:

  1. Implement the CommandExecutor Java API.
  2. Deploy the resulting jar file to the GenePattern server.
  3. Configure your server.
  4. Reload the new configuration.

Each step is described in detail below.

1. Implement the Command Executor Interface

The full source for the Command Executor API is included here:

/**
 * Interface for managing job execution via runtime exec or an external queuing system. This interface is responsible for both initialization and shutdown of external services,
 * as well as the management of job submission, getting job status, and killing, pausing, and resuming jobs.
 *
 * @author pcarr
 */
public interface CommandExecutor {
   //configuration support
   /**
    * [optionally] set a path to a configuration file.
    */
   void setConfigurationFilename(String filename);
 
   /**
    * [optionally] provide properties.
    * @param properties
    */
   void setConfigurationProperties(CommandProperties properties);
 
   /**
    * Start the service, typically called at application startup.
    */
   public void start();
 
   /**
    * Stop the service, typically called just before application shutdown.
    */
   public void stop();
 
   /**
    * Request the service to run a GenePattern job. It is up to the service to monitor for job completion and callback to GenePattern when the job is completed.
    *
    * @see GenePatternAnalysisTask#handleJobCompletion(int, String, String, int)
    *
    * @param commandLine
    * @param environmentVariables
    * @param runDir
    * @param stdoutFile
    * @param stderrFile
    * @param jobInfo
    * @param stdinFile
    *
    * @throws CommandExecutorException when errors occur attempting to submit the job
    */
   void runCommand(
           String commandLine[],
           Map<String, String> environmentVariables,
           File runDir,
           File stdoutFile,
           File stderrFile,
           JobInfo jobInfo,
           File stdinFile)
   throws CommandExecutorException;
 
   /**
    * Request the service to terminate a GenePattern job which is running via this service.
    * @param jobInfo
    * @throws Exception indicating that the job was not properly terminated.
    */
   void terminateJob(JobInfo jobInfo) throws Exception;
 
   /**
    * This method is called on server startup for each RUNNING job for this queue.
    *
    * For RuntimeExec, tell the GP server to delete the job results directory and requeue the job.
    * For other executors, (such as LSF), you may want to ignore this message.
    * For PipelineExec, you may need to determine the last successfully completed step before resuming the pipeline.
    *
    * @return an optional int flag to update the JOB_STATUS_ID in the GP database, ignore if it is less than zero
    */
   int handleRunningJob(JobInfo jobInfo) throws Exception;
}

The required Java libraries come with your local install of GenePattern and can be found in the <GenePatternServer>/Tomcat/webapps/gp/WEB-INF/lib directory.

The interface accepts requests to start and terminate jobs from the server. You will need to invoke a callback to the GP server when your job has completed.

Example snippet:

try {
    // notify the GP server that the job has finished
    GenePatternAnalysisTask.handleJobCompletion(jobInfo.getJobNumber(), exitCode, null, runDir, stdoutFile, stderrFile);
}
catch (Exception e) {
    log.error("Error handling job completion for job " + jobInfo.getJobNumber(), e);
}
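The pattern behind runCommand is to launch the job asynchronously and call back when it ends. A self-contained Java sketch of that pattern follows. It is hypothetical: SketchCommandExecutor does not implement GenePattern's CommandExecutor interface, CompletionCallback stands in for GenePatternAnalysisTask.handleJobCompletion, and numThreads plays the role of the num.threads setting.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the runCommand pattern; not GenePattern's
// RuntimeCommandExecutor.
class SketchCommandExecutor {
    interface CompletionCallback {
        void jobCompleted(int jobNumber, int exitCode, File stdoutFile, File stderrFile);
    }

    private final ExecutorService pool;
    private final CompletionCallback callback;

    SketchCommandExecutor(int numThreads, CompletionCallback callback) {
        // At most numThreads jobs run at once; the rest wait in the pool's queue.
        this.pool = Executors.newFixedThreadPool(numThreads);
        this.callback = callback;
    }

    /** Returns immediately; the job runs on a pool thread. */
    void runCommand(int jobNumber, String[] commandLine, Map<String, String> env,
                    File runDir, File stdoutFile, File stderrFile) {
        pool.submit(() -> {
            int exitCode;
            try {
                ProcessBuilder pb = new ProcessBuilder(commandLine)
                        .directory(runDir)
                        .redirectOutput(stdoutFile)
                        .redirectError(stderrFile);
                pb.environment().putAll(env);
                exitCode = pb.start().waitFor();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exitCode = -1; // interrupted while waiting
            } catch (IOException e) {
                exitCode = -1; // the process could not be started
            }
            // Equivalent of calling back to the GP server on completion.
            callback.jobCompleted(jobNumber, exitCode, stdoutFile, stderrFile);
        });
    }

    void stop() {
        pool.shutdown();
    }
}
```

A queuing-system implementation would replace the ProcessBuilder call with a submission to the queue and invoke the callback from whatever monitors the queue.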

Once you have implemented this interface, create a jar file to deploy to the GP server.

2. Deploy the jar File to the GenePattern Server

The jar file and all of the dependent libraries must be installed to <GenePatternServer>/Tomcat/webapps/gp/WEB-INF/lib.

3. Configure Your Server

To configure your server to interact with your queuing system, you must edit the config.yaml file. In a fresh install of GenePattern, there will be two .yaml files found in <GenePatternServer>/resources: config_default.yaml and config_example.yaml. It is highly recommended that you make a copy of config_default.yaml and name it something like config.yaml. This will give you a working copy of your configuration file, preserving the default and example versions for your future reference. Additionally this will prevent your working copy from getting overwritten during server upgrade.

Edit the config.file property in the <GenePatternServer>/resources/genepattern.properties file to point to your new configuration file. By default, the property looks like this:

config.file=config_default.yaml

For this example, you would edit the property as follows:

config.file=config.yaml
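If you are unsure which .yaml file a server will load, you can read the property yourself. The following Java sketch is hypothetical (the class ConfigFileLocator is not part of GenePattern). It assumes that a relative config.file value is resolved against the resources directory, as in the examples here, and it falls back to config_default.yaml when the property is absent, which is also an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: report which .yaml configuration file
// genepattern.properties points to.
class ConfigFileLocator {
    static Path activeConfig(Path resourcesDir) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(resourcesDir.resolve("genepattern.properties"))) {
            props.load(in);
        }
        // Assumed fallback when config.file is not set at all.
        String name = props.getProperty("config.file", "config_default.yaml").trim();
        // An absolute value is returned unchanged by resolve().
        return resourcesDir.resolve(name);
    }
}
```

Pointing it at <GenePatternServer>/resources after the edit above should return the path of config.yaml.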

Now, edit your working copy of the configuration file, config.yaml. (The following code snippets come from the config_example.yaml file.)

a) Define an executor in the "executors" section. To do so, add an entry to the 'executors' map in the yaml document.

# a list of command executors
# The executor id, 'org.genepattern.server.executor.PipelineExecutor', is reserved for the default executor which runs all GP pipelines.
# Don't use this as an executor id in this file.
# a map of <id>:<obj>, where
#    obj := <classname> | <map>
#    classname := fully qualified classname of a class which implements the org.genepattern.server.executor.CommandExecutor interface
#    map := classname=<classname> [configuration.file: <path_to_config_file> | configuration.properties: <map>] [default.properties: <map>]
executors:
    # default executor for all jobs, it is included in GenePattern
    RuntimeExec:
        classname: org.genepattern.server.executor.RuntimeCommandExecutor
        configuration.properties:
            # the total number of jobs to run concurrently
            num.threads: 20
            # the total number of jobs to keep on the queue, not yet implemented
            #max.pending.jobs: 20000
 
    # nested declaration with configuration file, <id>: { classname: <classname>, configuration: <config_file> }
    Test:
        classname: org.genepattern.server.executor.TestCommandExecutor
        configuration.properties:
            num.threads: 20

b) Configure your server to use your executor.

# apply these properties to all jobs
default.properties:
    executor: Test
    java_flags: -Xmx512m

c) Optionally, you can use the configuration file to override the default executor on a per-module, per-group, or per-user basis. The following example comes from the per-module section; more examples can be found in config_example.yaml.

# override default.properties and executor->default.properties based on taskname or lsid
# Note: executor->configuration.properties are intended to be applied at startup and are not overwritten here
module.properties:
    CBS:
        executor: LSF
        lsf.max.memory: 16
        java_flags: -Xmx16g
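The override rule in steps b) and c) amounts to a two-level lookup. The following Java sketch is hypothetical: it covers only the per-module level (the real server also considers groups, users and LSIDs), and the class ExecutorSelector is illustrative.

```java
import java.util.Map;

// Hypothetical sketch of the lookup described in the yaml comments: a
// module.properties entry overrides the same key in default.properties.
class ExecutorSelector {
    static String resolve(String taskName, String key,
                          Map<String, String> defaultProperties,
                          Map<String, Map<String, String>> moduleProperties) {
        Map<String, String> overrides = moduleProperties.getOrDefault(taskName, Map.of());
        return overrides.getOrDefault(key, defaultProperties.get(key));
    }

    static String executorFor(String taskName,
                              Map<String, String> defaultProperties,
                              Map<String, Map<String, String>> moduleProperties) {
        return resolve(taskName, "executor", defaultProperties, moduleProperties);
    }
}
```

With the values above, CBS runs on LSF with -Xmx16g, while every other module uses the Test executor and -Xmx512m.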

About the .yaml configuration file: As of GenePattern 3.4.0, you use the .yaml configuration file only to configure GenePattern for use with a queuing system. As you work with the .yaml file, you may notice that it contains several properties that are also defined in the genepattern.properties file. To avoid confusion, leave them set to agree with the genepattern.properties file. GenePattern 3.4.0 reads these properties from the genepattern.properties file, not from the .yaml file. (In a future release, the genepattern.properties file may define the default server settings and the .yaml configuration file may define custom server settings.)

4. Reload the Configuration

At this point, you have deployed your command executor, modified the .yaml configuration file to control its use, and modified the <GenePatternServer>/resources/genepattern.properties file to point to the modified .yaml configuration file. Now, stop and restart the GenePattern server to reload the server configuration and begin to use the new command executor.

As you use GenePattern with the queuing system, you may find it useful to modify the configuration. The Administration>Server Settings>Job Configuration page provides several useful tools for controlling the internal GenePattern job queue and reloading the .yaml configuration file. Use this page to confirm which command executors are currently installed and which .yaml configuration file is currently in use. If you make minor adjustments to the configuration file, such as overriding the command executor used for a module, group or user, you can use the Job Configuration page to reload the configuration file without restarting the GenePattern server. For major changes, such as adding a new command executor, we recommend restarting the server rather than simply reloading the configuration.

Command Line Prefix

Pros and Cons

Before the 3.2.3 release of GenePattern (June 2010), the only way to connect to an external queuing system was to use the command line prefix. Although this option requires no Java programming and allows for configuration via a web page, it has significant drawbacks:

The drawbacks are a result of how the command line prefix works. Each new job requires a dedicated server process which waits for the job to complete. When a user terminates a job, the server process is terminated but the external process launched on the queuing system is not terminated. Similarly, when the GenePattern server shuts down, all server processes halt but the processes running on the external queuing system become orphaned. When the GenePattern server restarts, the jobs are not restarted; the user must restart any unfinished job from the beginning.

If you are using the CommandExecutor Interface, we recommend that you not use the command line prefix. The command line prefix is prepended to the module command line before the job is executed by the CommandExecutor. To be more precise:

  1. The server gets the initial command line from the manifest.
  2. The command line prefix is prepended to the initial command line.
  3. The server resolves all system and user variables and breaks the command line string into tokens.
  4. The server calls CommandExecutor.runCommand.
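The order of operations in steps 1 through 4 can be sketched in Java. This is a simplified, hypothetical illustration: the class PrefixedCommandLine is not GenePattern code, variables use the <name> form seen elsewhere in this guide, and the tokenizer only keeps double-quoted text together; the server's real rules are more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps 1-4; not the server's actual implementation.
class PrefixedCommandLine {
    static List<String> build(String prefix, String manifestCommandLine, Map<String, String> vars) {
        // Step 2: the prefix goes in front of the module's command line.
        String line = prefix.isBlank() ? manifestCommandLine : prefix.trim() + " " + manifestCommandLine;
        // Step 3a: replace each <name> with its value (unknown names are left as-is).
        for (Map.Entry<String, String> e : vars.entrySet()) {
            line = line.replace("<" + e.getKey() + ">", e.getValue());
        }
        // Step 3b: split on whitespace, keeping double-quoted text together.
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        // Step 4: these tokens would be passed to CommandExecutor.runCommand.
        return tokens;
    }
}
```

With the LSF prefix shown later in this section, the first tokens handed to the executor are bsub, -K, -o and lsf_log.txt, followed by the module's own command.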

Using the Command Line Prefix

Although this is not the preferred method, you can still use the Command Line Prefix to connect to an external queuing system.

To use the Command Line Prefix to configure the GenePattern server to execute jobs using LSF or SGE:

  1. Add the GenePatternURL property to the GenePattern configuration file, GenePatternServer/resources/genepattern.properties, specifying the URL of your server. For example:

    GenePatternURL=http://myserver.company.com:8080/gp/

    When you run a pipeline, the GenePattern server uses this URL to construct the links to the output files.

    By default, the GenePatternURL property is not set. When you run a pipeline, the GenePattern server derives the URL at run time based on the current IP address of the host server. This is ideal for a user running on a laptop, where the IP address may change at startup. However, if you are using a queuing system, the derived URL is incorrect: it is based on the IP address of the queuing system server rather than the GenePattern server.

  2. For Sun Grid Engine, modify the R2.5 property in the GenePattern configuration file, GenePatternServer/resources/genepattern.properties, to quote the <r_flags> options. For example:

    R2.5=<java> -DR_suppress\=<R.suppress.messages.file> -DR_HOME\=<R2.5_HOME> -Dr_flags\=\"<r_flags>\" -cp <run_r_path> RunR

    Modify other similar properties (if any) that were added to support additional versions of R.

  3. Click Administration>Server Settings and use the Command Line Prefix page to have the GenePattern server add the required options to the command line each time it executes a module.

For example, if you are using LSF, modify the Command Line Prefix options as follows:

  1. Click Administration>Server Settings and select Command Line Prefix. GenePattern displays the Command Line Prefix page.
  2. Enter the following text in the Default Command Prefix field:
    bsub -K -o lsf_log.txt
  3. Optionally, set the environment variables BSUB_QUIET and BSUB_QUIET2 to prevent bsub from printing common job messages to standard out:

Another alternative is to create a script that sets the environment variables and then executes the job using LSF or SGE. The command prefix would then execute the script. For example:

  1. Create the shell script to set the variables and execute the job using LSF. The script executes in the jobResults directory for the job; for example, for job 3248, the script executes in the GenePatternServer/Tomcat/webapps/gp/jobResults/3248/ directory. The following script sets the environment variables, submits the job to the LSF queue, waits for the job to complete, saves stdout to a new file, stdout.txt, and saves stderr to a new file, stderr.txt. By convention, GenePattern considers a job to fail if there is any output to stderr.

    #!/bin/bash
    #
    # Submit the job to LSF
    # Save lsf out and err files in the jobResults directory.
    # If there is stdout from the job, pipe to stdout of this script.
    # If there is stderr from the job, pipe to stderr of this script.
    lsf_err=.lsf.err;
    cmd_out=cmd.out;
    BSUB_QUIET=
    BSUB_QUIET2=
    export BSUB_QUIET
    export BSUB_QUIET2
    # submit the job and wait (-K) for the job to complete
    bsub -q genepattern -K -o .lsf_%J.out -e $lsf_err "$@" \>$cmd_out
    # sleep to allow for NFS delay
    sleep 2;
    # If there is stdout from the job, pipe to stdout of this script, then delete the output file
    if [ -e $cmd_out ]
    then
       cat $cmd_out >&1;
       rm $cmd_out;
    fi
    # If there is stderr from the job, pipe to stderr of this script then delete stderr file
    if [ -e $lsf_err ]
    then
       cat $lsf_err >&2;
       rm $lsf_err;
    fi
  2. Click Administration>Server Settings and select Command Line Prefix. GenePattern displays the Command Line Prefix page.
  3. Enter the following text in the Default Command Prefix field:
    /fully/qualified/path/to/lsf_default.sh
  4. The script shown here saves the LSF log file into the job results directory. In GenePattern, the log files are displayed with the other job result files. If you do not want the log files displayed in GenePattern, edit the GenePatternServer/resources/genepattern.properties file and set the following property:
    jobs.FilenameFilter=.lsf*
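To preview which result files a filter value would hide, the following Java sketch treats the value as a single glob pattern. That interpretation is an assumption: how GenePattern itself parses jobs.FilenameFilter (for example, whether it accepts several patterns) is not covered here, and the class ResultFileFilter is hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: shows which job result files remain visible when the
// filter value is treated as one glob pattern.
class ResultFileFilter {
    static List<String> visible(List<String> fileNames, String filterGlob) {
        PathMatcher hidden = FileSystems.getDefault().getPathMatcher("glob:" + filterGlob);
        return fileNames.stream()
                .filter(name -> !hidden.matches(Path.of(name)))
                .collect(Collectors.toList());
    }
}
```

For the script above, .lsf_3248.out and .lsf.err would be hidden, while stdout.txt and the module's output files stay visible.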

Securing the Server

Secure the GenePattern server to control who has access to which operations. Since GenePattern is primarily a web application (including SOAP interfaces) running on a web server, general approaches for securing web servers are applicable to the GenePattern server. In addition, GenePattern provides several security features that can easily be used by non-technical users to control access to the server.

This section describes several ways to secure the GenePattern server:

Access Filtering

Use the Access page to define which GenePattern clients have access to the GenePattern server. This is the simplest way to secure your GenePattern server.

Access filtering prevents users from connecting to the GenePattern server unless they come from a known computer. If your computer is not allowed to access the server, you cannot use the server regardless of your username, password, or permissions. The localhost (127.0.0.1) computer cannot be denied access to the locally installed GenePattern server. This prevents you from inadvertently denying yourself access to the server.

To use access filtering (as described in Modifying Server Settings):

  1. Click Administration>Server Settings.
  2. Use the Access page to determine which clients have access to your GenePattern server:

    Access
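The rule that the loopback address can never be locked out can be expressed as a small allowlist check. The following Java sketch is hypothetical: the class AccessFilter is illustrative, it compares literal IP addresses only, and it does not reproduce the Access page's actual matching rules (wildcards, host names).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;

// Hypothetical sketch of client access filtering with an always-allowed
// loopback address; not GenePattern's implementation.
class AccessFilter {
    private final Set<String> allowedAddresses;

    AccessFilter(Set<String> allowedAddresses) {
        this.allowedAddresses = allowedAddresses;
    }

    /** clientAddress is expected to be a literal IP such as "10.0.0.5". */
    boolean isAllowed(String clientAddress) {
        try {
            // Loopback (127.0.0.1, ::1) is always allowed, so the local admin keeps access.
            if (InetAddress.getByName(clientAddress).isLoopbackAddress()) {
                return true;
            }
        } catch (UnknownHostException e) {
            return false;
        }
        return allowedAddresses.contains(clientAddress);
    }
}
```

Because the loopback check runs before the allowlist lookup, an empty allowlist still lets requests from the server machine through.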

Password Protection

By default, the GenePattern server requires only a user name to authenticate a GenePattern user. You can easily add password protection by modifying the GenePattern server properties.

To add password protection, modify the GenePattern server properties:

  1. Edit the GenePattern configuration file, GenePatternServer/resources/genepattern.properties.
  2. Set the require.password property to true: require.password=true.
  3. Save the genepattern.properties file.
  4. Restart the GenePattern server.
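
If you script server setup, the edit in step 2 can be made programmatically. This Python sketch, which is not a GenePattern tool, rewrites a single key in properties-file text and leaves comments and other keys alone. It assumes simple one-line key=value entries with no backslash continuations.

```python
import re


def set_property(text, key, value):
    """Return properties text with `key` set to `value`.

    Rewrites the first uncommented "key=..." or "key: ..." line, or appends
    "key=value" if the key is absent. Other lines are left untouched.
    Assumes simple one-line entries (no backslash continuations).
    """
    pattern = re.compile(
        r"^([ \t]*)" + re.escape(key) + r"[ \t]*[=:].*$", re.MULTILINE
    )
    new_line = f"{key}={value}"
    if pattern.search(text):
        # Keep any leading indentation, replace the rest of the line.
        return pattern.sub(lambda m: m.group(1) + new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"
```

For example, set_property(text, key, "true"), where key is the property named in step 2 and text is the contents of genepattern.properties, returns the updated text to write back before restarting the server.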

When you add password protection to the server:

Assigning passwords to existing user accounts prevents anyone from inadvertently or intentionally logging into and taking control of another user’s account. After adding password protection to the server, set passwords for existing users as follows:

  1. Select Administration>Server Settings>Users and Groups to list all users registered on the server.
  2. Sign into GenePattern using the name of an existing user.
  3. When GenePattern prompts you to set a password, select a password for that user.
  4. After setting the password, GenePattern displays the Change Email page (My Settings). Set the user’s email address if it has not been set. This is the address GenePattern uses to send the user a new password if necessary.
  5. Sign out and repeat the process for the next user.
  6. After setting passwords for all users, let them know that passwords have been set. You do not need to send the users their passwords. Simply ask users to sign into GenePattern and click the Forgot your password? link to have GenePattern send a temporary password.

User Accounts

By default, users create their own accounts by clicking the Registration link on the GenePattern login page. To configure GenePattern to allow only administrators to create new accounts:

  1. Shut down the server.
  2. Edit the file GenePatternServer/Tomcat/webapps/gp/WEB-INF/web.xml. Remove registerUser.jsf from the no.login.required.redirect.to.home parameter value. After the edits, it looks like this:
    <init-param>
    <!-- List of jsf pages that user can access if not logged in. If user requests one of these pages while logged in, he is redirected to the home page. -->
    <param-name>no.login.required.redirect.to.home</param-name>
    <param-value>login.jsf,forgotPassword.jsf</param-value>
    </init-param>
    Result: A user cannot access the registration page until she has successfully logged into the server.
  3. Edit the file GenePatternServer/resources/actionPermissionMap.xml. Add the following line to the <actionPermissionMap>:
    <url link="registerUser.jsf" permission="adminServer"/>
    Result: A user must be an administrator to access the registration page.
  4. Edit the file GenePatternServer/Tomcat/webapps/gp/pages/login.xhtml. Replace the phrase
    rendered="#{loginBean.createAccountAllowed and loginBean.showRegistrationLink}">
    with
    rendered="false">
    Result: Removes the Click to register link from the login page.
  5. Restart the server.
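
After step 2, you can confirm that registerUser.jsf is gone from the parameter with a short Python check. It looks for the init-param by name anywhere in the file and ignores XML namespaces. The sample document is a trimmed, hypothetical web.xml that contains only the parameter shown above.

```python
import xml.etree.ElementTree as ET

PARAM = "no.login.required.redirect.to.home"


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def no_login_pages(web_xml):
    """Return the pages listed in the no.login.required.redirect.to.home param."""
    root = ET.fromstring(web_xml)
    for elem in root.iter():
        if local_name(elem.tag) != "init-param":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        if fields.get("param-name") == PARAM:
            value = fields.get("param-value", "")
            return [page.strip() for page in value.split(",") if page.strip()]
    return []


# Trimmed, hypothetical web.xml containing only the edited parameter.
SAMPLE = """<web-app>
  <filter>
    <init-param>
      <param-name>no.login.required.redirect.to.home</param-name>
      <param-value>login.jsf,forgotPassword.jsf</param-value>
    </init-param>
  </filter>
</web-app>"""

pages = no_login_pages(SAMPLE)
print(pages)                        # ['login.jsf', 'forgotPassword.jsf']
print("registerUser.jsf" in pages)  # False
```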

To create an account:

  1. Log in using an administrator account. GenePattern displays the home page.
  2. Open the user registration page, registerUser.jsf. For example, if the URL for the GenePattern home page is
    http://127.0.0.1:8080/gp/pages/index.jsf
    change the URL to
    http://127.0.0.1:8080/gp/pages/registerUser.jsf
  3. Create the new user account. You are automatically logged in as that new user.
  4. To create another new account, log out of the new user account and log in using your administrator account.
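
The URL change in step 2 replaces only the last path segment. A minimal Python sketch, assuming the home page URL ends in a page name such as index.jsf:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def registration_url(home_url):
    """Swap the last path segment of the home page URL for registerUser.jsf."""
    parts = urlsplit(home_url)
    directory = posixpath.dirname(parts.path)  # e.g. "/gp/pages"
    new_path = posixpath.join(directory, "registerUser.jsf")
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


print(registration_url("http://127.0.0.1:8080/gp/pages/index.jsf"))
# http://127.0.0.1:8080/gp/pages/registerUser.jsf
```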

User Permissions

User permissions determine valid actions for the user. Permissions are based on two configuration files in the GenePatternServer/resources directory: userGroups.xml, which defines user groups, and permissionMap.xml, which defines which groups have access to which permissions.

A user who belongs to multiple groups is given the most permissive permissions granted to those groups. For example, an administrator who belongs to other groups retains administrator permissions.
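
The "most permissive" rule amounts to taking the union of the permissions granted to each of a user's groups. The Python sketch below reads group membership from XML in the userGroups.xml format shown in Creating Groups and Administrators, including the "*" wildcard. The permission table is a plain dictionary because the sketch makes no assumption about the permissionMap.xml format; the users and group assignments are hypothetical.

```python
import xml.etree.ElementTree as ET


def groups_for_user(user_groups_xml, username):
    """Groups containing the user, per the userGroups.xml format.

    A <user name="*"/> entry matches every user, as in the default file.
    """
    root = ET.fromstring(user_groups_xml)
    return {
        group.get("name")
        for group in root.findall("group")
        if any(u.get("name") in ("*", username) for u in group.findall("user"))
    }


def effective_permissions(groups, group_permissions):
    """Union of every group's permissions: the most permissive grant wins."""
    result = set()
    for group in groups:
        result |= group_permissions.get(group, set())
    return result


# Hypothetical membership in the userGroups.xml format.
USER_GROUPS = """<userGroups>
  <group name="administrators"><user name="alice"/></group>
  <group name="developers"><user name="alice"/><user name="bob"/></group>
</userGroups>"""

# Hypothetical stand-in for permissionMap.xml (its format is not assumed here).
GROUP_PERMISSIONS = {
    "administrators": {"adminServer", "adminJobs"},
    "developers": {"createModule"},
}

for user in ("alice", "bob"):
    groups = groups_for_user(USER_GROUPS, user)
    print(user, sorted(effective_permissions(groups, GROUP_PERMISSIONS)))
# alice ['adminJobs', 'adminServer', 'createModule']
# bob ['createModule']
```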

To assign or modify user permissions, edit the permissionMap.xml file. The XML syntax is simple but must be followed carefully. The rules are as follows:

By default:

Note: No explicit permission is required to run public modules/pipelines, or private modules/pipelines that you have created. No explicit permission is required to edit or delete your own modules, pipelines, suites, or jobs.

createModule

Permits creation of a module. Creation refers to any action that adds a module to the server, including create, install from repository, install from zip, and clone.

createPrivatePipeline

Permits creation of a private pipeline (a pipeline visible only to its creator). Creation refers to any action that adds a private pipeline to the server, including create, install from repository, install from zip, and clone. Note: To install the modules in a pipeline, you must have createModule permission.

createPrivateSuite

Permits creation of a private suite (a suite visible only to its creator). Creation refers to any action that adds a private suite to the server, including create, install from repository, install from zip, and clone. Note: To install the modules in a suite, you must have createModule permission.

createPublicPipeline

Permits creation of a public pipeline. Creation refers to any action that adds a public pipeline to the server, including create, install from repository, install from zip, and clone. Note: To install the modules in a pipeline, you must have createModule permission.

createPublicSuite

Permits creation of a public suite. Creation refers to any action that adds a public suite to the server, including create, install from repository, install from zip, and clone. Note: To install the modules in a suite, you must have createModule permission.

adminJobs

Permits viewing and deleting jobs and associated files owned by other users. Users with this permission can delete any job on the server. Typically, only members of the Administrators group are given this permission.

adminModules

Permits viewing and deleting private modules owned by other users. Permits deleting public modules. Note: No explicit permission is required to view public modules.

adminPipelines

Permits viewing and deleting private pipelines owned by other users. Permits deleting public pipelines. Note: No explicit permission is required to view public pipelines.

adminSuites

Permits viewing and deleting private suites owned by other users. Permits deleting public suites. Note: No explicit permission is required to view public suites.

adminServer

Permits access to Administration>Server Settings and all actions on the Server Settings page, including modifying server settings and shutting down the server. Users with this permission are considered to be GenePattern administrators. On the Users and Groups page, a checkmark in the admin? column indicates that a user has this permission. Typically, only members of the Administrators group are given this permission.

User Authentication and Authorization

You can configure the GenePattern server to provide password protection, restrict creation of user accounts, and assign permissions based on groups. Additional or alternative authentication and authorization mechanisms can be added to the server by an administrator with programming experience. The remainder of this section is written for such a programmer. Note: The links in this section display the source code for the default GenePattern installation, which should be used as the starting point for any modifications.

Authentication

The authentication filter, AuthenticationFilter.java, controls whether a user can log into the server (typically based on username and password). The easiest way to modify GenePattern authentication is by implementing the IAuthenticationPlugin.java interface:

  1. Implement the IAuthenticationPlugin interface. Use the IAuthenticationPlugin.java file as the starting point; comments in the file provide the specification. For example, create a class named MyCustomGenePatternAuthentication.java that implements the interface.
  2. Compile the class and add it to the classpath for the GenePattern server web application.
  3. Modify the authentication.class property in the GenePattern configuration file, GenePatternServer/resources/genepattern.properties, to point to the new class. For example:
    authentication.class=org.genepattern.server.auth.MyCustomGenePatternAuthentication
  4. Restart the GenePattern server for the changes to take effect.
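
Step 3 names the plugin by its fully qualified class name, which the server presumably resolves when it starts (hence the restart in step 4). In Java this is typically done with Class.forName; the Python sketch below shows the same resolve-by-name idea with importlib. It illustrates the mechanism only and is not GenePattern code; a standard-library class stands in for a custom plugin.

```python
import importlib


def load_class(qualified_name):
    """Resolve "package.module.ClassName" to the class object it names.

    Python analogue of looking up a class named in a property such as
    authentication.class; a Java server would typically use Class.forName.
    """
    module_name, _, class_name = qualified_name.strip().rpartition(".")
    if not module_name:
        raise ValueError(f"not a fully qualified class name: {qualified_name!r}")
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise ImportError(f"{module_name} has no attribute {class_name!r}") from None


# A stdlib class stands in for a custom plugin class here.
plugin_cls = load_class("collections.OrderedDict")
print(plugin_cls.__name__)  # OrderedDict
```

In this sketch a misspelled class name fails immediately with ImportError (or ValueError if the package part is missing), which is the kind of error to look for if the server does not pick up your plugin.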

See ftp://ftp.broadinstitute.org/pub/genepattern/src/gp-custom-auth.zip for an example project that prepares a custom authentication jar file for deployment to your local GenePattern server.

If the IAuthenticationPlugin interface methods do not provide enough flexibility, you can modify the authentication filter.

Authorization

The authorization filter, AuthorizationFilter.java, controls which GenePattern operations (web pages) the user can access. As described in User Permissions, permissions are based on two configuration files: userGroups.xml, which defines user groups, and permissionMap.xml, which defines which groups have access to which permissions.

Organizations that have user groups defined in an external system can use those groups rather than using the userGroups.xml. To have the authorization filter use external user groups rather than the userGroups.xml file, implement the IGroupMembershipPlugin.java interface:

  1. Implement the IGroupMembershipPlugin interface. Use the IGroupMembershipPlugin.java file as the starting point; comments in the file provide the specification. For example, create a class named MyCustomGroupMembershipPlugin.java that implements the interface.
  2. Compile the class and add it to the classpath for the GenePattern server web application.
  3. Modify the group.membership.class property in the GenePattern configuration file, GenePatternServer/resources/genepattern.properties, to point to the new class. For example:
    group.membership.class=org.genepattern.server.auth.MyCustomGroupMembershipPlugin
  4. Restart the GenePattern server for the changes to take effect.
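
With a group-membership plugin, only the source of group names changes; permissions still come from permissionMap.xml. The Python sketch below models that split with an abstract base class. The class and method names are hypothetical, not the methods of IGroupMembershipPlugin (see that file's comments for the real specification), and a dictionary stands in for the external directory.

```python
from abc import ABC, abstractmethod


class GroupSource(ABC):
    """Hypothetical interface: where a user's group names come from."""

    @abstractmethod
    def groups_for(self, username):
        """Return the set of group names for username."""


class DirectoryGroupSource(GroupSource):
    """External groups; a dict stands in for LDAP or another directory."""

    def __init__(self, directory):
        self._directory = directory

    def groups_for(self, username):
        return set(self._directory.get(username, ()))


def permissions_for(username, source, permission_map):
    """Only groups that appear in the permission map grant permissions."""
    granted = set()
    for group in source.groups_for(username):
        granted |= permission_map.get(group, set())
    return granted


# Hypothetical external groups and permission map entries.
directory = {"carol": ["lab-pi", "all-staff"], "dave": ["all-staff"]}
permission_map = {"lab-pi": {"createPublicPipeline", "createPublicSuite"}}

source = DirectoryGroupSource(directory)
print(sorted(permissions_for("carol", source, permission_map)))
# ['createPublicPipeline', 'createPublicSuite']
print(sorted(permissions_for("dave", source, permission_map)))
# []
```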

To assign permissions to a group authorized through the IGroupMembershipPlugin interface, include the group in the permissionMap.xml file. If the IGroupMembershipPlugin interface methods do not provide enough flexibility, you can modify the authorization filter.

Modifying the filters

The authentication and authorization filters are servlet filters installed in front of the GenePattern web application in the GenePatternServer/Tomcat/webapps/gp/WEB-INF/web.xml file. To implement an alternative authentication (or authorization) filter:

  1. Write and compile a new servlet filter that performs the desired authentication (or authorization).
  2. Place the jar file containing the new servlet filter into the following directory:
    */GenePatternServer/Tomcat/webapps/gp/WEB-INF/lib
  3. Modify the GenePattern server’s web.xml document.
    */GenePatternServer/Tomcat/webapps/gp/WEB-INF/web.xml
    Note: It is important to maintain the existing order of the servlet filters in the web.xml document as they are used in the order they are defined in the document. The Authentication filter must come before the Authorization filter for the Authorization filter to work.
  4. Change the definition of the AuthenticationFilter (or AuthorizationFilter) to use the class that you have provided.
  5. Add any necessary configuration elements that it requires.
  6. Restart the GenePattern server for the changes to take effect.
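
Why the order matters can be sketched as a chain of Python functions, each of which either rejects the request or passes it on. In this hypothetical model the authentication filter records the user on the request and the authorization filter relies on that record. The user names, administrator list, and page names are invented for illustration.

```python
# Hypothetical data for the sketch.
KNOWN_USERS = {"alice", "bob"}
ADMINS = {"alice"}
ADMIN_PAGES = {"serverSettings.jsf"}


def authentication_filter(request, next_filter):
    """Identify the user; reject anyone unknown."""
    if request.get("username") not in KNOWN_USERS:
        return "401 not authenticated"
    request["user"] = request["username"]
    return next_filter(request)


def authorization_filter(request, next_filter):
    """Check permissions for the user identified by an earlier filter."""
    user = request.get("user")
    if user is None:  # authentication has not run yet
        return "403 no authenticated user"
    if request["page"] in ADMIN_PAGES and user not in ADMINS:
        return "403 forbidden"
    return next_filter(request)


def run_chain(filters, request, handler):
    """Apply filters in list order, mirroring their order in web.xml."""
    def call(index, req):
        if index == len(filters):
            return handler(req)
        return filters[index](req, lambda r: call(index + 1, r))
    return call(0, request)


def page_handler(request):
    return "200 " + request["page"]


correct = [authentication_filter, authorization_filter]
reversed_order = [authorization_filter, authentication_filter]
print(run_chain(correct, {"username": "bob", "page": "index.jsf"}, page_handler))
# 200 index.jsf
print(run_chain(correct, {"username": "bob", "page": "serverSettings.jsf"}, page_handler))
# 403 forbidden
print(run_chain(reversed_order, {"username": "bob", "page": "index.jsf"}, page_handler))
# 403 no authenticated user
```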

Note: The default authentication filter (AuthenticationFilter.java) lets through, without authentication, requests that come from the localhost and include a parameter called jsp_precompile. If you do not allow these requests through unauthenticated, you will see a series of errors when you start the GenePattern server, as it attempts to precompile the JSP pages. These errors are not fatal, but they slow down server response the first time users access pages after a server restart.
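
The localhost exception can be expressed as a small predicate. This Python sketch assumes the check needs only the client address and the presence of a jsp_precompile query parameter. Treating ::1 (IPv6 localhost) as local is an assumption, not something this guide states.

```python
from urllib.parse import parse_qs

# 127.0.0.1 is the localhost address used in this guide; "::1" is an assumption.
LOCAL_ADDRESSES = {"127.0.0.1", "::1"}


def bypass_authentication(remote_addr, query_string):
    """True for JSP precompile requests that come from localhost."""
    # keep_blank_values=True so a bare "?jsp_precompile" is still seen.
    params = parse_qs(query_string, keep_blank_values=True)
    return remote_addr in LOCAL_ADDRESSES and "jsp_precompile" in params


print(bypass_authentication("127.0.0.1", "jsp_precompile=true"))  # True
print(bypass_authentication("127.0.0.1", "jsp_precompile"))       # True
print(bypass_authentication("10.0.0.5", "jsp_precompile=true"))   # False
```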

Secure Sockets Layer (SSL) Support

This section describes how to modify the GenePattern web application to run on a web server that is configured to use the HTTPS protocol, in which regular HTTP requests are carried over a Secure Sockets Layer (SSL) connection, making them much harder for attackers to intercept. If you have installed your GenePattern server on a web server other than the default Tomcat instance it is distributed with, configure your web server according to its instructions and then follow Step 2 below.

Note: When running under SSL, programming language clients and the GenePattern web client may not be able to connect to your GenePattern server.

Step 1. Configure Tomcat for SSL support

Follow the instructions available at http://tomcat.apache.org/tomcat-5.5-doc/ssl-howto.html to configure the Tomcat instance for using SSL. In doing so, you will modify the Tomcat configuration file, which is located in the GenePatternServer/Tomcat/conf directory.

Step 2. Configure GenePattern for SSL

Once Tomcat (or another web server) has been configured for SSL, modify the GenePattern configuration file, GenePatternServer/resources/genepattern.properties, to ensure that its properties are in sync with the web server:

  1. Add a new key, java.net.ssl.trustStore=<path to keystore>.
    This should point to the keystore you created when configuring Tomcat (above) or some keystore that GenePattern can use to establish SSL connections.
  2. Modify the value for the key GENEPATTERN_PORT to use the https port you selected when configuring Tomcat (above).
  3. Modify the value for the key GenePatternURL to use the https protocol and the https port you selected, for example:
    http://localhost:8080/gp becomes https://localhost:8443/gp
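
The GenePatternURL change in step 3 keeps the host and path and swaps the scheme and port. A minimal Python sketch, not a GenePattern utility:

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(gp_url, https_port):
    """Rewrite GenePatternURL for HTTPS on the given port, keeping the path."""
    parts = urlsplit(gp_url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no host in {gp_url!r}")
    if ":" in host:  # IPv6 literals need brackets in a URL
        host = f"[{host}]"
    return urlunsplit(("https", f"{host}:{https_port}", parts.path, parts.query, parts.fragment))


print(to_https("http://localhost:8080/gp", 8443))  # https://localhost:8443/gp
```

The same port number (8443 in this example) is the value to use for GENEPATTERN_PORT in step 2.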

Save the genepattern.properties file and restart your server. Any bookmarked links to your GenePattern server must be updated to the new protocol and port.


Changing the GenePattern Database (HSQL to Oracle)

The GenePattern server runs against a database. By default, the GenePattern installation sets up an HSQL database. This section describes how to build and use an Oracle database in place of the HSQL database.

When using Oracle (or another database) you must initialize the database by running the scripts in the <GenePattern_HOME>/resources directory. You or your database administrator must ensure the database is available by JDBC URL from the GenePattern server.

Note: This procedure has been tested using the default Tomcat 5.5 server (Tomcat documentation), which comes with GenePattern.
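
Before switching databases, it helps to confirm that the database listener is reachable from the GenePattern server machine. The Python sketch below parses the two common Oracle thin-driver URL forms (@host:port:SID and @//host:port/service) and tries a TCP connection. The host names are hypothetical, and a successful connection shows only that the port is open; it does not test credentials or the initialization scripts.

```python
import re
import socket

# Oracle thin-driver forms: @host:port:SID and @//host:port/service_name
_THIN_URL = re.compile(
    r"^jdbc:oracle:thin:@(?://)?(?P<host>[^:/]+):(?P<port>\d+)[:/](?P<db>\S+)$"
)


def parse_oracle_thin_url(url):
    """Split an Oracle thin JDBC URL into (host, port, SID or service name)."""
    match = _THIN_URL.match(url.strip())
    if match is None:
        raise ValueError(f"unsupported JDBC URL: {url!r}")
    return match["host"], int(match["port"]), match["db"]


def port_reachable(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, run on the GenePattern server: host, port, _ = parse_oracle_thin_url("jdbc:oracle:thin:@//dbhost.example.org:1521/gpdb"), then port_reachable(host, port).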


Integrating GenePattern with Existing Tools

This section provides guidance to system administrators interested in integrating GenePattern into the analysis tools at their site. It highlights issues that might arise and how to address them, and provides links to relevant portions of the GenePattern documentation, supplementing that documentation as needed.

Typographical conventions:

Tables like this describe implementation on the GenePattern public server.

Installing GenePattern

The standard installation procedure uses InstallAnywhere to install the server on Windows, Mac, or Linux using a Tomcat web server. To install on a different web server or on another platform, use the WAR file installer. Instructions for both the standard installation and the WAR file installation are on the download page: http://www.genepattern.org/download/.

Hardware and software requirements for GenePattern are described in the Release Notes.

GenePattern Database

The GenePattern server runs against a database. The GenePattern installation creates an HSQL database. For instructions on how to build and use an Oracle database instead, see Changing the GenePattern Database (HSQL to Oracle).

We use an Oracle database for the GenePattern public server.

Securing the GenePattern Server

The following sections briefly summarize how to secure your GenePattern server, including access to the server from client machines, GenePattern user accounts, authentication (e.g., username & password) and authorization (e.g., permissions). For more detail, see Securing the Server.

Access

By default, any client machine can access a GenePattern server. Optionally, you can configure your GenePattern server to restrict access to selected domains. See Securing the Server.Access Filtering.

Access to the GenePattern public server is not restricted.

User Accounts

A user must have a GenePattern account to log into the GenePattern server. By default, when a user first logs into the server, GenePattern automatically creates an account for that username.

To enable registration, in the genepattern.properties file, set require.password=true. This setting adds a registration link (and password prompt) to the GenePattern login page. The first time users log into GenePattern, they must click the registration link to create an account. User account information is stored in the GenePattern Database.

Alternatively, configure the GenePattern server to not allow users to create GenePattern accounts (create.account.allowed=false). In this case, new user accounts must be explicitly created by editing the GenePattern database.

See Securing the Server.Password Protection.

Registration (and passwords) are enabled on the GenePattern public server.

Authentication

Each GenePattern user must register to access the GenePattern server. By default, GenePattern requires only a username for authentication. Optionally, you can configure the GenePattern server to require both a username and a password for authentication. See Securing the Server.Password Protection.

GenePattern user authentication is performed by a servlet filter installed in front of the GenePattern web application in its web.xml file. To provide additional or alternative authentication, implement the IAuthenticationPlugin interface or modify the servlet filter. See Securing the Server.User Authentication and Authorization.

The GenePattern public server hosted at the Broad Institute uses the username and password authentication provided by the GenePattern installation.

Collaborator

A large university uses Kerberos to provide username and password authentication for their network. They wrote their own servlet filter to have the GenePattern server also authenticate using Kerberos.

Permissions

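GenePattern permissions are based on two configuration files: userGroups.xml, which defines user groups, and permissionMap.xml, which defines which groups have access to which permissions.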

GenePattern user authorization is performed by a servlet filter installed in front of the GenePattern web application in its web.xml file. By default, users are assigned permissions based on GenePattern groups. To have the authorization filter use external user groups rather than the userGroups.xml file, implement the IGroupMembershipPlugin.java interface. To provide additional or alternative authorization, modify the servlet filter. See Securing the Server.User Authentication and Authorization.

On the GenePattern public server, the following permissions are restricted to a small number of users in the Administrators group:

  • createModule - we restrict this to prevent malicious code from being installed on the server
  • createPublicPipeline - we restrict this to prevent proliferation of untested pipelines
  • adminJobs, adminModules, adminPipelines, adminSuites - we restrict these to preserve privacy
  • adminServer - we restrict this to secure the server

SSL

For information on how to modify the GenePattern web application to run on a web server that is configured to use the HTTPS protocol, see Securing the Server.Secure Sockets Layer (SSL) Support.

The GenePattern public server is not running under SSL.

Other Security Considerations

We take the following additional steps to secure the machine running the GenePattern public server (these steps may not be necessary on less public servers):

Modules

This section discusses how to install, create, and manage modules.

Installing Modules

By default, you install modules, pipelines, and suites from the Broad repository. The module repository contains more than 100 modules and pipelines. Suites are stored in a separate suite repository. For instructions on how to install modules from the repository, see Managing Modules, Pipelines, and Suites.

The repository is updated regularly. We recommend checking for new modules on a weekly basis.

Create your own repository: Optionally, you can select an alternate repository from which to install modules, pipelines, and/or suites. See Repositories.

At the Broad, we maintain a development repository for modules in development and a production repository for released modules. Only the production repository is available from the GenePattern public server.

Creating Modules

For instructions on how to create modules, as well as a step-by-step tutorial for creating a module, see the Programmers Guide.

Running Modules in a Cluster

Queuing systems such as the Load Sharing Facility (LSF) and the Sun Grid Engine (SGE) allow computational resources to be used effectively. If you have such a queuing system, you typically want the GenePattern server to use it. For instructions on how to configure the GenePattern server to use a queuing system, see Using a Queuing System.

As described in the instructions, you click Administration>Server Settings and use the Command Line Prefix page to define the command prefix that runs the module on the cluster. The instructions use the Default Command Prefix field of the Command Line Prefix page to define one command prefix for all modules, which sends all modules to one queue. You can use that same page to define unique command line prefixes for specific modules. This allows you to send different modules to different queues, which helps to address hardware and memory issues. For example, certain modules (such as SNPFileCreator or HierarchicalClustering) require significant amounts of RAM.
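
Per-module prefixes behave like a lookup that falls back to the default. This Python sketch assumes the prefix is simply placed in front of the module's command line. The big-memory script path and the module command line are hypothetical; the default path is the one used in the LSF setup above.

```python
import shlex

DEFAULT_PREFIX = "/fully/qualified/path/to/lsf_default.sh"

# Hypothetical per-module overrides, e.g. a script that submits to a bigmem queue.
MODULE_PREFIXES = {
    "SNPFileCreator": "/fully/qualified/path/to/lsf_bigmem.sh",
    "HierarchicalClustering": "/fully/qualified/path/to/lsf_bigmem.sh",
}


def prefixed_command(module_name, module_args):
    """Build a module's command line, assuming the prefix is prepended."""
    prefix = MODULE_PREFIXES.get(module_name, DEFAULT_PREFIX)
    return shlex.split(prefix) + list(module_args)


print(prefixed_command("SNPFileCreator", ["java", "-jar", "snp.jar"]))
```

Keeping the overrides in one table makes it easy to see which modules bypass the default queue.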

The script described in the instructions writes the LSF log file into the job results directory. To prevent GenePattern from displaying the LSF log files with the rest of the job results, edit the genepattern.properties file and set jobs.FilenameFilter=.lsf*.

The GenePattern public server uses two queues: one for most modules and one for modules that require large amounts of memory. Modules sent to the 'bigmem' queue are run on a cluster of large memory machines. LSF log files are hidden.

Managing Memory for Modules

In GenePattern, you manage memory for modules in one of two ways:

Modules that Require Extra Memory

The following modules frequently require additional memory:

On the GenePattern public server, these modules are sent to a cluster of large memory machines.

genepattern.properties

Most server configuration options are in the genepattern.properties file in the GenePattern resources directory. Most of the options in this file can be set through the GenePattern interface by clicking Administration>Server Settings. For descriptions of the options, see Modifying Server Settings.

The options listed in the following table can be set only by editing the genepattern.properties file. For all other options, we recommend using the GenePattern user interface when possible.

require.password

create.account.allowed

See User Accounts

GenePatternURL

fqHostName

fullyQualifiedHostName

gpServerHostAddress

See the FAQ: How do I configure the GenePattern server on a machine with multiple IP addresses?

allow.input.file.paths=true

server.browse.file.system.root=/

See Other Security Considerations

input.file.mode=path

Determines how GenePattern handles network file paths:

  • path (default) leaves the network file in place
  • move copies the network file to the job directory on the GenePattern server before beginning the job and copies the file back to its original location after the job completes
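
The difference between the two modes can be sketched in Python, with temporary local directories standing in for the network share and the job directory. The function names are hypothetical, and the sketch ignores error handling (for example, a job that fails before the file is copied back).

```python
import shutil
import tempfile
from pathlib import Path


def stage_input(network_file, job_dir, mode="path"):
    """Return the path the module should read, following input.file.mode."""
    network_file = Path(network_file)
    if mode == "path":
        return network_file  # leave the network file in place
    if mode == "move":
        local_copy = Path(job_dir) / network_file.name
        shutil.copy2(network_file, local_copy)  # copy into the job directory
        return local_copy
    raise ValueError(f"unknown input.file.mode: {mode!r}")


def finish_input(network_file, staged_file, mode="path"):
    """After the job completes, copy a staged file back to where it came from."""
    if mode == "move":
        shutil.copy2(staged_file, network_file)


with tempfile.TemporaryDirectory() as share, tempfile.TemporaryDirectory() as job:
    source = Path(share) / "data.gct"  # hypothetical network file
    source.write_text("original\n")
    staged = stage_input(source, job, mode="move")
    staged.write_text("updated by the job\n")  # pretend the module changed it
    finish_input(source, staged, mode="move")
    print(source.read_text(), end="")  # updated by the job
```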

soap.attachment.dir=../temp/attachments

Used for the GenePattern SOAP interface. Specify a temporary directory to be used for SOAP messages with attachments.

Web Service Interface

All GenePattern server functionality is available programmatically. There are two basic access methods:


Setting Up a New GenePattern AMI

The following steps are necessary to create a new GenePattern instance from the GenePattern Amazon Machine Image (AMI).

Create your Amazon account (if you do not already have one)

  1. Set up an Amazon Web Services account.
    1. Go to http://aws.amazon.com and click Sign Up Now. Sign in to an existing Amazon account or create a new one.
    2. Go to http://aws.amazon.com/ec2 and click Sign Up Now. After completing sign up, you should be at the EC2 console.
    3. In the Navigation menu, click Key Pairs.
    4. Click the Create Key Pair button and then save the key pair to a file. Name it something you will remember, like "GenePattern". This is the private key used to create SSH connections to new AMI instances.
  2. Set up your Amazon API credentials.
    1. Select Account>Security Credentials.
    2. Click the X.509 Certificates tab.
    3. Click Create a New Certificate. Note: An Amazon account may have at most two certificates. If your account already has two certificates, you will need to use one of them, or delete one and create a new one.
    4. Download the private key and certificate, and save them in a known location, for example, ~/.ec2/prikey.pem and ~/.ec2/cert.pem.
    5. Make sure these files have the correct permissions; otherwise, using them for SSH will not work: chmod 600 ~/.ec2/*.pem
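
Step 5 can also be done, and checked, from Python. This sketch sets owner-only read/write (the same as chmod 600) and reports whether group or other users can still access a file. It assumes a POSIX file system and demonstrates on a throwaway file rather than your real keys.

```python
import os
import stat
import tempfile

OWNER_READ_WRITE = stat.S_IRUSR | stat.S_IWUSR  # 0o600


def restrict_to_owner(paths):
    """Equivalent of chmod 600: owner read/write, nothing for group or others."""
    for path in paths:
        os.chmod(path, OWNER_READ_WRITE)


def too_open(path):
    """True if group or other users have any permission on the file."""
    return bool(os.stat(path).st_mode & (stat.S_IRWXG | stat.S_IRWXO))


# Demonstration on a throwaway file rather than your real keys.
with tempfile.NamedTemporaryFile(suffix=".pem") as tmp:
    os.chmod(tmp.name, 0o644)
    print(too_open(tmp.name))  # True
    restrict_to_owner([tmp.name])
    print(too_open(tmp.name))  # False
```

To apply it to the files saved in step 4, pass the result of (pathlib.Path.home() / ".ec2").glob("*.pem") to restrict_to_owner.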

Create a new instance of the GenePattern AMI

  1. Go to the AMI creation link: https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-0d2b3364
  2. If you are not already logged in, log into Amazon Web Services.
  3. If Amazon prompts you for a region, select us-east-1 (availability zone us-east-1a), as the GenePattern AMI has only been made available in that region.
  4. Follow the steps in the wizard.
  5. The new instance will appear in your list of instances in the AWS Management Console.
  6. SSH into the new GenePattern AMI instance using username and password "genepattern".
  7. Edit /home/genepattern/GenePatternServer/resources/genepattern.properties by changing GenePatternURL to point to the instance URL.

Mounting EBS storage for use with GenePattern (optional)

Note: If you do not set up EBS storage, files are saved on the GenePattern instance's file system, which is sufficient only for a small number of GenePattern files.

  1. Create the EBS
    1. Go to http://aws.amazon.com/ec2 and sign in.
    2. Select Volumes>Create Volume.
    3. When the dialog box appears, enter the size of the EBS volume you want to create. We recommend at least 100 GB for GenePattern. If you are asked for a snapshot, select No Snapshot. The EBS volume must be in the same availability zone as the GenePattern instance (us-east-1a).
    4. Note the ID of the EBS once created. You will need this shortly.
  2. Attach the EBS to the GenePattern Instance
    1. Under Volumes, select the EBS volume that you have just created.
    2. Click Attach Volume and select your GenePattern instance from the list.
    3. Click Attach and wait several minutes for the volume to finish attaching.
    4. Note the device name of the attached EBS.
  3. Mount the EBS in the GenePattern Instance
    1. SSH into your GenePattern instance using username and password "genepattern".
    2. Confirm that the EBS has finished attaching by running "cat /proc/partitions" and making sure the attached name appears in the list.
    3. If you need to partition the EBS, run "sudo /sbin/fdisk ATTACHED" where "ATTACHED" is the attached name you were shown for the EBS, for example "/dev/sdk". Follow the prompts. This should give you a partition name, for example "/dev/sdk1". Make note of this.
    4. If you need to format the partition (you will need to format it unless you are using a pre-existing EBS), run "sudo /sbin/mkfs -t ext3 PARTITION" where "PARTITION" is the partition name you were just given.
    5. To mount the partition, run "sudo mount PARTITION /home/genepattern/GenePatternServer/data" where "PARTITION" is the partition name you were just given.
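
The attachment check in step 2 can be scripted. /proc/partitions lists one device per line as major number, minor number, size in 1 KB blocks, and name; the Python sketch below extracts the names. The sample text is hypothetical, and on some kernels the device may appear under a different name than the one the console reports (for example xvdk rather than sdk).

```python
def partition_names(proc_partitions_text):
    """Device names listed in /proc/partitions (e.g. 'sdk', 'sdk1')."""
    names = []
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks, name.
        if len(fields) == 4 and fields[0].isdigit():
            names.append(fields[3])
    return names


def is_attached(device, proc_partitions_text):
    """device may be given as '/dev/sdk' or 'sdk'."""
    return device.rsplit("/", 1)[-1] in partition_names(proc_partitions_text)


# Hypothetical /proc/partitions contents.
SAMPLE = """major minor  #blocks  name

 202        1   10485760 sda1
   8      160  104857600 sdk
"""

print(partition_names(SAMPLE))         # ['sda1', 'sdk']
print(is_attached("/dev/sdk", SAMPLE))  # True
```

On the instance itself, read the real file instead: with open("/proc/partitions") as f: print(is_attached("/dev/sdk", f.read())).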

Start your GenePattern AMI

  1. You can start GenePattern by running: /home/genepattern/GenePatternServer/StartGenePatternServer
  2. You can view your GenePattern instance by navigating to http://ADDRESS:8080/gp where ADDRESS is the domain name in use or the Amazon Public DNS address.
  3. GenePattern log files can be found at: /home/genepattern/GenePatternServer/Tomcat/logs/