You can use the GenePattern public server hosted at the Broad Institute, install a local GenePattern server for your own use, or install a networked GenePattern server to be used by several people. Concepts explains the benefits of each approach.
GenePattern can be run standalone on a small machine or separated into its client and server components to take advantage of a more powerful compute server. When you install a GenePattern server, you set basic server configuration options. If you are installing a local GenePattern server for your own use, you generally do not need to modify the server configuration. If you are the server administrator for a networked GenePattern server, you generally want to modify several of the GenePattern configuration options described in this guide.
Note: Only the GenePattern team can create groups on the GenePattern public server. To create a group, you must have installed a local GenePattern server (see Starting Your Own GenePattern Server).
The GenePattern configuration file GenePatternServer/resources/userGroups.xml defines groups and group membership. The Users and Groups server settings page lists all registered users and the groups to which they belong.
To create or modify groups, edit the userGroups.xml file. The XML syntax is simple but must be followed carefully. The rules are as follows:
Use a <group> element to create a group. You can create any number of groups. Group names must be unique and should include only alphanumeric characters, periods (.), and underscores (_).
Use a <user name> element to add members to a group. You can add any number of users to a group, and a user may be in any number of groups. Setting user name="*" adds all users to a group.
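A typo in a hand-edited XML file is easy to miss, so it can help to check group names before saving userGroups.xml. The following Java sketch applies the two naming rules above: names must be unique, and they may contain only letters, digits, periods, and underscores. GroupNameCheck is a hypothetical helper and is not part of GenePattern. It also assumes two things the rules do not state: "alphanumeric" means ASCII letters and digits, and names are compared case-sensitively.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check for group names before editing userGroups.xml. */
final class GroupNameCheck {
    // ASCII letters, digits, periods, and underscores only (assumed reading of "alphanumeric").
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9._]+");

    static boolean isValidName(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    /** Returns a list of problems; an empty list means every name passed both rules. */
    static List<String> problems(List<String> groupNames) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String name : groupNames) {
            if (!isValidName(name)) {
                problems.add("invalid characters: '" + name + "'");
            }
            // Set.add returns false when the name was already present (case-sensitive).
            if (!seen.add(name)) {
                problems.add("duplicate group: '" + name + "'");
            }
        }
        return problems;
    }
}
```

For example, problems(Arrays.asList("administrators", "mjones_lab")) returns an empty list. A name such as "mjones lab" is reported because it contains a space.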
As shown below, the default userGroups.xml file defines one group, administrators, which includes all GenePattern users. Members of the administrators group have full access to the GenePattern server and all jobs run on the server. Because all users are administrators, the default GenePattern installation has no concept of “private” data.
<!-- map of users to groups -->
To maximize data privacy, minimize the number of users in the administrators group. For example, if the administrators group contains exactly one person, only that administrator can view all jobs run on the server. Other users can view their own jobs and jobs that have been explicitly shared.
<!-- map of users to groups -->
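The visibility rules in this section can be modeled in a few lines. The sketch below is hypothetical: JobVisibility is not a GenePattern class, and the user names in the example are made up. It assumes that group membership has already been read from userGroups.xml into a map and that "*" matches every user. A user can view a job if they own it, belong to administrators, or belong to a group the job was shared with.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of the job-visibility rules: administrators see every job;
 * other users see their own jobs and jobs shared with a group they belong to.
 */
final class JobVisibility {
    private final Map<String, Set<String>> groupMembers; // group name -> user names (may contain "*")

    JobVisibility(Map<String, Set<String>> groupMembers) {
        this.groupMembers = groupMembers;
    }

    boolean isMember(String user, String group) {
        Set<String> members = groupMembers.getOrDefault(group, Collections.emptySet());
        return members.contains("*") || members.contains(user); // "*" means every user
    }

    boolean canView(String user, String jobOwner, Set<String> sharedWithGroups) {
        if (user.equals(jobOwner) || isMember(user, "administrators")) {
            return true;
        }
        for (String group : sharedWithGroups) {
            if (isMember(user, group)) {
                return true;
            }
        }
        return false;
    }
}
```

In the default file, administrators contains "*", so canView returns true for every user. This is why the default installation has no private data.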
To create a new group, add a <group> element to the userGroups.xml file. The following edited userGroups.xml file adds a second user to the administrators group and creates a new group, mjones_lab:
<!-- map of users to groups -->
Members of a group can share analysis results with that group, but renaming a group does not update those shared results. For example, if you rename a group from old_name to new_name, the users in the old_name group are now in the new_name group. However, the analysis results they shared are still shared with the old_name group. Each user who shared job results with the old_name group should edit the share options for the job and share the job results with the new_name group.
To modify the configuration of your GenePattern server, use the Server Settings page:
The following table summarizes the server settings. Each setting is described in more detail below.
Setting | Description
Access | Specify which clients have access to the server.
Advanced | Specify software source directories and other low-level configuration options.
Command Line Prefix | Specify commands and qualifiers to be prepended to the command line used to invoke a module or pipeline.
Custom | Create new server configuration options.
Database Parameters | Specify configuration options for the GenePattern database.
File Purge | Specify how long files remain on the server before being deleted.
GenePattern Log | Display the log file for the GenePattern server.
Job Configuration | Work with a job queue that you have configured for use with a queuing system, such as the Load Sharing Facility (LSF) or the Sun Grid Engine (SGE).
Programming Languages | Specify the root directories for the programming languages used by GenePattern and the Java flags to be added to Java command lines executed by the server.
Proxy | If your organization has a web proxy between the GenePattern server and the internet, specify the proxy information required to access the internet.
Repositories | Specify the URL used to access the module repository and the suite repository.
Shutdown | Shut down the GenePattern server.
System Message | Broadcast a message to all users logged into the GenePattern server.
Task Info | Display the LSID of each module and pipeline installed on the GenePattern server.
Uploaded Files | Display the account information and uploaded files of a selected user.
Users and Groups | Display account information for all users, including the groups to which they belong.
Web Server Log | Display the log file for the web server used by the GenePattern server.
Use the Access page to define which GenePattern clients have access to the GenePattern server. The localhost (127.0.0.1) computer cannot be denied access to the locally installed GenePattern server. This prevents you from inadvertently denying yourself access to the server.
Using the Access page to control which computers have access to the GenePattern server is the simplest way to secure your server. You can also control access to your server based on user authentication and user permissions, as described in Securing the Server. The Access page filters are applied before any user-specific authentication or permissions are checked. If your computer is denied access, you cannot reach the server regardless of your username, password, or permissions.

Click Save to save your changes. Click Restore to return to the value set at installation.
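The order of checks described above (host filter first, then authentication) can be illustrated with a small sketch. The entry format mirrors the comma-separated domain list shown in Securing the Server. However, two parts of this sketch are assumptions rather than GenePattern's documented behavior: the matching rule (an exact IP or host match, or a subdomain of a listed domain) and treating an empty list as "allow all". AccessFilter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of client-access filtering. The matching rule and the
 * empty-list behavior are assumptions, not GenePattern's documented logic.
 */
final class AccessFilter {
    private final List<String> allowed = new ArrayList<>();

    AccessFilter(String commaSeparated) {
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                allowed.add(trimmed);
            }
        }
    }

    /** Runs before any username/password or permission check. */
    boolean permits(String clientHost, String clientIp) {
        if ("127.0.0.1".equals(clientIp)) {
            return true; // localhost can never be locked out
        }
        if (allowed.isEmpty()) {
            return true; // assumed: no entries means no filtering
        }
        String host = clientHost == null ? "" : clientHost.toLowerCase(Locale.ROOT);
        for (String entry : allowed) {
            // "." + entry stops "evilbroadinstitute.org" from matching "broadinstitute.org".
            if (entry.equals(clientIp) || host.equals(entry) || host.endsWith("." + entry)) {
                return true;
            }
        }
        return false;
    }
}
```

With the entries "broadinstitute.org, mit.edu", a client named node1.broadinstitute.org passes and example.com is refused before it ever sees a login page.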
The Advanced page contains directory specifications for the GenePattern source files and other low-level configuration options. You rarely need to modify these options.

Click Save to save your changes. Click Restore to return to the values set at installation.
The Command Line Prefix page allows you to prepend text to the command line used to execute a module. You can prepend the same text to all module command lines or prepend text for a specific module.
Note: Prior to GenePattern 3.2.3 (June 2010), administrators used the command line prefix for connecting to an external queuing system. GenePattern now provides the CommandExecutor interface for that purpose. For more information, see Using a Queuing System.
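The sketch below shows what prepending means in practice. It builds the final argument list from a default prefix and optional per-module prefixes. CommandLinePrefix is hypothetical, and two of its behaviors are assumptions. First, a module-specific prefix replaces the default rather than adding to it. Second, the prefix is split on whitespace, so quoted arguments are not handled. The example prefix is the LSF prefix given in Using a Queuing System.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of prefix resolution: a module-specific prefix, when present,
 * is used instead of the default prefix (this precedence is an assumption).
 */
final class CommandLinePrefix {
    static List<String> apply(String defaultPrefix, Map<String, String> modulePrefixes,
                              String moduleName, List<String> commandLine) {
        String prefix = modulePrefixes.getOrDefault(moduleName, defaultPrefix);
        List<String> result = new ArrayList<>();
        if (prefix != null && !prefix.trim().isEmpty()) {
            // Naive whitespace split; quoted arguments inside the prefix are not handled.
            result.addAll(Arrays.asList(prefix.trim().split("\\s+")));
        }
        result.addAll(commandLine); // the module's own command line follows the prefix
        return result;
    }
}
```

For example, applying the default prefix "bsub -K -o lsf_log.txt" to the command line java -jar module.jar produces bsub -K -o lsf_log.txt java -jar module.jar.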

To prepend text to all (or most) command lines executed by the GenePattern server:
To prepend text only to command lines that invoke specific modules or pipelines:
Use the Custom page to define your own configuration options.
When you create a module, the custom configuration options are available as substitution variables in the module command line. For example, if you define a custom property "foo", you can use <foo> in the command line to pass the value of that option to your module. In the Broad repository, the LandmarkMatching and PeakMatch modules use the custom configuration option pepperPrefix. For more information, see Creating Modules in GenePattern.
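Substitution can be sketched as token replacement over the command line. In this hypothetical Java helper, each <name> token is replaced by the matching property value. Values may themselves contain tokens; for example, the R2.5 property refers to <R2.5_HOME>. Expansion therefore recurses, with a guard against cycles. Leaving unknown tokens untouched is an assumption, and this is not GenePattern's own substitution code.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of <name> substitution in a module command line.
 * Values may contain further <name> tokens, so expansion recurses; unknown
 * or cyclic tokens are left as-is.
 */
final class CommandLineSubstitution {
    private static final Pattern TOKEN = Pattern.compile("<([^<>]+)>");

    static String substitute(String commandLine, Map<String, String> props) {
        return expand(commandLine, props, new HashSet<>());
    }

    private static String expand(String text, Map<String, String> props, Set<String> inProgress) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = props.get(name);
            String replacement;
            if (value == null || inProgress.contains(name)) {
                replacement = m.group(); // unknown or cyclic: keep the token unchanged
            } else {
                inProgress.add(name);
                replacement = expand(value, props, inProgress);
                inProgress.remove(name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With foo=bar defined, "run <foo> <input.file>" becomes "run bar <input.file>". The unknown <input.file> token is left for a later step.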

Use the Database Parameters page to set configuration options for the GenePattern database. The following figure shows the HSQL options. You rarely need to change these options.
For information about modifying the database, see Changing the GenePattern Database (HSQL to Oracle).

Click Save to save your changes. Click Restore to return to the value set at installation.
Use the File Purge page to specify when analysis result files are deleted from the server:

Click Save to save your changes. Click Restore to return to the values set at installation.
Use the GenePattern Log page to view warnings and messages generated by the GenePattern server. (Use the Web Server Log page to view messages generated by the web server that GenePattern uses.)

If you have configured your GenePattern system to work with a queuing system, such as the Load Sharing Facility (LSF) and the Sun Grid Engine (SGE), the Job Configuration page helps you control the queue and reload your configuration files. For more information, see Using a Queuing System.
Use the Job Configuration section to control the GenePattern internal job queue:
Use the Command Executors section to identify each of the command executors currently installed on the GenePattern server.
Use the Configuration File section to identify and review the .yaml configuration file currently active on the GenePattern server.

The Programming Languages page contains two sections. After making changes, click Save to save them or Restore to return to the value set at installation.
Use Programming Language Configurations to specify the root directories for the programming languages used by GenePattern:

When you install GenePattern, you install the programming languages used by GenePattern. If you have alternate programming language installations that you prefer to use, use this page to point to those installations. If you would like to use more recent versions of R, see Using Different Versions of R.
Use Programming Language Options to increase the memory allocated to modules written in Java and R:

You can also increase the amount of memory allocated to the GenePattern server or client. For more information, see Increasing Memory Allocation.
If your server is behind a firewall, use the Proxy page to set the HTTP and FTP Proxy information. Without the proxy information, the server cannot download modules, pipelines, or suites from the repository maintained by the Broad Institute. If you do not know the proxy information, contact your systems administrator.

Click Save to save your changes. Click Restore to return to the values set at installation.
Use the Repositories page to identify the location of the repository to be accessed by the GenePattern server when you install modules and pipelines or suites from the repository. By default, it points to the module repository maintained by the Broad Institute. For information about implementing a module repository at your site, see the In-Depth Article Setting Up a Module Repository.

Click Save to save your changes. Click Restore to return to the values set at installation. Click Remove to delete the selected URL from the list.
You can shut down the GenePattern server by clicking the link on this page. Alternatively, double-click the Stop GenePattern Server icon on your desktop.

Use the System Message page to broadcast a message to all users logged into the GenePattern server. The message text that you enter can include simple HTML formatting commands, such as <b> and <em>.

The Task Info page lists every module and pipeline installed on the GenePattern server, with its LSID. It can help resolve the confusion that occurs when modules or pipelines share the same name.
The Clear TaskInfo Cache link clears an internal GenePattern cache, which can be useful for GenePattern development. Clicking the link has no visible impact on GenePattern operations.

The Uploaded Files page displays basic information about a user and their uploaded files (see Uploading Files). By default, the page displays information about the user logged into the GenePattern web client. To view information for another user, enter their username and click Select User.
If a user manually adds or removes files from the Uploads directory on the file server, enter their username and click Resync Uploads to update their Uploads tab.
This allows you to synchronize the user interface with the modified uploads directory on the file server without restarting the GenePattern server.

Use the Users and Groups page to view user account information, including the groups to which a user belongs. This page shows only registered users. An administrator can add users to a group (Creating Groups and Administrators) before they register, but the users are not listed on this page until they have created a GenePattern account by clicking the Registration link on the GenePattern login page. If you update the userGroups.xml file, click Reload Users and Groups to update (resynchronize) the GenePattern web interface. This allows you to update users and groups without restarting the GenePattern server.
When you start the GenePattern server, the server populates the Uploads tab for each user by reviewing the Uploads directory on the file server. Typically, users add and remove uploaded files from the GenePattern web client interface. If a user adds or removes files from the Uploads directory on the file server, enter their username and click Resync Uploads to update (resynchronize) their Uploads tab based on their Uploads directory on the file server. This allows you to synchronize the user interface with the modified uploads directory on the file server without restarting the GenePattern server.

Use the Web Server Log page to view messages generated by the web server that GenePattern uses. (Use the GenePattern Log page to view warnings and messages generated by the GenePattern server.)

As of Release 3.2.1, the GenePattern server can be configured to run under Java 5 or Java 6.
When installed on Mac OS X 10.6 (Snow Leopard), the GenePattern server is automatically configured for Java 6.
When installed on Mac OS X 10.5 (Leopard), the GenePattern server is configured for Java 5 by default.
To configure the GenePattern server for Java 6:
When installed on Windows, the GenePattern server is configured for Java 5 by default.
To configure the GenePattern server for Java 6:
# LAX.NL.CURRENT.VM
# -----------------
# the VM to use for the next launch
lax.nl.current.vm=\jre\bin\java.exe
When installed on Linux, the GenePattern server is configured for Java 5 by default.
To configure the GenePattern server for Java 6:
# LAX.NL.CURRENT.VM
# -----------------
# the VM to use for the next launch
lax.nl.current.vm=/tools/pkgs/jdk_1.6.0_12/bin/java
Installing GenePattern (version 3.1 and later) installs R 2.5.
/Library/Frameworks/R.framework/Versions/2.5
The GenePattern modules available in the Broad Institute repository (Modules & Pipelines>Install from repository) all work with R 2.5. However, some GenePattern modules require different versions of R; for example, ExpressionFileCreator v.10 requires R 2.8. R is not backward compatible. If you simply install and run the latest version of R, modules may fail or, worse, may produce invalid results without failing.
In GenePattern, each module definition includes a command line that runs the analysis program. For an R module, the R version is defined by a command line substitution parameter. For example, the <R> parameter is substituted with the full path to the R 2.0.1 executable. The <R2.5> parameter is substituted with the full path to the R 2.5 executable. Similar parameters are used for other versions of R.
GenePattern version 3.1 and later installs R 2.5 and sets the <R2.5> parameter. If you upgraded from GenePattern 3.0, your GenePattern installation also includes R 2.0.1 and sets the <R> parameter.
To add a different version of R to your GenePattern installation (for example R 2.8 on Mac OS X, for ExpressionFileCreator v. 10):
R2.8_HOME=/Library/Frameworks/R.framework/Versions/2.8/Resources
Extra steps are required when you install R on Mac OS X from a binary package downloaded from CRAN. When you install R in this manner, you are not able to use more than one version of R at a time. You can only use the most recently installed version of R, even if this happens to be an earlier version of R. You can change the active version of R using the RSwitch GUI (http://r.research.att.com/), but it is not effective for use from GenePattern. As a workaround, you must manually edit the installed executable shell scripts. Replace all hard coded paths to /Library/Frameworks/R.framework/Resources with the actual path to the correct installed version of R. The file r_
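The manual edit described above is a literal string replacement, sketched here in Java. RScriptPathFix is a hypothetical helper. The caller decides which installed scripts to process, and each file should be backed up first. Running the rewrite twice is harmless, because the versioned path does not contain the generic path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper for the workaround above: rewrite hard-coded
 * /Library/Frameworks/R.framework/Resources paths to one installed version.
 */
final class RScriptPathFix {
    static final String GENERIC = "/Library/Frameworks/R.framework/Resources";

    /** Replaces every occurrence of the generic path with the versioned one. */
    static String rewrite(String script, String version) {
        return script.replace(GENERIC, "/Library/Frameworks/R.framework/Versions/" + version + "/Resources");
    }

    /** Rewrites one script in place; back the file up before calling this. */
    static void rewriteFile(Path script, String version) throws IOException {
        String text = new String(Files.readAllBytes(script), StandardCharsets.UTF_8);
        Files.write(script, rewrite(text, version).getBytes(StandardCharsets.UTF_8));
    }
}
```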
There are some GenePattern modules which rely on R version 2.0.1:
To use these modules on your server you need to add R version 2.0.1.
To add R2.0.1 to your GenePattern installation:
GenePattern can now run modules written for R2.0.1.
GenePattern allocates memory to the server, to the "client" (the computer you are using to access GenePattern), and to individual modules. When a module fails with an out of memory error, you can try increasing the amount of memory allocated to the server, the client, or the module.
To increase the amount of memory allocated to a module written in Java or R, click Administration>Server Settings. The Programming Languages page (Programming Language Options) provides several options for increasing Java and R memory options.
Many GenePattern modules are run on the server. However, visualizers are applications that run on your computer, rather than on the GenePattern server. This means that you must have Java installed on your computer. For easy debugging, set your Java preferences so that the Java console displays.
The default visualizer memory limit is 512 MB. However, if you find that your visualizer repeatedly runs out of memory, you can try a few things to eliminate that error:
To increase the amount of memory allocated to the server and/or the client, follow the instructions for your platform:
GenePattern/Tomcat/StartGenePatternServer (server) or the GenePatternClient/GenePattern Client (client).
Info.plist file. This should open the Property List Editor program.
VMOptions under the Java node.
VMOptions node to ‘Array’.
-Xmx512M. You can replace the value 512 with the maximum amount of memory in MB that you want the GenePattern Client to use.
GenePatternServer/StartGenePatternServer.lax (server) or GenePatternClient/GenePattern Client.lax (client).
lax.nl.java.option.java.heap.size.initial
lax.nl.java.option.java.heap.size.max

Queuing systems such as the Load Sharing Facility (LSF) and the Sun Grid Engine (SGE) allow computational resources to be used effectively. If you have installed a queuing system, you can configure the GenePattern server to use it. On a heavily used server, using a queuing system to execute analysis jobs generally improves performance overall, especially for compute-intensive and long-running jobs; however, short jobs might take slightly longer because they must be dispatched to the queuing system.
There are two ways to configure GenePattern's interaction with your queuing system: programmatically through the CommandExecutor Java API, or through a command line prefix.
To use a queuing system with GenePattern:
Each step is described in detail below.
The full source for the Command Executor API is included here:
package org.genepattern.server.executor;

import java.io.File;
import java.util.Map;
// CommandProperties, JobInfo, and CommandExecutorException come from the GenePattern server libraries.

/**
 * Interface for managing job execution via runtime exec or an external queuing system.
 * This interface is responsible for both initialization and shutdown of external services,
 * as well as the management of job submission, getting job status, and killing, pausing, and resuming jobs.
 *
 * @author pcarr
 */
public interface CommandExecutor {
    //configuration support

    /**
     * [optionally] set a path to a configuration file.
     */
    void setConfigurationFilename(String filename);

    /**
     * [optionally] provide properties.
     * @param properties
     */
    void setConfigurationProperties(CommandProperties properties);

    /**
     * Start the service, typically called at application startup.
     */
    public void start();

    /**
     * Stop the service, typically called just before application shutdown.
     */
    public void stop();

    /**
     * Request the service to run a GenePattern job. It is up to the service to monitor for job completion
     * and callback to GenePattern when the job is completed.
     *
     * @see GenePatternAnalysisTask#handleJobCompletion(int, String, String, int)
     *
     * @param commandLine
     * @param environmentVariables
     * @param runDir
     * @param stdoutFile
     * @param stderrFile
     * @param jobInfo
     * @param stdinFile
     *
     * @throws CommandExecutorException when errors occur attempting to submit the job
     */
    void runCommand(
            String commandLine[],
            Map<String, String> environmentVariables,
            File runDir,
            File stdoutFile,
            File stderrFile,
            JobInfo jobInfo,
            File stdinFile)
            throws CommandExecutorException;

    /**
     * Request the service to terminate a GenePattern job which is running via this service.
     * @param jobInfo
     * @throws Exception indicating that the job was not properly terminated.
     */
    void terminateJob(JobInfo jobInfo) throws Exception;

    /**
     * This method is called on server startup for each RUNNING job for this queue.
     *
     * For RuntimeExec, tell the GP server to delete the job results directory and requeue the job.
     * For other executors, (such as LSF), you may want to ignore this message.
     * For PipelineExec, you may need to determine the last successfully completed step before resuming the pipeline.
     *
     * @return an optional int flag to update the JOB_STATUS_ID in the GP database, ignore if it is less than zero
     */
    int handleRunningJob(JobInfo jobInfo) throws Exception;
}
The required Java libraries come with your local install of GenePattern and can be found in the <GenePatternServer>/Tomcat/webapps/gp/WEB-INF/lib directory.
The interface accepts requests to start and terminate jobs from the server. You will need to invoke a callback to the GP server when your job has completed.
Example snippet:
try {
    GenePatternAnalysisTask.handleJobCompletion(jobInfo.getJobNumber(), exitCode, null, runDir, stdoutFile, stderrFile);
}
catch (Exception e) {
    log.error("Error handling job completion for job " + jobInfo.getJobNumber(), e);
}
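You can try out the submit, monitor, and callback pattern that a CommandExecutor must implement without the GenePattern libraries. In the self-contained sketch below, LocalProcessExecutor and CompletionCallback are hypothetical stand-ins. A real executor would implement CommandExecutor and call GenePatternAnalysisTask.handleJobCompletion. It would also usually submit the job to the queuing system rather than start a local process.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the completion callback into the GenePattern server. */
interface CompletionCallback {
    void jobCompleted(int jobNumber, int exitCode);
}

/**
 * Minimal sketch of the submit / monitor / callback pattern. This is not the
 * GenePattern CommandExecutor API; class and callback names are hypothetical.
 */
final class LocalProcessExecutor {
    private final Map<Integer, Process> running = new ConcurrentHashMap<>();
    private final CompletionCallback callback;

    LocalProcessExecutor(CompletionCallback callback) {
        this.callback = callback;
    }

    /** Starts the command and returns at once; a watcher thread reports completion. */
    void runCommand(int jobNumber, String[] commandLine, File runDir,
                    File stdoutFile, File stderrFile) throws IOException {
        Process process = new ProcessBuilder(commandLine)
                .directory(runDir)
                .redirectOutput(stdoutFile)
                .redirectError(stderrFile)
                .start();
        running.put(jobNumber, process);
        Thread watcher = new Thread(() -> {
            int exitCode;
            try {
                exitCode = process.waitFor();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exitCode = -1;
            }
            running.remove(jobNumber);
            callback.jobCompleted(jobNumber, exitCode); // the "callback to the GP server"
        }, "job-" + jobNumber);
        watcher.setDaemon(true);
        watcher.start();
    }

    /** Kills the job's process; the watcher thread still delivers the callback. */
    void terminateJob(int jobNumber) {
        Process process = running.get(jobNumber);
        if (process != null) {
            process.destroy();
        }
    }
}
```

Because runCommand does not wait for the job, the server thread that submitted it is free immediately. This avoids the one-waiting-process-per-job drawback of the command line prefix.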
Once you have implemented this interface, create a jar file to deploy to the GP server.
The jar file and all of the dependent libraries must be installed to <GenePatternServer>/Tomcat/webapps/gp/WEB-INF/lib.
To configure your server to interact with your queuing system, you must edit the config.yaml file. In a fresh install of GenePattern, there are two .yaml files in <GenePatternServer>/resources: config_default.yaml and config_example.yaml. It is highly recommended that you make a copy of config_default.yaml and name it something like config.yaml. This gives you a working copy of your configuration file while preserving the default and example versions for future reference. It also prevents your working copy from being overwritten during a server upgrade.
Edit the config.file property in the <GenePatternServer>/resources/genepattern.properties file to point to your new configuration file. By default, the property looks like this:
config.file=config_default.yaml
For this example, you would edit the property as follows:
config.file=config.yaml
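If you script your deployment, you can check the config.file lookup with java.util.Properties. ConfigFileLocator is a hypothetical helper. Two of its behaviors are assumptions based on the defaults described above: a relative name is resolved against the resources directory, and config_default.yaml is used when the property is missing.

```java
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical helper that resolves which .yaml file genepattern.properties points to. */
final class ConfigFileLocator {
    static File locate(Reader genepatternProperties, File resourcesDir) throws IOException {
        Properties props = new Properties();
        props.load(genepatternProperties);
        // Assumed fallback: the shipped default when config.file is absent.
        String name = props.getProperty("config.file", "config_default.yaml").trim();
        File file = new File(name);
        // Assumed: relative names live in <GenePatternServer>/resources.
        return file.isAbsolute() ? file : new File(resourcesDir, name);
    }
}
```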
Now, edit your working copy of the configuration file, config.yaml. (The following code snippets come from the config_example.yaml file.)
a) Define an executor in the "executors" section. To do so, add an item to the list of 'executors' in the yaml document.
# a list of command executors
# The executor id, 'org.genepattern.server.executor.PipelineExecutor', is reserved for the default executor which runs all GP pipelines.
# Don't use this as an executor id in this file.
# a map of <id>:<obj>, where
# obj := <classname> | <map>
# classname := fully qualified classname of a class which implements the org.genepattern.server.executor.CommandExecutor interface
# map := classname=<classname> [configuration.file: <path_to_config_file> | configuration.properties: <map>] [default.properties: <map>]
executors:
    # default executor for all jobs, it is included in GenePattern
    RuntimeExec:
        classname: org.genepattern.server.executor.RuntimeCommandExecutor
        configuration.properties:
            # the total number of jobs to run concurrently
            num.threads: 20
            # the total number of jobs to keep on the queue, not yet implemented
            #max.pending.jobs: 20000
    # nested declaration with configuration file, <id>: { classname: <classname>, configuration: <config_file> }
    Test:
        classname: org.genepattern.server.executor.TestCommandExecutor
        configuration.properties:
            num.threads: 20
b) Configure your server to use your executor.
# apply these properties to all jobs
default.properties:
    executor: Test
    java_flags: -Xmx512m
c) Optionally, you can use the configuration file to override the default executor on a per-module, per-group, or per-user basis. The following example comes from the per-module section; more examples can be found in config_example.yaml.
# override default.properties and executor->default.properties based on taskname or lsid
# Note: executor->configuration.properties are intended to be applied at startup and are not overwritten here
module.properties:
    CBS:
        executor: LSF
        lsf.max.memory: 16
        java_flags: -Xmx16g
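You can think of the effect of a module.properties entry as layering two maps. The sketch below assumes that the YAML has already been parsed into maps, and JobProperties is a hypothetical name. It models only default.properties and module.properties. Group and user overrides, and GenePattern's exact precedence among them, are not modeled.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of layered property lookup: module.properties for the job's
 * module override default.properties. Group/user overrides are not modeled.
 */
final class JobProperties {
    static Map<String, String> resolve(Map<String, String> defaults,
                                       Map<String, Map<String, String>> moduleProperties,
                                       String moduleName) {
        Map<String, String> effective = new HashMap<>(defaults);         // start from default.properties
        effective.putAll(moduleProperties.getOrDefault(moduleName,
                Collections.emptyMap()));                                 // module values win
        return effective;
    }
}
```

With the example configuration, a CBS job resolves to executor LSF with java_flags -Xmx16g, while every other module keeps executor Test and -Xmx512m.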
About the .yaml configuration file: As of GenePattern 3.4.0, you use the .yaml configuration file only to configure GenePattern for use with a queuing system. As you work with the .yaml file, you may notice that it contains several properties that are also defined in the genepattern.properties file. To avoid confusion, leave them set to agree with the genepattern.properties file. GenePattern 3.4.0 reads these properties from the genepattern.properties file, not from the .yaml file. (In a future release, the genepattern.properties file may define the default server settings and the .yaml configuration file may define custom server settings.)
At this point, you have deployed your command executor, modified the .yaml configuration file to control its use, and modified the <GenePatternServer>/resources/genepattern.properties file to point to the modified .yaml configuration file. Now, stop and restart the GenePattern server to reload the server configuration and begin to use the new command executor.
As you use GenePattern with the queuing system, you may find it useful to modify the configuration. The Administration>Server Settings>Job Configuration page provides several useful tools for controlling the internal GenePattern job queue and reloading the .yaml configuration file. Use this page to confirm which command executors are currently installed and the exact .yaml configuration file currently in use. If you make minor adjustments to the configuration file, such as overriding the command executor used for a module, group or user, you can use the Job Configuration page to reload the configuration file without restarting the GenePattern server. On the other hand, for major changes, such as adding a new command executor, we recommend restarting the server rather than simply reloading the configuration.

Before the 3.2.3 release of GenePattern (June 2010), the only way to connect to an external queuing system was to use the command line prefix. Although this option requires no Java programming and allows for configuration via a web page, it has significant drawbacks:
The drawbacks are a result of how the command line prefix works. Each new job requires a dedicated server process which waits for the job to complete. When a user terminates a job, the server process is terminated but the external process launched on the queuing system is not terminated. Similarly, when the GenePattern server shuts down, all server processes halt but the processes running on the external queuing system become orphaned. When the GenePattern server restarts, the jobs are not restarted; the user must restart any unfinished job from the beginning.
If you are using the CommandExecutor interface, we recommend that you not use the command line prefix. The command line prefix is prepended to the module command line before the job is executed by the CommandExecutor. To be more precise:
Although this is not the preferred method, you can still use the Command Line Prefix to connect to an external queuing system.
To use the Command Line Prefix to configure the GenePattern server to execute jobs using LSF or SGE:
Set the GenePatternURL property in GenePatternServer/resources/genepattern.properties, specifying the URL of your server. For example:
GenePatternURL=http://myserver.company.com:8080/gp/
When you run a pipeline, the GenePattern server uses this URL to construct the links to the output files.
By default, the GenePatternURL property is not set. When you run a pipeline, the GenePattern server derives the URL at run time based on the current IP address of the host server. This is ideal for a user running on a laptop, where the IP address may change at startup. However, if you are using a queuing system, the derived URL is incorrect: it is based on the IP address of the queuing system server rather than the GenePattern server.
Edit the R2.5 property in GenePatternServer/resources/genepattern.properties to quote the <r_flags> options. For example:
R2.5=<java> -DR_suppress\=<R.suppress.messages.file> -DR_HOME\=<R2.5_HOME> -Dr_flags\=\"<r_flags>\" -cp <run_r_path> RunR
Modify other similar properties (if any) that were added to support additional versions of R.
For example, if you are using LSF, modify the Command Line Prefix options as follows:
bsub -K -o lsf_log.txt
Another alternative is to create a script that sets the environment variables and then executes the job using LSF or SGE. The command prefix would then execute the script. For example:
#!/bin/bash
/fully/qualified/path/to/lsf_default.sh
jobs.FilenameFilter=.lsf*

Secure the GenePattern server to control who has access to which operations. Since GenePattern is primarily a web application (including SOAP interfaces) running on a web server, general approaches for securing web servers are applicable to the GenePattern server. In addition, GenePattern provides several security features that can easily be used by non-technical users to control access to the server.
This section describes several ways to secure the GenePattern server:
Use the Access page to define which GenePattern clients have access to the GenePattern server. This is the simplest way to secure your GenePattern server.
Access filtering prevents users from connecting to the GenePattern server unless they come from a known computer. If your computer cannot access the server, you cannot access the server regardless of your username/password or permissions. The localhost (127.0.0.1) computer cannot be denied access to the locally installed GenePattern server. This prevents you from inadvertently denying yourself access to the server.
To use access filtering (as described in Modifying Server Settings):

broadinstitute.org,dfci.harvard.edu,mit.edu

By default, the GenePattern server requires only a user name to authenticate a GenePattern user. You can easily add password protection by modifying the GenePattern server properties.
To add password protection, modify the GenePattern server properties:
GenePatternServer/resources/genepattern.properties.
genepattern.properties file.
When you add password protection to the server:
Assigning passwords to existing user accounts prevents anyone from inadvertently or intentionally logging into and taking control of another user’s account. After adding password protection to the server, set passwords for existing users as follows:
By default, users create their own accounts by clicking the Registration link on the GenePattern login page. To configure GenePattern to allow only administrators to create new accounts:
GenePatternServer/Tomcat/webapps/gp/WEB-INF/web.xml. Remove registerUser.jsf from the no.login.required.redirect.to.home parameter value. After the edits, it looks like this:
<init-param>
    <!-- List of jsf pages that user can access if not logged in. If user requests one of these pages while logged in, he is redirected to the home page. -->
    <param-name>no.login.required.redirect.to.home</param-name>
    <param-value>login.jsf,forgotPassword.jsf</param-value>
</init-param>
GenePatternServer/resources/actionPermissionMap.xml. Add the following line to the <actionPermissionMap> element:
<url link="registerUser.jsf" permission="adminServer"/>
GenePatternServer/Tomcat/webapps/gp/pages/login.xhtml. Replace the phrase
rendered="#{loginBean.createAccountAllowed and loginBean.showRegistrationLink}">
with
rendered="false">
To create an account:
http://127.0.0.1:8080/gp/pages/index.jsf
http://127.0.0.1:8080/gp/pages/registerUser.jsf
User permissions determine valid actions for the user. Permissions are based on two configuration files in the GenePatternServer/resources directory:
A user who belongs to multiple groups is given the most permissive permissions granted to those groups. For example, an administrator who belongs to other groups retains administrator permissions.
To assign or modify user permissions, edit the permissionMap.xml file. The XML syntax is simple but must be followed carefully. The rules are as follows:
To grant a permission to a group, add a <group> element to that permission. A <permission> element may have any number of <group> elements. A <group> element may be listed under any number of <permission> elements.
To grant a permission to all users, add <group name="*"/>. The presence of a group named * means that all groups (and therefore all users) have that permission.
Do not add or remove <permission> elements. GenePattern uses them to define the permissions that it requires and implements. The permissions are described in the following table.
By default:
Note: No explicit permission is required to run public modules/pipelines, or private modules/pipelines that you have created. No explicit permission is required to edit or delete your own modules, pipelines, suites, or jobs.
Permission | Description
createModule | Permits creation of a module. Creation refers to any action that adds a module to the server, including create, install from repository, install from zip, and clone.
createPrivatePipeline | Permits creation of a private pipeline (a pipeline visible only to its creator). Creation refers to any action that adds a private pipeline to the server, including create, install from repository, install from zip, and clone. Note: To install the modules in a pipeline, you must have createModule permission.
createPrivateSuite | Permits creation of a private suite (a suite visible only to its creator). Creation refers to any action that adds a private suite to the server, including create, install from repository, install from zip, and clone. Note: To install the modules in a suite, you must have createModule permission.
createPublicPipeline | Permits creation of a public pipeline. Creation refers to any action that adds a public pipeline to the server, including create, install from repository, install from zip, and clone. Note: To install the modules in a pipeline, you must have createModule permission.
createPublicSuite | Permits creation of a public suite. Creation refers to any action that adds a public suite to the server, including create, install from repository, install from zip, and clone. Note: To install the modules in a suite, you must have createModule permission.
adminJobs | Permits viewing and deleting jobs and associated files owned by other users. Users with this permission can delete any job on the server. Typically, only members of the Administrators group are given this permission.
adminModules | Permits viewing and deleting private modules owned by other users. Permits deleting public modules. Note: No explicit permission is required to view public modules.
adminPipelines | Permits viewing and deleting private pipelines owned by other users. Permits deleting public pipelines. Note: No explicit permission is required to view public pipelines.
adminSuites | Permits viewing and deleting private suites owned by other users. Permits deleting public suites. Note: No explicit permission is required to view public suites.
adminServer | Permits access to Administration>Server Settings and all actions on the Server Settings page, including modifying server settings and shutting down the server. Users with this permission are considered to be GenePattern administrators. On the Users and Groups page, a checkmark in the admin? column indicates that a user has this permission. Typically, only members of the Administrators group are given this permission.
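Together with the membership rules for userGroups.xml, the permission rules amount to a set union. A user holds a permission if any of the user's groups is listed under that permission, or if the permission lists the group *. The Python sketch below illustrates that logic on small inline files. It is not GenePattern's implementation. The root element names (userGroups, permissionMap) and the name attributes on <group> and <permission> are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Hypothetical, simplified sample files for illustration only.
USER_GROUPS_XML = """
<userGroups>
  <group name="administrators"><user name="admin"/></group>
  <group name="developers"><user name="alice"/><user name="bob"/></group>
</userGroups>
"""

PERMISSION_MAP_XML = """
<permissionMap>
  <permission name="createModule"><group name="developers"/><group name="administrators"/></permission>
  <permission name="createPrivatePipeline"><group name="*"/></permission>
  <permission name="adminServer"><group name="administrators"/></permission>
</permissionMap>
"""


def groups_for_user(user_groups_xml: str, user: str) -> Set[str]:
    """Groups that list the user by name or contain <user name="*"/>."""
    root = ET.fromstring(user_groups_xml)
    groups = set()
    for group in root.iter("group"):
        members = {u.get("name") for u in group.iter("user")}
        if user in members or "*" in members:
            groups.add(group.get("name"))
    return groups


def permissions_for_user(user_groups_xml: str, permission_map_xml: str, user: str) -> Set[str]:
    """Union of the permissions granted to any of the user's groups.

    A <group name="*"/> entry grants the permission to every user.
    """
    groups = groups_for_user(user_groups_xml, user)
    root = ET.fromstring(permission_map_xml)
    granted = set()
    for permission in root.iter("permission"):
        allowed = {g.get("name") for g in permission.iter("group")}
        if "*" in allowed or groups & allowed:
            granted.add(permission.get("name"))
    return granted
```

In this sample, alice receives createModule through the developers group and createPrivatePipeline through *. She does not receive adminServer.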
You can configure the GenePattern server to provide password protection, restrict creation of user accounts, and assign permissions based on groups. Additional or alternative authentication and authorization mechanisms can be added to the server by an administrator with programming experience. The remainder of this section is written for such a programmer. Note: The links in this section display the source code for the default GenePattern installation, which should be used as the starting point for any modifications.
The authentication filter, AuthenticationFilter.java, controls whether a user can log into the server (typically based on username and password). The easiest way to modify GenePattern authentication is by implementing the IAuthenticationPlugin.java interface:
Create a Java class that implements the IAuthenticationPlugin interface. Use the IAuthenticationPlugin.java file as the starting point. Comments in the file provide the specification. For example, create a MyCustomGenePatternAuthentication.java class.
Set the authentication.class property in the GenePattern configuration file, GenePatternServer/resources/genepattern.properties, to point to the new class. For example:
authentication.class=org.genepattern.server.auth.MyCustomGenePatternAuthentication
See ftp://ftp.broadinstitute.org/pub/genepattern/src/gp-custom-auth.zip for an example project that prepares a custom authentication jar file for deployment to your local GenePattern server.
If the IAuthenticationPlugin interface methods do not provide enough flexibility, you can modify the authentication filter.
The authorization filter, AuthorizationFilter.java, controls which GenePattern operations (web pages) the user can access. As described in User Permissions, permissions are based on two configuration files: userGroups.xml, which defines user groups, and permissionMap.xml, which defines which groups have access to which permissions.
Organizations that have user groups defined in an external system can use those groups instead of the userGroups.xml file. To have the authorization filter use external user groups, implement the IGroupMembershipPlugin.java interface:
Create a Java class that implements the IGroupMembershipPlugin interface. Use the IGroupMembershipPlugin.java file as the starting point. Comments in the file provide the specification. For example, create a MyCustomGroupMembershipPlugin.java class.
Set the group.membership.class property in the GenePattern configuration file, GenePatternServer/resources/genepattern.properties, to point to the new class. For example:
group.membership.class=org.genepattern.server.auth.MyCustomGroupMembershipPlugin
To assign permissions to a group authorized through the IGroupMembershipPlugin interface, include the group in the permissionMap.xml file. If the IGroupMembershipPlugin interface methods do not provide enough flexibility, you can modify the authorization filter.
The authentication and authorization filters are servlet filters installed in front of the GenePattern web application in the GenePatternServer/Tomcat/webapps/gp/WEB-INF/web.xml file. To implement an alternative authentication (or authorization) filter:
Write a ServletFilter that performs the desired authentication (or authorization).
Copy the ServletFilter (packaged as a jar file) into the following directory:
*/GenePatternServer/Tomcat/webapps/gp/WEB-INF/lib
Edit the web.xml document:
*/GenePatternServer/Tomcat/webapps/gp/WEB-INF/web.xml
Pay attention to the order of the filters in the web.xml document, because they are used in the order in which they are defined in the document. The Authentication filter must come before the Authorization filter for the Authorization filter to work.
Modify the definition of AuthenticationFilter (or AuthorizationFilter) to use the class that you have provided.
Note: If you look at the code for the default Authentication Filter (AuthenticationFilter.java), you will see that it lets requests from localhost that include a parameter called jsp_precompile through without authentication. If your filter does not let these requests through unauthenticated, you will see a series of errors when you start the GenePattern server as it attempts to precompile the JSP pages. These errors are not fatal, but they slow server response the first time users access pages after a server restart.
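To check the filter order required by the steps above, you can list the filter names in the order of their <filter-mapping> elements. Under the Servlet specification, filters mapped by <url-pattern> are chained in the order in which their <filter-mapping> elements appear in web.xml, so that is the order this Python sketch checks. It assumes that both filters are mapped by <url-pattern> and that their <filter-name> values are AuthenticationFilter and AuthorizationFilter. Check the actual names in your web.xml.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def filter_mapping_order(web_xml: str) -> List[str]:
    """Filter names in the order their <filter-mapping> elements appear."""
    root = ET.fromstring(web_xml)
    order: List[str] = []
    for elem in root.iter():  # iterates in document order
        if local_name(elem.tag) == "filter-mapping":
            for child in elem:
                if local_name(child.tag) == "filter-name":
                    order.append((child.text or "").strip())
    return order


def authentication_runs_first(web_xml: str,
                              authn: str = "AuthenticationFilter",
                              authz: str = "AuthorizationFilter") -> bool:
    """True if the authentication filter is mapped before the authorization filter."""
    order = filter_mapping_order(web_xml)
    if authn not in order or authz not in order:
        return False
    return order.index(authn) < order.index(authz)
```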
This section describes how you can modify the GenePattern web application to run on a web server that is configured to use the HTTPS protocol. With HTTPS, HTTP requests are sent over a Secure Sockets Layer (SSL) connection that encrypts traffic between the client and the server, which makes the traffic much harder for attackers to read. If you have installed your GenePattern server on a web server other than the default Tomcat instance it is distributed with, configure your web server according to its instructions and then follow Step 2 below.
Note: When running under SSL, programming language clients and the GenePattern web client may not be able to connect to your GenePattern server.
Follow the instructions at http://tomcat.apache.org/tomcat-5.5-doc/ssl-howto.html to configure the Tomcat instance for SSL. In doing so, you will modify the Tomcat configuration file, server.xml, which is located in the GenePatternServer/Tomcat/conf directory.
Once Tomcat (or another web server) has been configured for SSL, modify the GenePattern configuration file, GenePatternServer/resources/genepattern.properties, so that its properties are in sync with the web server:
Set java.net.ssl.trustStore=<path to keystore>.
Update the server URL to use the HTTPS protocol and the SSL port. For example, http://localhost:8080/gp becomes https://localhost:8443/gp
Save the genepattern.properties file and restart your server. Any bookmarked links to your GenePattern server must be updated to the new protocol and port.
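The URL change amounts to switching the scheme and the port while keeping the host and path. This is also how bookmarked links are updated. The Python sketch below assumes that the SSL connector listens on port 8443, as in the example above. Pass a different `ssl_port` if your connector uses another port.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https_url(url: str, ssl_port: int = 8443) -> str:
    """Rewrite an http GenePattern URL to https on the SSL port, keeping host and path."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"expected an http URL, got {url!r}")
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literals must be bracketed again
        host = f"[{host}]"
    return urlunsplit(("https", f"{host}:{ssl_port}", parts.path, parts.query, parts.fragment))
```

For example, `to_https_url("http://localhost:8080/gp")` returns `"https://localhost:8443/gp"`.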
The GenePattern server runs against a database. By default, the GenePattern installation sets up an HSQL database. This section describes how to build and use an Oracle database in place of the HSQL database.
When using Oracle (or another database), you must initialize the database by running the scripts in the <GenePattern_HOME>/resources directory. You or your database administrator must ensure that the GenePattern server can reach the database through a JDBC URL.
Note: This procedure has been tested using the default Tomcat 5.5 server (Tomcat documentation), which comes with GenePattern.
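The procedure below runs the analysis_oracle-3.X.X.sql scripts in order, by version number. Sorting the file names as plain strings would put 3.10 before 3.2, so the version components must be compared as integers. The Python sketch below assumes file names of the form analysis_oracle-<version>.sql. The names used in the example are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

_SCRIPT = re.compile(r"^analysis_oracle-(\d+(?:\.\d+)*)\.sql$")


def version_key(version: str) -> Tuple[int, ...]:
    """'3.10.0' -> (3, 10, 0), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def scripts_to_run(filenames: Iterable[str], installed_version: str) -> List[str]:
    """Return the analysis_oracle scripts up to installed_version, oldest first."""
    limit = version_key(installed_version)
    selected = []
    for name in filenames:
        match = _SCRIPT.match(name)
        if match:
            version = version_key(match.group(1))
            if version <= limit:
                selected.append((version, name))
    return [name for _, name in sorted(selected)]
```

You could pass it the result of `os.listdir()` on the resources directory, then run each returned script in turn with your usual Oracle client.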
Edit the genepattern.properties file:
database.vendor=ORACLE
# optionally, set a different hibernate.configuration file
hibernate.configuration.file=hibernate.oracle.jndi.cfg.xml
Edit the META-INF/context.xml file. The Tomcat/webapps/gp/META-INF/context.xml file has several example configurations for connecting the GP server to a database. The Resource name in the context.xml file must match the one you use in the hibernate configuration file.
Edit the hibernate.configuration.file that was set in the genepattern.properties file. This file is loaded relative to the classes directory of your web application. For a default installation the file is here:
Tomcat/WEB-INF/classes/hibernate.cfg.xml
Set the hibernate.connection.datasource property near the top of the file to point to the correct Resource in context.xml.
Run the analysis_oracle-3.X.X.sql scripts in order, by version number, up through the installed version of GenePattern.
Start the server and check the genepattern.log file to verify that the server started correctly and was able to connect to the database.
This section provides guidance to system administrators interested in integrating GenePattern into the analysis tools at their site. It highlights issues that might arise and how to address them, and provides links to relevant portions of the GenePattern documentation, supplementing that documentation as needed.
Typographical conventions:
Tables like this describe implementation on the GenePattern public server.
The standard installation procedure uses InstallAnywhere to install the server on Windows, Mac, or Linux using a Tomcat web server. To install on a different web server or on another platform, use the WAR file installer. Instructions for both the standard installation and the WAR file installation are on the download page: http://www.genepattern.org/download/.
Hardware and software requirements for GenePattern are described in the Release Notes.
The GenePattern server runs against a database. The GenePattern installation creates an HSQL database. For instructions on how to build and use an Oracle database instead, see Changing the GenePattern Database (HSQL to Oracle).
We use an Oracle database for the GenePattern public server.
The following sections briefly summarize how to secure your GenePattern server, including access to the server from client machines, GenePattern user accounts, authentication (e.g., username & password) and authorization (e.g., permissions). For more detail, see Securing the Server.
By default, any client machine can access a GenePattern server. Optionally, you can configure your GenePattern server to restrict access to selected domains. See Securing the Server.Access Filtering.
Access to the GenePattern public server is not restricted.
A user must have a GenePattern account to log into the GenePattern server. By default, when a user first logs into the server, GenePattern automatically creates an account for that username.
To enable registration, set require.password=true in the genepattern.properties file. This setting adds a registration link (and password prompt) to the GenePattern login page. The first time users log into GenePattern, they must click the registration link to create an account. User account information is stored in the GenePattern database.
Alternatively, configure the GenePattern server so that users cannot create their own GenePattern accounts (create.account.allowed=false). In this case, new user accounts must be created explicitly by editing the GenePattern database.
See Securing the Server.Password Protection.
Registration (and passwords) are enabled on the GenePattern public server.
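The account settings described above can be summarized as a small decision procedure. The Python sketch below reads key=value lines from genepattern.properties text and reports the resulting account mode. It is an illustrative reading of this section, not GenePattern code. It assumes that create.account.allowed defaults to true and require.password defaults to false, which matches the default behavior described here. It also assumes that create.account.allowed=false takes precedence over require.password.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value reader: skips blank lines and # or ! comments; no escapes."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_true(props: Dict[str, str], name: str, default: bool) -> bool:
    return props.get(name, str(default)).strip().lower() == "true"


def account_mode(props: Dict[str, str]) -> str:
    """Describe how new accounts are created under these (assumed) semantics."""
    if not is_true(props, "create.account.allowed", True):
        return "admin-created"      # accounts are added by editing the GenePattern database
    if is_true(props, "require.password", False):
        return "self-registration"  # registration link and password prompt on the login page
    return "automatic"              # account created at first login from the username alone
```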
Each GenePattern user must register to access the GenePattern server. By default, GenePattern requires only a username for authentication. Optionally, you can configure the GenePattern server to require both a username and a password for authentication. See Securing the Server.Password Protection.
GenePattern user authentication is performed by a servlet filter installed in front of the GenePattern web application in its web.xml file. To provide additional or alternative authentication, implement the IAuthenticationPlugin interface (IAuthenticationPlugin.java) or modify the servlet filter. See Securing the Server.User Authentication and Authorization.
The GenePattern public server hosted at the Broad Institute uses the username and password authentication provided by the GenePattern installation.
Collaborator: A large university uses Kerberos to provide username and password authentication for its network. Its administrators wrote their own servlet filter so that the GenePattern server also authenticates users with Kerberos.
GenePattern permissions are based on two configuration files:
userGroups.xml defines user groups.
permissionMap.xml defines which user groups have which permissions; the permissions themselves (e.g., createModule, adminModules, and so on) are predefined and cannot be added or removed.
GenePattern user authorization is performed by a servlet filter installed in front of the GenePattern web application in its web.xml file. By default, users are assigned permissions based on GenePattern groups. To have the authorization filter use external user groups rather than the userGroups.xml file, implement the IGroupMembershipPlugin.java interface. To provide additional or alternative authorization, modify the servlet filter. See Securing the Server.User Authentication and Authorization.
On the GenePattern public server, the following permissions are restricted to a small number of users in the Administrator group:
For information on how to modify the GenePattern web application to run on a web server that is configured to use the HTTPS protocol, see Securing the Server.Secure Sockets Layer (SSL) Support.
The GenePattern public server is not running under SSL.
We take the following additional steps to secure the machine running the GenePattern public server (these steps may not be necessary on less public servers):
file:///server/directory/file.gct) as the value for an input file parameter. When allow.input.file.paths=true, you can use the server.browse.file.system.root property to set a root directory where the GenePattern server begins browsing for the specified network file path; to do so, edit genepattern.properties and set allow.server.file.paths=true.
To hide LSF log files from job results, edit genepattern.properties and set jobs.FilenameFilter=.lsf* (further discussion in Running Modules in a Cluster).
This section discusses how to install, create, and manage modules.
By default, you install modules, pipelines, and suites from the Broad repository. The module repository contains more than 100 modules and pipelines. Suites are stored in a separate suite repository. For instructions on how to install modules from the repository, see Managing Modules, Pipelines, and Suites.
The repository is updated regularly. We recommend checking for new modules on a weekly basis.
Create your own repository: Optionally, you can select an alternate repository from which to install modules, pipelines, and/or suites. See Repositories.
At the Broad, we maintain a development repository for modules in development and a production repository for released modules. Only the production repository is available from the GenePattern public server.
For instructions on how to create modules, as well as a step-by-step tutorial for creating a module, see the Programmers Guide.
Queuing systems such as the Load Sharing Facility (LSF) and the Sun Grid Engine (SGE) allow computational resources to be used effectively. If you have such a queuing system, you typically want the GenePattern server to use it. For instructions on how to configure the GenePattern server to use a queuing system, see Using a Queuing System.
As described in the instructions, you click Administration>Server Settings and use the Command Line Prefix page to define the command prefix that runs the module on the cluster. The instructions use the Default Command Prefix field of the Command Line Prefix page to define one command prefix for all modules, which sends all modules to one queue. You can use that same page to define unique command line prefixes for specific modules. This allows you to send different modules to different queues, which helps to address hardware and memory issues. For example, certain modules (such as SNPFileCreator or HierarchicalClustering) require significant amounts of RAM.
The script described in the instructions writes the LSF log file into the job results directory. To prevent GenePattern from displaying the LSF log files with the rest of the job results, edit the genepattern.properties file and set jobs.FilenameFilter=.lsf*.
The GenePattern public server uses two queues: one for most modules and one for modules that require large amounts of memory. Modules sent to the 'bigmem' queue are run on a cluster of large memory machines. LSF log files are hidden.
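Choosing a command line prefix per module, with the default prefix as a fallback, amounts to a lookup like the one in this Python sketch. The prefixes and the queue name genepattern are hypothetical; bigmem matches the queue described above. In LSF, `bsub -q` selects a queue and `bsub -K` makes bsub wait until the job finishes. The second function shows which job result files a jobs.FilenameFilter value of .lsf* would hide. It assumes glob-style matching, as the pattern suggests, and the log file names are also hypothetical.

```python
import fnmatch
import shlex
from typing import Dict, List

# Hypothetical values; real prefixes are entered on the Command Line Prefix page.
DEFAULT_PREFIX = "bsub -q genepattern -K"
MODULE_PREFIXES: Dict[str, str] = {
    "HierarchicalClustering": "bsub -q bigmem -K",
    "SNPFileCreator": "bsub -q bigmem -K",
}


def build_command(module: str, module_command: str) -> List[str]:
    """Prepend the module-specific prefix, or the default prefix if none is defined."""
    prefix = MODULE_PREFIXES.get(module, DEFAULT_PREFIX)
    return shlex.split(prefix) + shlex.split(module_command)


def visible_results(filenames: List[str], pattern: str = ".lsf*") -> List[str]:
    """Drop result files that match the filter pattern (glob matching assumed)."""
    return [name for name in filenames if not fnmatch.fnmatchcase(name, pattern)]
```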
In GenePattern, you manage memory for modules in one of two ways:
Specify memory settings in the file java_flags.properties, in the GenePattern /resources directory. Each line of the file lists the LSID of a module and the memory setting for that module. To find the LSID of a module, in GenePattern, click the module and then click the Properties link. The following is an example file:
# Here is an example which allocates extra RAM for some of the modules:
urn\:lsid\:broadinstitute.org\:cancer.software.genepattern.module.analysis\:00087=-Xmx2500m
urn\:lsid\:broadinstitute.org\:cancer.software.genepattern.module.analysis\:00086=-Xmx10G
urn\:lsid\:broadinstitute.org\:cancer.software.genepattern.module.analysis\:00085=-Xmx2500m
urn\:lsid\:broadinstitute.org\:cancer.software.genepattern.module.analysis\:00106=-Xmx2500m
urn\:lsid\:broadinstitute.org\:cancer.software.genepattern.module.analysis\:00096=-Xmx2500m
urn\:lsid\:broadinstitute.org\:cancer.software.genepattern.module.analysis\:00094=-Xmx2500m
urn\:lsid\:broadinstitute.org\:cancer.software.genepattern.module.analysis\:00093=-Xmx2500m
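The Python sketch below reads a java_flags.properties file and converts each -Xmx setting to megabytes, which makes it easy to review how much memory each module requests. It handles only the subset of Java .properties syntax used in the example: comment lines, key=value pairs, and \: escapes in keys. It is not a complete .properties parser.

```python
import re
from typing import Dict

_XMX = re.compile(r"-Xmx(\d+)([kKmMgG]?)$")


def parse_java_flags(text: str) -> Dict[str, str]:
    """Map each module LSID to its JVM flags."""
    flags: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        # The ':' characters in an LSID are escaped as '\:', so the first '='
        # separates the key from the value.
        key, sep, value = line.partition("=")
        if sep:
            flags[key.strip().replace("\\:", ":")] = value.strip()
    return flags


def xmx_megabytes(flag: str) -> float:
    """Convert an -Xmx flag (bytes, or a k/m/g suffix) to megabytes."""
    match = _XMX.match(flag.strip())
    if match is None:
        raise ValueError(f"not an -Xmx flag: {flag!r}")
    amount, unit = int(match.group(1)), match.group(2).lower()
    mb_per_unit = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return amount * mb_per_unit
```

For the example file, the module ending in 00086 maps to -Xmx10G, which `xmx_megabytes` reports as 10240.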
The following modules frequently require additional memory:
On the GenePattern public server, these modules are sent to a cluster of large memory machines.
Most server configuration options are in the genepattern.properties file in the GenePattern resources directory. Most of the options in this file can be set through the GenePattern interface by clicking Administration>Server Settings. For descriptions of the options, see Modifying Server Settings.
The options listed in the following table can be set only by editing the genepattern.properties file. For all other options, we recommend editing the properties through the GenePattern user interface when possible.
See User Accounts.
See the FAQ: How do I configure the GenePattern server on a machine with multiple IP addresses?
Determines how GenePattern handles network file paths:
Used for the GenePattern SOAP interface. Specify a temporary directory to be used for SOAP messages with attachments.
All GenePattern server functionality is available programmatically. There are two basic access methods:
The following steps are necessary to create a new GenePattern instance from the GenePattern Amazon Machine Image (AMI).
Note: If you do not set up EBS storage, files are saved on the GenePattern instance's file system, which is sufficient only for a small number of GenePattern files.