Integrating GenePattern with Existing Tools

This section provides guidance to system administrators interested in integrating GenePattern into the analysis tools at their site. It highlights issues that might arise and how to address them, and provides links to relevant portions of the GenePattern documentation, supplementing that documentation as needed.

Typographical conventions:

Tables like this describe implementation on the GenePattern public server.

Installing GenePattern

The standard installation procedure uses InstallAnywhere to install the server on Windows, Mac, or Linux with a Tomcat web server. To install on a different web server or on another platform, use the WAR file installer. Instructions for both the standard installation and the WAR file installation are on the download page: http://www.genepattern.org/download/.

Hardware and software requirements for GenePattern are described in the Release Notes.

GenePattern Database

The GenePattern server runs against a database. The GenePattern installation creates an HSQL database. For instructions on how to build and use an Oracle database instead, see Changing the GenePattern Database (HSQL to Oracle).

We use an Oracle database for the GenePattern public server.

Securing the GenePattern Server

The following sections briefly summarize how to secure your GenePattern server, including access to the server from client machines, GenePattern user accounts, authentication (e.g., username & password) and authorization (e.g., permissions). For more detail, see Securing the Server.

Access

By default, any client machine can access a GenePattern server. Optionally, you can configure your GenePattern server to restrict access to selected domains. See Securing the Server.Access Filtering.

Access to the GenePattern public server is not restricted.

User Accounts

A user must have a GenePattern account to log into the GenePattern server. By default, when a user first logs into the server, GenePattern automatically creates an account for that username.

To enable registration, in the genepattern.properties file, set require.password=true. This setting adds a registration link (and password prompt) to the GenePattern login page. The first time users log into GenePattern, they must click the registration link to create an account. User account information is stored in the GenePattern Database.

Alternatively, configure the GenePattern server to not allow users to create GenePattern accounts (create.account.allowed=false). In this case, new user accounts must be explicitly created by editing the GenePattern database.

See Securing the Server.Password Protection.
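The two account settings above are plain key=value entries in genepattern.properties. The following is a minimal sketch, not part of GenePattern, of applying them with a script instead of a text editor. The helper name set_properties and the sample file content are hypothetical, and the parser is deliberately simplified: it handles only key=value lines and ignores other Java properties syntax such as ":" separators and line continuations.

```python
def set_properties(text, updates):
    """Return properties-file text with each key in `updates` set to its value.

    Existing key=value lines are rewritten in place; keys that are not
    present are appended. Comment lines (# or !) and blank lines are kept.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Append any settings that were not already in the file.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Hypothetical starting content; a real genepattern.properties is much longer.
original = "# user accounts\nrequire.password=false\n"
updated = set_properties(
    original,
    {"require.password": "true", "create.account.allowed": "false"},
)
print(updated)
```

Back up the properties file before changing it this way.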

Registration and passwords are enabled on the GenePattern public server.

Authentication

Each GenePattern user must register to access the GenePattern server. By default, GenePattern requires only a username for authentication. Optionally, you can configure the GenePattern server to require both a username and a password for authentication. See Securing the Server.Password Protection.

GenePattern user authentication is performed by a servlet filter installed in front of the GenePattern web application in its web.xml file. To provide additional or alternative authentication, implement the IAuthenticationPlugin.java interface or modify the servlet filter. See Securing the Server.User Authentication and Authorization.

The GenePattern public server hosted at the Broad Institute uses the username and password authentication provided by the GenePattern installation.

Collaborator

A large university uses Kerberos to provide username and password authentication for their network. They wrote their own servlet filter to have the GenePattern server also authenticate using Kerberos.

Permissions

GenePattern permissions are based on two configuration files:

GenePattern user authorization is performed by a servlet filter installed in front of the GenePattern web application in its web.xml file. By default, users are assigned permissions based on GenePattern groups. To have the authorization filter use external user groups rather than the userGroups.xml file, implement the IGroupMembershipPlugin.java interface. To provide additional or alternative authorization, modify the servlet filter. See Securing the Server.User Authentication and Authorization.

On the GenePattern public server, the following permissions are restricted to a small number of users in the Administrator group:

  • createModule - we restrict this to prevent malicious code on the server
  • createPublicPipeline - we restrict this to prevent proliferation of untested pipelines
  • adminJobs, adminModules, adminPipelines, adminSuites - we restrict these to preserve privacy
  • adminServer - we restrict this to secure the server

SSL

For information on how to modify the GenePattern web application to run on a web server that is configured to use the HTTPS protocol, see Securing the Server.Secure Sockets Layer (SSL) Support.

The GenePattern public server is not running under SSL.

Other Security Considerations

We take the following additional steps to secure the machine running the GenePattern public server (these steps may not be necessary on less public servers):

Modules

This section discusses how to install, create, and manage modules.

Installing Modules

By default, you install modules, pipelines, and suites from the Broad repository. The module repository contains more than 100 modules and pipelines. Suites are stored in a separate suite repository. For instructions on how to install modules from the repository, see Managing Modules, Pipelines, and Suites.

The repository is updated regularly. We recommend checking for new modules on a weekly basis.

Create your own repository: Optionally, you can select an alternate repository from which to install modules, pipelines, and/or suites. See Repositories.

At the Broad, we maintain a development repository for modules in development and a production repository for released modules. Only the production repository is available from the GenePattern public server.

Creating Modules

For instructions on how to create modules, as well as a step-by-step tutorial for creating a module, see the Programmers Guide.

Running Modules in a Cluster

Queuing systems such as the Load Sharing Facility (LSF) and the Sun Grid Engine (SGE) allow computational resources to be used effectively. If you have such a queuing system, you typically want the GenePattern server to use it. For instructions on how to configure the GenePattern server to use a queuing system, see Using a Queuing System.

As described in the instructions, you click Administration>Server Settings and use the Command Line Prefix page to define the command prefix that runs the module on the cluster. The instructions use the Default Command Prefix field of the Command Line Prefix page to define one command prefix for all modules, which sends all modules to one queue. You can use that same page to define unique command line prefixes for specific modules. This allows you to send different modules to different queues, which helps to address hardware and memory issues. For example, certain modules (such as SNPFileCreator or HierarchicalClustering) require significant amounts of RAM.
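The per-module routing described above amounts to a lookup from module name to command prefix, with a default for everything else. The sketch below is a conceptual illustration only, not GenePattern code. The bigmem queue name matches the one used on the GenePattern public server; the queue name "normal", the module-to-queue assignments, and the use of a bare bsub -q prefix are assumptions. The prefix your site needs is whatever the queuing instructions have you enter on the Command Line Prefix page.

```python
# Hypothetical routing table: module name -> LSF queue name.
# These assignments are examples, not a recommended configuration.
MODULE_QUEUES = {
    "SNPFileCreator": "bigmem",
    "HierarchicalClustering": "bigmem",
}

# Placeholder name for the queue that receives all other modules.
DEFAULT_QUEUE = "normal"


def command_prefix(module_name):
    """Return the command prefix that sends a module to its queue."""
    queue = MODULE_QUEUES.get(module_name, DEFAULT_QUEUE)
    # "bsub -q <queue>" submits a job to the named LSF queue.
    return f"bsub -q {queue}"


print(command_prefix("SNPFileCreator"))
print(command_prefix("PreprocessDataset"))
```

In GenePattern itself, the same mapping is entered by hand: one default prefix, plus one prefix per module that needs a different queue.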

The script described in the instructions writes the LSF log file into the job results directory. To prevent GenePattern from displaying the LSF log files with the rest of the job results, edit the genepattern.properties file and set jobs.FilenameFilter=.lsf*.
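To check which result files a filter such as .lsf* would hide, it can help to try the pattern against sample file names. The sketch below assumes the filter behaves like a shell-style wildcard, which is an assumption about GenePattern and not confirmed by this guide. The file names are hypothetical.

```python
from fnmatch import fnmatch


def visible_results(filenames, pattern=".lsf*"):
    """Return the file names that do NOT match the filter pattern."""
    return [name for name in filenames if not fnmatch(name, pattern)]


# Hypothetical contents of a job results directory.
files = ["all_aml.cls", ".lsf.out", ".lsf_12345.err", "stdout.txt"]
print(visible_results(files))
```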

The GenePattern public server uses two queues: one for most modules and one for modules that require large amounts of memory. Modules sent to the 'bigmem' queue are run on a cluster of large memory machines. LSF log files are hidden.

Managing Memory for Modules

In GenePattern, you manage memory for modules in one of two ways:

Modules that Require Extra Memory

The following modules frequently require additional memory:

On the GenePattern public server, these modules are sent to a cluster of large memory machines.

genepattern.properties

Most server configuration options are in the genepattern.properties file in the GenePattern resources directory. Most of the options in this file can be set through the GenePattern interface by clicking Administration>Server Settings. For descriptions of the options, see Modifying Server Settings.

The options listed in the following table can be set only by editing the genepattern.properties file. We recommend editing properties through the GenePattern user interface when possible.

require.password, create.account.allowed

See User Accounts.

GenePatternURL, fqHostName, fullyQualifiedHostName, gpServerHostAddress

See the FAQ: How do I configure the GenePattern server on a machine with multiple IP addresses?

allow.input.file.paths=true, server.browse.file.system.root=/

See Other Security Considerations.

input.file.mode=path

Determines how GenePattern handles network file paths:

  • path (default) leaves the network file in place
  • move copies the network file to the job directory on the GenePattern server before beginning the job and copies the file back to its original location after the job completes

soap.attachment.dir=../temp/attachments

Used for the GenePattern SOAP interface. Specify a temporary directory to be used for SOAP messages with attachments.
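The difference between the two input.file.mode values listed above can be made concrete with a small simulation. This is a sketch of the behavior as described in the table, not GenePattern's implementation. The function run_job, its parameters, and the directory names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def run_job(input_path, job_dir, mode, job):
    """Run `job` on an input file according to the given input.file.mode.

    Returns the path the job actually worked on.
    """
    input_path = Path(input_path)
    if mode == "path":
        # Leave the network file in place and work on it directly.
        job(input_path)
        return input_path
    if mode == "move":
        # Copy to the job directory, run, then copy the file back.
        local_copy = Path(job_dir) / input_path.name
        shutil.copy2(input_path, local_copy)
        job(local_copy)
        shutil.copy2(local_copy, input_path)
        return local_copy
    raise ValueError(f"unknown input.file.mode: {mode}")


with tempfile.TemporaryDirectory() as tmp:
    network_dir = Path(tmp) / "network"
    job_dir = Path(tmp) / "job_1"
    network_dir.mkdir()
    job_dir.mkdir()
    source = network_dir / "data.txt"
    source.write_text("raw")

    # A stand-in "job" that appends text to its input file.
    used = run_job(
        source, job_dir, "move",
        lambda p: p.write_text(p.read_text() + " processed"),
    )
    result = (used.parent == job_dir, source.read_text())

print(result)
```

In move mode the job works on a copy in the job directory, and the file at the original location reflects the job's changes only after the job completes.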

Web Service Interface

All GenePattern server functionality is available programmatically. There are two basic access methods:


Updated on August 22, 2013 07:32