Queuing systems such as the Load Sharing Facility (LSF) and the Sun Grid Engine (SGE) allow computational resources to be used effectively. If you have installed a queuing system, you can configure the GenePattern server to use it. On a heavily used server, using a queuing system to execute analysis jobs generally improves overall performance, especially for compute-intensive and long-running jobs; however, short jobs might take slightly longer because they must be dispatched to the queuing system.
There are two ways to configure GenePattern's interaction with your queuing system: programmatically through the CommandExecutor Java API, or through a command line prefix.
To use a queuing system with GenePattern:
Each step is described in detail below.
The full source for the Command Executor API is included here:
The required Java libraries come with your local install of GenePattern and can be found in the
The interface accepts requests from the server to start and terminate jobs. Your implementation must invoke a callback to the GP server when a job has completed.
Once you have implemented this interface, create a jar file to deploy to the GP server.
The jar file and all of the dependent libraries must be installed to
To configure your server to interact with your queuing system, you must edit the
config.yaml file. In a fresh install of GenePattern, there will be two .yaml files found in
config_example.yaml. It is highly recommended that you make a copy of
config_default.yaml and name it something like
config.yaml. This gives you a working copy of your configuration file and preserves the default and example versions for future reference. It also prevents your working copy from being overwritten during a server upgrade.
config.file property in the
<GenePatternServer>/resources/genepattern.properties file to point to your new configuration file. By default, the property looks like this:
For this example, you would edit the property as follows:
Now, edit your working copy of the configuration file,
config.yaml. (The following code snippets come from the
a) Define an executor in the "executors" section. To do so, add an item to the 'executors' list in the yaml document.
b) Configure your server to use your executor.
c) Optionally, you can use the configuration file to override the default executor on a per-module, per-group, or per-user basis. The following example comes from the per-module section; more examples can be found in
About the .yaml configuration file: As of GenePattern 3.4.0, you use the .yaml configuration file only to configure GenePattern for use with a queuing system. As you work with the .yaml file, you may notice that it contains several properties that are also defined in the genepattern.properties file. To avoid confusion, leave them set to agree with the genepattern.properties file. GenePattern 3.4.0 reads these properties from the genepattern.properties file, not from the .yaml file. (In a future release, the genepattern.properties file may define the default server settings and the .yaml configuration file may define custom server settings.)
At this point, you have deployed your command executor, modified the .yaml configuration file to control its use, and modified the
<GenePatternServer>/resources/genepattern.properties file to point to the modified .yaml configuration file. Now, stop and restart the GenePattern server to reload the server configuration and begin to use the new command executor.
As you use GenePattern with the queuing system, you may find it useful to modify the configuration. The Administration>Server Settings>Job Configuration page provides several useful tools for controlling the internal GenePattern job queue and reloading the .yaml configuration file. Use this page to confirm which command executors are currently installed and the exact .yaml configuration file currently in use. If you make minor adjustments to the configuration file, such as overriding the command executor used for a module, group or user, you can use the Job Configuration page to reload the configuration file without restarting the GenePattern server. On the other hand, for major changes, such as adding a new command executor, we recommend restarting the server rather than simply reloading the configuration.
Before the 3.2.3 release of GenePattern (June 2010), the only way to connect to an external queuing system was to use the command line prefix. Although this option requires no Java programming and allows for configuration via a web page, it has significant drawbacks:
The drawbacks are a result of how the command line prefix works. Each new job requires a dedicated server process which waits for the job to complete. When a user terminates a job, the server process is terminated but the external process launched on the queuing system is not terminated. Similarly, when the GenePattern server shuts down, all server processes halt but the processes running on the external queuing system become orphaned. When the GenePattern server restarts, the jobs are not restarted; the user must restart any unfinished job from the beginning.
If you are using the CommandExecutor interface, we recommend that you not use the command line prefix. The command line prefix is prepended to the module command line before the job is executed by the CommandExecutor. To be more precise:
Although this is not the preferred method, you can still use the Command Line Prefix to connect to an external queuing system.
To use the Command Line Prefix to configure the GenePattern server to execute jobs using LSF or SGE:
GenePatternServer/resources/genepattern.properties, specifying the URL of your server. For example:
When you run a pipeline, the GenePattern server uses this URL to construct the links to the output files.
By default, the GenePatternURL property is not set. When you run a pipeline, the GenePattern server derives the URL at run time based on the current IP address of the host server. This is ideal for a user running on a laptop, where the IP address may change at startup. However, if you are using a queuing system, the derived URL is incorrect: it is based on the IP address of the queuing system server rather than the GenePattern server.
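One way to set the property from a shell is sketched below. The helper `set_property` and the URL `http://gp.example.org:8080/gp/` are hypothetical placeholders, not GenePattern tooling or a real server address. The helper drops any existing uncommented `KEY=` line and appends the new value; back up the properties file before trying it.

```shell
# set_property FILE KEY VALUE  (hypothetical helper, not part of GenePattern)
# Replaces any existing "KEY=..." line in FILE with "KEY=VALUE".
set_property() {
    file=$1
    key=$2
    value=$3
    tmp="$file.tmp.$$"
    # Keep every line whose text before the first "=" is not exactly KEY.
    awk -F= -v k="$key" '$1 != k' "$file" > "$tmp" || return 1
    # Append the new definition and swap the edited copy into place.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Example call (path as given in the text; the URL is a placeholder):
# set_property GenePatternServer/resources/genepattern.properties \
#     GenePatternURL "http://gp.example.org:8080/gp/"
```

As with the other changes to genepattern.properties described here, the new value presumably takes effect only after the server is restarted.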
GenePatternServer/resources/genepattern.properties, to quote the <r_flags> options. For example:
R2.5=<java> -DR_suppress\=<R.suppress.messages.file> -DR_HOME\=<R2.5_HOME> -Dr_flags\=\"<r_flags>\" -cp <run_r_path> RunR
Modify other similar properties (if any) that were added to support additional versions of R.
For example, if you are using LSF, modify the Command Line Prefix options as follows:
bsub -K -o lsf_log.txt
Another alternative is to create a script that sets the environment variables and then executes the job using LSF or SGE. The command prefix would then execute the script. For example:
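A minimal sketch of such a wrapper script follows, assuming LSF and the `bsub -K -o lsf_log.txt` prefix shown earlier. The script name `gp_lsf_wrapper.sh`, the `submit_job` function, the `DRY_RUN` switch, and the `TMPDIR` setting are hypothetical and not part of GenePattern; replace the environment settings with whatever your modules actually need.

```shell
#!/bin/sh
# gp_lsf_wrapper.sh (hypothetical name): set up the environment, then hand
# the module command line to LSF.

# Example environment setting (assumption); replace with what your site needs.
TMPDIR="${TMPDIR:-/tmp}"
export TMPDIR

# submit_job CMD [ARGS...]
# Submits the command through bsub. -K makes bsub wait until the job
# finishes; -o sends the job output to lsf_log.txt.
# With DRY_RUN set, the bsub command is printed instead of executed.
submit_job() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "bsub -K -o lsf_log.txt $*"
    else
        bsub -K -o lsf_log.txt "$@"
    fi
}

# When the script is used as the command prefix, the module command line
# arrives as the script's arguments.
if [ "$#" -gt 0 ]; then
    submit_job "$@"
fi
```

With a script like this in place, the Command Line Prefix would be the path to the script rather than the `bsub` command itself. For SGE, a blocking submission such as `qsub -sync y -b y` would presumably take the place of `bsub -K`; check your site's qsub documentation before relying on it.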