Running Modules and Pipelines

An analysis module runs a single analysis. A pipeline runs a series of analysis modules. If you are unfamiliar with GenePattern modules and pipelines, see Concepts.

Running a Module or Pipeline

To run a module or pipeline:

  1. Start typing the module name in the search box at the top of the Modules & Pipelines pane to quickly find the module you need.
  2. Select the desired module. The parameters appear in the center pane.
  3. Enter values for the parameter fields (see Setting Parameters).
  4. Click Run. GenePattern sends the job to the server and displays the Job Status page. How long a job takes to complete depends on the size of your dataset and the analysis that you are running.
    Tip: You do not have to wait for the job to complete. You can move off of the Job Status page and continue working or even log out of GenePattern.
  5. Click Return to Modules & Pipelines Start to return to the GenePattern home page. The Recent Jobs tab shows the job that you just ran.
  6. Click the ID number of that job to redisplay its Job Status page. The Job Status page provides complete information about the job, including its parameters, input files, output files, and current status (see Working with Analysis Results).

Alternative: A module or pipeline can also be run from an analysis result file or an uploaded file. When GenePattern displays such files (for example, in the Recent Jobs tab, the Uploads tab, or the Job Status page), click the menu icon next to the file of interest. GenePattern displays the file menu, which lists all modules that accept this type of file as an input file parameter. Select the module or pipeline. GenePattern displays its parameters in the center pane, with the input file parameter set to the file of interest.

Licensed Modules and Pipelines

Some modules or pipelines may have associated end-user license agreements (EULAs).  When you first run a licensed module/pipeline, you will be shown a window containing the license terms, where you can read the terms and choose to accept them or not.  (Note: The license terms will differ depending on the module. The terms shown in this image are only an example.)

When you click OK to accept the license agreement, your acceptance is logged in a database maintained at the Broad. You will not be asked to accept the license terms for that module or pipeline again unless a new version of it is released.

If you choose not to accept the license agreement, you will be unable to run the licensed module or pipeline.

To view a license you have already accepted, click the properties link on the licensed module or pipeline page. The module properties page contains a link to the full-text license.

On the pipeline properties page, the link appears above the pipeline listing.

Setting Parameters

When you select a module or pipeline, GenePattern displays its parameters:

1 Most modules require one or more input files. There are several ways to choose an input file:
  • Drag and drop a file. You can drag a file from your desktop or from the Recent Jobs, Uploads, or GenomeSpace tabs and drop it on the parameter.  This option cannot be used for extremely large files (2 GB or more).
    GenePattern uploads the file to the server before running the analysis. The file is stored with the analysis results.
  • Upload a file. Click the Upload File button.  Navigate to the desired file and select it. This option cannot be used for extremely large files (2 GB or more).
    GenePattern uploads the file to the server before running the analysis. The file is stored with the analysis results.
  • Use a previously uploaded file. From the Uploads pane, click the icon next to an uploaded file of the desired type and select Send To parameter-name. For more information about uploading files, see Uploading Files.
    GenePattern uses the file already stored in the upload directory. A pointer to the file is stored with the analysis results.
  • Use a result file. From the Recent Jobs pane, click the icon next to an analysis result file of the desired type and select Send To parameter-name. This has the benefit of connecting this analysis to the previous analysis, which can be useful for creating pipelines from an analysis result file.
    GenePattern uses the analysis result file stored with the previous job. A pointer to the file is stored with the analysis results.
  • Use a file URL. Click the Add Path or URL button. Copy the URL or FTP address of the file to the entry field.
    GenePattern uploads the file to the server before running the analysis. When the job completes, it deletes the file and stores the URL with the analysis results. (During file transfers, the analysis job is neither PENDING nor RUNNING; deleting a job while it is in this state does not cancel the file transfer.)
  • Use a file path. Click the Add Path or URL button. Navigate to the file under the Or select a file from the file system... header. Note: For security reasons, file paths are disabled on the GenePattern public server; they can be enabled on a local server. For more information, see Using File Paths.
    GenePattern uses the file identified by the file path. The file path is stored with the analysis results.
  • Use a directory. Select a directory of files to run the same analysis on several files and select the Batch checkbox. For more information, see Batch Processing.

In general, uploading a file using drag-and-drop or the Upload File button is fine. However, if you are focused on a particular dataset, it may be faster to upload your files to the Uploads tab and then analyze the uploaded files. Just be aware that if you delete the uploaded files, you cannot rerun the analyses. If you have extremely large datasets, consider using file paths. You can save a significant amount of time by avoiding file transfers.
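
The guidance above can be summarized as a small decision helper. This is only an illustrative sketch, not part of GenePattern: the function names are hypothetical, and it assumes that "2 GB" means 2 * 1024**3 bytes, which this guide does not specify.

```python
# Assumption: "2 GB" is read as 2 * 1024**3 bytes; the guide does not say
# whether it means binary or decimal gigabytes.
UPLOAD_LIMIT_BYTES = 2 * 1024**3


def browser_upload_allowed(size_bytes, limit=UPLOAD_LIMIT_BYTES):
    """Return True if the file is below the drag-and-drop / Upload File limit."""
    return size_bytes < limit


def suggest_input_method(size_bytes, file_paths_enabled=False):
    """Map a file size to one of the input options described in this section."""
    if browser_upload_allowed(size_bytes):
        return "drag-and-drop or Upload File"
    if file_paths_enabled:
        # Recommended for extremely large datasets; requires a server
        # with file paths enabled (not the public server).
        return "file path"
    # Remaining options that this section lists without a size restriction.
    return "Uploads tab or file URL"
```

To apply it to a real file, pass `os.path.getsize(path)` as `size_bytes`.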

2 3 Specify other parameter values using the drop-down lists and entry fields:

  • Drop-down list. Click a value in the list. Ctrl-click (Mac: Command-click) to select multiple values (if allowed).
  • Entry field. Enter a value in the box. Valid values for the field depend on the module and should be listed in the module documentation.

4 Version of the module. If multiple versions of the module are installed on the server, GenePattern displays the latest version by default. To run a different version, select it from the version number drop-down menu.

5 The Documentation link displays the module or pipeline documentation.

The Gear icon drop-down contains the following links:

  • Properties: Display the properties page of the module or pipeline. The module properties page lists the program that implements the analysis. The pipeline properties page lists the modules run by the pipeline.
  • Export: Create a zip file that contains the module or pipeline. The zip file can be used to install this module or pipeline on another GenePattern server.
  • Display the code (Java, MATLAB, or R) used to run the module with the parameters that you have entered. This can be a useful starting point for programmers writing code to invoke the module.
  • Show Parameter Descriptions checkbox displays or hides the parameter descriptions; selected by default.
  • Edit (not shown): Available only for modules that you have created.

6 Run and Reset buttons:
  • Run button: Start the analysis.
  • Reset button: Reset all parameters to their default values.

Rerunning an Analysis

To rerun an analysis:

  1. Display your analysis jobs in one of two ways:
    • Click Modules & Pipelines to display the GenePattern home page, where your most recent jobs are listed in the Recent Jobs tab.
    • Click Job Results>Results Summary to display the Job Results Summary page, which lists all of your analysis jobs.
  2. Click the icon next to the job that you want to rerun and select Reload. The parameters appear in the center pane set to the values that were used for this job.
  3. Optionally, modify the parameter values.
  4. Click Run.

Uploading Files

See the video tutorial: Using the GenePattern Uploads Tab

Uploading files to the GenePattern server provides the following benefits:

To upload files:

  1. Click the Uploads tab to bring it forward.
  2. Click the icon next to the Uploads directory. GenePattern displays the uploads menu.
  3. Optionally, create a subdirectory:
    1. Enter a subdirectory name and click Create.
    2. Click the icon next to the subdirectory. GenePattern displays the uploads menu.
  4. Click Upload Files and, if necessary, grant the upload applet access to run on your local machine. GenePattern opens a Java applet to upload the files.
  5. Click to browse for the files to upload or drag-and-drop the files into the window.
    Tip: The order in which you add the files to the window determines the order in which the applet uploads the files to the server.
  6. Click the Upload button to begin uploading the files. You can continue working in the applet or in GenePattern while the applet uploads the files to the server.
    Do not close the applet window; it must remain open to complete the upload. The applet displays its progress and estimated time left as it proceeds. Upload time depends on the size of your file, the speed of your network connection, bandwidth available and a variety of other factors. For example, on a T1 connection a 1GB file can take as little as 5-10 minutes to upload, but that same file over a WiFi VPN connection in a room full of other WiFi users can take an hour or more.
  7. When the upload completes, refresh the GenePattern web browser page. The Uploads tab displays an up-to-date listing of the files in your uploads directory.

Add files to the applet window. Click the icon to display a file browser from which you can select files. Alternatively, drag-and-drop files directly onto the window.

Remove files from the applet window. Click the icon to display a menu from which you can choose to delete selected, pending, finished, failed, or all files from the window.

Pending. When you have added files to the window, but have not yet clicked Upload, the files are pending.

Queued. When you click Upload, all pending files are added to the applet's upload queue.

Finished. Files that have been successfully uploaded to the GenePattern server are marked as finished.

Failed. Files that could not be uploaded to the GenePattern server are marked as failed.

Upload. Start uploading all pending files.

Deleting Uploaded Files

To delete an uploaded file from the server:

  1. Click the Uploads tab to bring it forward.
  2. Click the icon next to a file or subdirectory. GenePattern displays the uploads menu.
  3. Click Delete. GenePattern deletes the file or subdirectory from your upload directory on the server.
    • You must delete the files in a subdirectory before you can delete the subdirectory.
    • When you use an uploaded file to run an analysis or as part of a pipeline, the job or pipeline includes a pointer to the file. If you delete the file from your uploads directory, jobs and pipelines that reference the file cannot access the file and, therefore, cannot be (re)run.

Special Considerations

Using File Paths

Note: For security reasons, file paths are not enabled on the GenePattern public server.

When file paths are enabled on a GenePattern server, you can use file paths to identify input files for modules and pipelines. The GenePattern server can directly access the files stored on your local or networked drives, so the files do not have to be transferred to or stored on the GenePattern server. Avoiding file transfers can save significant upload time, and avoiding file storage can save significant amounts of disk space.

When file paths are enabled, the module/pipeline run page includes the Add Path or URL option. To use a file path as an input file parameter:

  1. Select a module or pipeline. The parameters appear in the center pane.
  2. Click the Add Path or URL button.
  3. Select a file from a local or networked drive and click Select.
  4. GenePattern runs the analysis, using the file identified by the file path. The file path, rather than the file, is stored with the analysis results.

To enable file paths on your GenePattern server:

  1. Open the genepattern.properties file, which is located in the  resources directory under your GenePattern server directory.
  2. By default, the allow.input.file.paths property is set to false. Set it to true:
    allow.input.file.paths=true
  3. Save the updated genepattern.properties file.
  4. Restart your GenePattern server.
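
If you prefer to script the edit in step 2 rather than change the file by hand, the following minimal sketch shows one way to do it. It is not a GenePattern tool: `set_property` is a hypothetical helper, and it assumes the file uses plain `key=value` lines. Reading and writing genepattern.properties, and restarting the server, are left to you.

```python
def set_property(text, key, value):
    """Return properties-file text with `key` set to `value`.

    An existing uncommented `key=...` line is replaced in place; if no
    such line exists, the assignment is appended at the end.
    Assumes simple `key=value` syntax (no ':' or whitespace separators).
    """
    lines = text.splitlines()
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(("#", "!")):
            continue  # leave comment lines untouched
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


sample = "# server settings\nallow.input.file.paths=false\n"
updated = set_property(sample, "allow.input.file.paths", "true")
```

Here `updated` contains the original comment line followed by `allow.input.file.paths=true`. Back up the properties file before overwriting it with the result.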

Optionally, you can also set the server.browse.file.system.root property to define a root directory, where the server begins browsing for network files. For example, if you set:

server.browse.file.system.root=/Users/mydata/ngs

when the user clicks Add Path or URL to select a file from a local or networked drive, GenePattern opens the file selection window to /Users/mydata/ngs.

Batch Processing

See the video tutorial: Batch Execution in GenePattern 3.3.3

Batch processing provides an automated method of running several files through a module or pipeline in parallel.

In GenePattern 3.8.0, you can drag in a list of files to batch over or, as before, run over a directory. For more information on dragged-in lists, see the 3.8.0 release notes.

To run a batch job from a directory in the Files tab:

  1. Place the files to be processed in a subdirectory. You can either upload the files into a subdirectory on the GenePattern server or use a file path to point to a subdirectory on your local machine. See Using an Upload Subdirectory or Using a File Path Subdirectory.
  2. Select the module or pipeline to run.
  3. Select the Batch checkbox.
  4. Specify the subdirectory as an input file parameter for the module or pipeline. See Using an Upload Subdirectory or Using a File Path Subdirectory.

    To provide multiple input files for a module or pipeline, identify each set of files to be processed by giving them the same name (excluding file extension). For example, to run ComparativeMarkerSelection as a batch job, give each .gct or .res file the same name (excluding file extension) as its matching .cls file. For example: test1.gct, test1.cls, test2.gct, test2.cls. You can place all of the files in one subdirectory or create a separate subdirectory for each input file parameter (for example, one subdirectory for .gct files and another for .cls files). GenePattern processes each set of matching files, ignoring any other files in the directory.

  5. Click Run. GenePattern starts one analysis job for each input file or valid set of input files. It then displays all of the jobs in the Job Results Summary page.

    Tip: You do not have to wait for the job to complete. You can move off of the Job Results Summary page and continue working or even log out of GenePattern. If you choose to wait for the jobs to complete, periodically refresh the Job Results Summary page to update the status of the jobs.

  6. Click Modules & Pipelines to return to the GenePattern home page. The Jobs tab shows the submitted jobs.
    • Click the ID number of a job to display its Job Status page. The Job Status page provides complete information about the job, including its parameters, input files, output files, and current status.
    • Click Job Results>Results Summary to redisplay the Job Results Summary page. On the Job Results Summary page, select the batch job from the drop-down list to list only those analysis jobs submitted as part of this batch job.
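
The file-naming rule in step 4 can be checked before you submit a batch. The sketch below is not GenePattern's own matching code; it is a hypothetical helper that illustrates the rule as described: files are grouped by name (excluding extension), and only groups that supply a file for every input file parameter are kept.

```python
from collections import defaultdict
from pathlib import PurePath


def match_batch_sets(filenames, required_exts):
    """Group file names by base name and keep only complete sets.

    required_exts has one entry per input file parameter; each entry is
    the set of extensions that parameter accepts, for example
    [{".gct", ".res"}, {".cls"}].
    """
    by_stem = defaultdict(dict)
    for name in filenames:
        p = PurePath(name)
        by_stem[p.stem][p.suffix.lower()] = name

    batches = {}
    for stem, by_ext in sorted(by_stem.items()):
        chosen = []
        for accepted in required_exts:
            # Pick the first available file with an accepted extension.
            hit = next((by_ext[e] for e in sorted(accepted) if e in by_ext), None)
            if hit is None:
                break  # incomplete set: this base name is skipped
            chosen.append(hit)
        else:
            batches[stem] = chosen
    return batches


files = ["test1.gct", "test1.cls", "test2.res", "test2.cls", "notes.txt", "test3.gct"]
sets = match_batch_sets(files, [{".gct", ".res"}, {".cls"}])
```

With these example names, `sets` contains entries for test1 and test2 only; test3.gct has no matching .cls file and notes.txt matches no parameter, so both are left out, mirroring how unmatched files are ignored.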

Using an Upload Subdirectory

To create and run a batch job using an upload subdirectory:

  1. Upload the files that you want to run as a batch into a subdirectory of the Uploads directory. For more information, see Uploading Files.
  2. Select the module or pipeline to run. GenePattern displays the module run page.
  3. Select the Batch checkbox for each input file parameter that will receive a set of files.
  4. For each input file parameter whose input files are in the subdirectory, simply drag the subdirectory from the Files tab to the input file parameter.
  5. Enter the remaining parameter values.
  6. Click Run.

Using a File Path Subdirectory

To create and run a batch job using a file path subdirectory:

  1. Install a local GenePattern server and enable the server to accept file paths. For more information, see Using File Paths.
  2. Verify that the files that you want to run as a batch are in a subdirectory accessible to your local GenePattern server.
  3. Start your local server, open a web browser, and log into GenePattern.
  4. Select the module or pipeline to run. GenePattern displays the module run page.
  5. Select the Batch checkbox for each input file parameter that will receive a set of files.
  6. For each input file parameter whose input files are in the subdirectory:
    1. Click the Add Path or URL button.
    2. Enter the file path of the subdirectory.
  7. Enter the remaining parameter values.
  8. Click Run.


Updated on January 31, 2014 10:10