When you create or edit a module, GenePattern displays its properties in the module integrator. Click the Help icon to display the following descriptions of each property and its valid values.
Note: Only the GenePattern team can create, edit or install modules on the GenePattern public server. Therefore, to create a module, you must have a local GenePattern server installed.
Creating a GenePattern module is a multi-step process:
When you save your changes, the module properties that you have entered are validated as follows:
If everything checks out, the uploaded files are saved in the GenePattern module library and the module is registered in the module database. The module and its uploaded files are indexed in the background so that they are available for searching. You can run the module immediately and can share it with others.
The following sections describe each module property in detail:
An example for each property is given based on the Consensus Clustering module, which may be uploaded from the module repository if you haven't already installed it.
The module name appears in the drop-down module catalog lists and is also used as the name of a directory on the server. It should be short but descriptive, without spaces or punctuation, and may be mixed upper- and lower-case.
ConsensusClustering example: ConsensusClustering
Each time you update a module, you create a new version of the module. Typically, you want to edit the most recent version of a module. If you want to edit an earlier version, select that version from the drop-down list of versions.
Click Help to display this text.
Click Save to save your changes, creating a new version of the module, and remain in the module integrator.
Click Save and Run to save your changes, creating a new version of the module, exit from the module integrator and run the module.
The Life Science Identifier (LSID) uniquely identifies a GenePattern module. You cannot create or edit LSIDs; the GenePattern server creates them automatically when a module is saved.
ConsensusClustering example: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00030:5
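The example identifier has six colon-separated fields, which matches the general LSID layout urn:lsid:authority:namespace:object:revision. The following is a minimal sketch of splitting such an identifier into its parts. The parse_lsid helper and the Lsid type are hypothetical names introduced for illustration; they are not part of GenePattern, and the sketch assumes the revision field is always present.

```python
from typing import NamedTuple


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: str


def parse_lsid(text: str) -> Lsid:
    """Split an identifier of the form urn:lsid:authority:namespace:object:revision."""
    parts = text.split(":")
    if len(parts) != 6 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a six-part LSID: {text!r}")
    return Lsid(*parts[2:])


example = parse_lsid(
    "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00030:5"
)
print(example.authority)  # broad.mit.edu
print(example.object_id)  # 00030
print(example.revision)   # 5
```

Because the server assigns the identifier, a script would normally only read these fields, never construct them.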
Use the description to explain what your module does and why someone would want to use it. It can be anywhere from a sentence to a short paragraph in length. The description, sometimes in abridged form, is displayed in the pipeline designer module choice list, in generated code when creating scripts from pipelines, and in the web client. It is a good way to document succinctly why your module exists.
ConsensusClustering example: Resampling-based clustering method
Enter the author's name. If you share this module with others, they will know how to give the author credit and whom to contact with questions, suggestions, or enhancement ideas.
ConsensusClustering example: Stefano Monti
Enter the author's affiliation (company or academic institution). If you share this module with others, they will know how to give the author credit and whom to contact with questions, suggestions, or enhancement ideas.
ConsensusClustering example: Broad Institute
Upload a text file containing the End-User license agreement. Users will be prompted to accept this license before running the module.
Enter a brief description of the changes that you have made to the module. When GenePattern clients display a drop-down list of versions, the comments for each version are visible in the drop-down list.
ConsensusClustering example: Added ability to create heatmap images of clusters
On the GenePattern home page, modules and pipelines are organized by categories. Pipelines are always assigned to the category name pipeline. When you create or update a module, you can choose an existing category name or create a new category name. If your module fits into an existing category, such as Preprocess & Utilities, select that category from the drop-down list; otherwise, click the New button to add a new category. GenePattern creates the drop-down list of categories dynamically based on the categories of the modules installed on your GenePattern server. If you delete the last module in a given category, that category is removed from the drop-down list.
ConsensusClustering example: Clustering
Modules may be marked as either public or private. When a module is first created, the default is to mark it private.
The quality level is a simple three-level classification that lets the user know what level of confidence the author has in the robustness of the module. In increasing order of quality expectations, the levels are "development", "preproduction", and "production". Although these terms have no strict definitions, they are useful for setting user expectations. If you make this module public, set the quality level appropriately.
ConsensusClustering example: production
If your module requires a specific operating system (Windows, Linux, MacOS, etc.), indicate that here. Operating system requirements are enforced when the module is run.
ConsensusClustering example: any
There is no specific language support or requirement enforcement at this time. However, by describing the primary language that a module is implemented in, you give some hints to the prospective user about their system requirements.
ConsensusClustering example: Java
If your module requires at least a certain revision of the language runtime environment (e.g., 1.3.1_07), indicate that here. This is not currently enforced, but provides useful information to the prospective module user.
ConsensusClustering example: none specified
Select the file formats of the output files generated by your module. If your module generates an output file format not included in the list, click New to add that format to the list.
Any files required by your module, such as scripts, libraries, property files, DLLs, and executable programs, must be uploaded to the server. These files may be referenced in the command line field in the form <libdir>filename. There is no upper limit on the number of files that may be uploaded, assuming there is enough disk space on the server.
Files that have been uploaded appear as links in this section. You may view or download them by clicking appropriately in your browser.
Help Files: Public modules should always include a help file that provides instructions for using the module, a detailed description of each input parameter, a detailed description of each output file (both its format and content), and either an explanation of the algorithm or a reference to the paper, journal or book that explains it.
When a user selects your module, GenePattern displays a form that includes the module parameters and a Help button. When the user clicks the Help button, GenePattern examines the list of support files for the module and displays the first file that has a standard documentation file extension. If no documentation file was provided, GenePattern displays a message indicating that no information is available. (By default, the standard documentation file extensions are html, htm, xhtml, pdf, rtf, and txt. You can modify this list of extensions by editing the files.doc property in the GenePattern /resources/genepattern.properties file.)
ConsensusClustering example: Current files: Acme.jar archiver.jar common_cmdline.jar ConsensusClustering.pdf file_support.jar geneweaver.jar gp-common.jar ineq_0.2-2.tar.gz ineq_0.2-2.tgz jaxb-rt-1.0-ea.jar my.local.install.r RunSomAlg.jar trove.jar version.txt
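The help-file lookup described above can be sketched as follows. This is an illustration only, not GenePattern's implementation: the find_help_file name is hypothetical, the extension list is the default given in the text, and the sketch assumes that files are examined in the order listed and that extensions are compared without regard to case.

```python
# Default documentation extensions, as listed in the text above.
DOC_EXTENSIONS = ("html", "htm", "xhtml", "pdf", "rtf", "txt")


def find_help_file(support_files, extensions=DOC_EXTENSIONS):
    """Return the first support file with a documentation extension, or None."""
    for name in support_files:
        _, dot, ext = name.rpartition(".")
        if dot and ext.lower() in extensions:
            return name
    return None


files = ("Acme.jar archiver.jar common_cmdline.jar ConsensusClustering.pdf "
         "file_support.jar geneweaver.jar gp-common.jar ineq_0.2-2.tar.gz "
         "ineq_0.2-2.tgz jaxb-rt-1.0-ea.jar my.local.install.r RunSomAlg.jar "
         "trove.jar version.txt").split()
print(find_help_file(files))  # ConsensusClustering.pdf
```

Under these assumptions, version.txt would also qualify as documentation, but ConsensusClustering.pdf comes first in the list and is the file displayed.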
The crux of adding a module to the GenePattern server is to provide the command line that will be used to launch the module, including substitutions for settings that will be specified differently for each invocation. In the command line field, you will provide a combination of the fixed text and the dynamically-changed text which together constitute the command line for an invocation of the module.
Perhaps the trickiest thing about specifying a command line is making it truly platform-independent. Sure, it works fine for your computer, right now. But if you zip it and send it to an associate, are they running a Mac? Windows? Unix? You may not know, and you shouldn't need to care. By carefully describing the command line using substitution variables, you can pretty well ensure that your module will run anywhere.
Parameters: Parameters that require substitution should be enclosed in angle brackets (e.g., <filename>). Every parameter listed in the parameters section must be mentioned in the command line unless its optional field is checked. A default value may be provided and will be used if the user does not specify a value when invoking the module.
Click the View Argument List button to display a list of the parameters mentioned in the command line. You can change the order of the parameters by dragging them to a new position in the list or by editing the text of the command line.
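The rule that every non-optional parameter must be mentioned in the command line can be checked mechanically. The sketch below is a rough illustration, assuming that any text of the form <name> counts as a mention; the unmentioned_required function and the parameters arg2 and verbose are hypothetical and exist only for this example.

```python
import re

# Anything of the form <name> is treated as a substitution reference.
REFERENCE = re.compile(r"<([^<>\s]+)>")


def unmentioned_required(command_line, parameters):
    """Return required parameters missing from the command line.

    parameters maps each declared name to True if it is optional.
    """
    mentioned = set(REFERENCE.findall(command_line))
    return [name for name, optional in parameters.items()
            if not optional and name not in mentioned]


command = "<java> -cp <libdir>mymodule.jar com.foo.MyModule <arg1>"
declared = {"arg1": False, "arg2": False, "verbose": True}
print(unmentioned_required(command, declared))  # ['arg2']
```

Here arg2 is reported because it is required but never appears in the command line, while verbose is ignored because it is marked optional.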
Substitution properties: In addition to parameter names, you may also use environment variables, Java system properties, and any properties defined in the %GenePatternInstallDir%/resources/genepattern.properties file. In particular, there are predefined values for <java>, <perl>, and <R>, three languages that are used within various modules that may be downloaded from the module catalog at the public GenePattern website. Useful substitution properties include:
|<java>||path to Java, the same one running the GenePattern server|
|<perl>||path to Perl, installed with GenePattern server on Windows, otherwise the one already installed on your system|
|<R>||path to a program that runs R and takes as input a script of R commands. R is installed with GenePattern server on Windows and MacOS|
|<java_flags>||memory size and other Java JVM settings from the GenePattern/resources/genepattern.properties file|
|<libdir>||directory where the module's support files are stored|
|<name>||name of the module being run|
|<filename_basename>||for each input file parameter, the filename without directory or extension|
|<filename_extension>||for each input file parameter, the extension without filename or directory|
|<filename_file>||for each input file parameter, the input filename without directory|
|<path.separator>||Java classpath delimiters (: or ;), useful for specifying a classpath for Java-based modules|
|<file.separator>||/ or \ for directory delimiter|
|<line.separator>||newline, carriage return, or both for line endings|
|<user.dir>||current directory where the job is executing|
|<user.home>||user's home directory|
Rather than having to customize your module's command line for the exact location of the language runtime on each computer, you can use the substitution properties. For example,
<java> -cp <libdir>mymodule.jar com.foo.MyModule <arg1>
GenePattern will then take care of locating the Java runtime, asking it to begin execution at the MyModule class using code from the uploaded file mymodule.jar.
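To make the substitution step concrete, here is a minimal sketch of expanding such a command line. It is not GenePattern's code: the substitute function is hypothetical, the paths are invented for the example, and the sketch assumes that the value of <libdir> ends with a directory separator, since the command line writes <libdir>mymodule.jar with nothing in between.

```python
import re


def substitute(command_line, values):
    """Replace every <name> in command_line with values[name]."""
    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for <{name}>")
        return values[name]
    return re.sub(r"<([^<>\s]+)>", lookup, command_line)


# Hypothetical values; a real server would supply its own paths.
values = {
    "java": "/usr/bin/java",
    "libdir": "/opt/genepattern/taskLib/MyModule/",
    "arg1": "all_aml_train.gct",
}
command = substitute(
    "<java> -cp <libdir>mymodule.jar com.foo.MyModule <arg1>", values
)
print(command)
# /usr/bin/java -cp /opt/genepattern/taskLib/MyModule/mymodule.jar com.foo.MyModule all_aml_train.gct
```

The same module definition then works unchanged on a server where Java or the module library lives somewhere else, because only the values change.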
Standard input/output: If your module is designed to accept a standard input stream and/or write to a standard output stream, you can use redirection syntax when describing the command line. To redirect a file to the input stream, enter the text \< followed by the input file parameter. To redirect the standard output or standard error streams to a named file, enter the text \> or \>& followed by the name of the output file. In the following example, the LogTransform module reads its input from the standard input stream and writes its output to the standard output stream:
<perl> <libdir>log_transform.pl \< <input.filename> \> <output.file>
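One way a launcher might interpret the redirection markers in the example above is sketched below. This is an assumption about the mechanics, not a description of the GenePattern server: split_redirects is a hypothetical helper, the file names are invented, and only the \< and \> markers used in the example are handled.

```python
STDIN_MARK = r"\<"   # escaped form used in the command line field
STDOUT_MARK = r"\>"


def split_redirects(tokens):
    """Separate program arguments from stdin/stdout redirection targets."""
    argv, stdin_path, stdout_path = [], None, None
    remaining = iter(tokens)
    for token in remaining:
        if token == STDIN_MARK:
            stdin_path = next(remaining, None)
        elif token == STDOUT_MARK:
            stdout_path = next(remaining, None)
        else:
            argv.append(token)
    return argv, stdin_path, stdout_path


# The LogTransform command line after substitution, with hypothetical paths.
tokens = ["perl", "log_transform.pl", r"\<", "input.gct", r"\>", "output.gct"]
print(split_redirects(tokens))
# (['perl', 'log_transform.pl'], 'input.gct', 'output.gct')
```

The two paths could then be opened and passed as the stdin and stdout arguments of subprocess.run, so that the program itself never sees the redirection markers.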
ConsensusClustering example (actually all on one line):
<java> <java_flags> -DR_HOME=<R_HOME> -cp <libdir>geneweaver.jar edu.mit.wi.genome.geneweaver.clustering.ConsensusClustering <input.filename> <kmax> <niter> <normalize.type> -N <norm.iter> -S <resample> -t <algo> -L <merge.type> -i <descent.iter> -o <out.stub> -s -d <create.heat.map> -z <heat.map.size> -l1 -v
The input parameters section of the form may appear to be the most daunting, yet little is required to make a working module declaration. Each parameter in the command line that comes from user input must have an entry in this section. Otherwise, the clients would not know how to prompt the user for input, nor could they explain to the user what type of input is expected.
To add one or more parameters, enter the number of parameters to add and click the Add Parameter button.
Each parameter has a name, which can be whatever you like, using letters, numbers, and period as a separator character between "words". It can be of mixed upper- and lower-case. The name is used inside <brackets> within the command line to indicate that the value of that variable should be substituted at that position within the command line. The name is also used as a label within the web client to prompt the user for the value for that field. And the name is used as a way of identifying which parameter is which for the scripting clients.
ConsensusClustering examples: kmax, input.filename
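A strict reading of the naming rule above (letters and numbers, with periods separating "words") can be expressed as a regular expression. This is a sketch under that reading only; GenePattern itself may accept names that this pattern rejects, and is_valid_parameter_name is a hypothetical helper.

```python
import re

# Words of letters and digits, joined by single periods.
PARAMETER_NAME = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")


def is_valid_parameter_name(name):
    """Return True if name uses only letters, digits, and single periods."""
    return PARAMETER_NAME.fullmatch(name) is not None


for candidate in ("kmax", "input.filename", "heat map", "out..stub"):
    print(candidate, is_valid_parameter_name(candidate))
```

The first two candidates are the ConsensusClustering examples and pass; the last two fail because of the space and the doubled period.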
The description field is optional, but is very useful. It allows the module author to provide a more detailed description than the name itself. What is the "kmax" parameter used for? Does it interact with any other parameters? Do you have any advice about what is a reasonable range of settings for it? The description is displayed by the GenePattern clients when they prompt for input for each field.
ConsensusClustering example: Type of clustering algorithm
Some parameters should have a default value which will be supplied on the module's command line if no setting is supplied by the user when invoking the module. This is not the same as the defaults defined in the program invoked by the module. Instead, this allows the module author to create a default, even when none exists in the program being invoked by the module.
The default value may use substitution variables, just like the rest of the command line. So a valid default for an output file might be <input.filename_basename>.foo, meaning that the output file will have the same stem as the input.filename parameter, but will have a .foo extension.
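The derived names for an input file parameter can be illustrated with a short sketch. The file_properties function and the upload path are hypothetical, and the sketch assumes forward-slash paths and that only the last extension is stripped to form the basename.

```python
from pathlib import PurePosixPath


def file_properties(parameter, path):
    """Build the _file, _basename, and _extension values for one input file."""
    p = PurePosixPath(path)
    return {
        f"{parameter}_file": p.name,                      # name without directory
        f"{parameter}_basename": p.stem,                  # name without directory or extension
        f"{parameter}_extension": p.suffix.lstrip("."),   # extension only
    }


props = file_properties("input.filename", "/uploads/all_aml_train.gct")
default_output = props["input.filename_basename"] + ".foo"
print(default_output)  # all_aml_train.foo
```

With this input, a default of <input.filename_basename>.foo resolves to all_aml_train.foo, which keeps the stem of the input file and replaces its extension.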
Default values for parameters that have a choice list must be either blank or one of the values from the choice list. Any other setting will result in an error message. If no default for a choice list is provided, the first entry on the list will be the default.
ConsensusClustering examples: NMF, 5, <input.filename_basename>
Some parameters need to have extra text prefixing them on the command line when they are specified. For example, you might need to write "-F filename" to pass in a filename. The prefix text "-F" or "-F " would be specified here. To insert a space between the flag and the parameter, add the space to the prefix text.
example (with space): -F inputfile
example (without space): -Finputfile
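The effect of the prefix field amounts to plain concatenation, as the following sketch shows. The render_argument name is hypothetical, and dropping the prefix when no value is supplied is an assumption made for this example rather than documented behavior.

```python
def render_argument(prefix, value):
    """Concatenate prefix and value exactly as given; no space is added."""
    if value == "":
        return ""  # assumption: an unset parameter contributes nothing
    return prefix + value


print(render_argument("-F ", "inputfile"))  # -F inputfile
print(render_argument("-F", "inputfile"))   # -Finputfile
```

Any space between the flag and the value therefore has to be part of the prefix text itself.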
Declaring the type of an input parameter allows the client to present the input to the user more intelligently. (As of GenePattern 1.2, all parameters are treated as either text or input file types.) Parameter type choices are:
When you select a parameter type of input file, a drop-down list of file formats appears in the file format column. Select the valid file format(s) for this parameter. If your module requires an input file format not included in the list, scroll back to the Output Description field and click New to add that format to the list. For this type of parameter, when the user enters the name of the file, the GenePattern clients pass along the entire file rather than just the file name.
Some parameters are best represented as a drop-down list of choices. By constraining input to those from the list, the user is saved typing and cannot make a mistake by choosing an invalid setting (unless there is a dependency on some other parameter). To enter the choices, click the Edit Choices link and enter the choices in the Edit Choice List window.
For each choice, enter the value required by the program (Value) and, optionally, a more human-readable value (Display Value). When you exit from the Edit Choice List window, the choices you entered are displayed as a semicolon-delimited set of choices. For example:
hierarchical=Hierarchical clustering;SOM=Self-organizing map;NMF=Non-negative Matrix Factorization;3.14159265=pi
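The choice list format, together with the rule that a default must be blank or one of the choice values, can be sketched as follows. Both parse_choices and effective_default are hypothetical helpers; the sketch assumes that an entry without an equals sign uses its value as the display text.

```python
def parse_choices(text):
    """Parse 'value=Display;value=Display' into (value, display) pairs."""
    choices = []
    for item in text.split(";"):
        value, sep, display = item.partition("=")
        choices.append((value, display if sep else value))
    return choices


def effective_default(choices, default=""):
    """Apply the default rules: blank means the first entry; others must match."""
    values = [value for value, _ in choices]
    if default == "":
        return values[0]
    if default not in values:
        raise ValueError(f"default {default!r} is not one of {values}")
    return default


choices = parse_choices(
    "hierarchical=Hierarchical clustering;SOM=Self-organizing map;"
    "NMF=Non-negative Matrix Factorization;3.14159265=pi"
)
print(choices[3])                         # ('3.14159265', 'pi')
print(effective_default(choices))         # hierarchical
print(effective_default(choices, "NMF"))  # NMF
```

The value on the left of each pair is what reaches the command line, while the display text on the right is what the user sees in the drop-down list.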