How Batching Works in GenePattern


GenePattern allows you to submit a batch of jobs from the job input form. To enable parameter batching, click the batch checkbox next to one or more input parameters. A batch job is created for each value specified for a batched parameter.

Batching of a Single Parameter

Here is an example of batching over the column distance measure parameter in the HierarchicalClustering module. In this example, three values (Pearson correlation, Uncentered correlation, and Euclidean distance) are specified for the column distance measure parameter.

There will be three batch jobs created:

Parameter Name          | Batch Job #1        | Batch Job #2           | Batch Job #3
column distance measure | Pearson correlation | Uncentered correlation | Euclidean distance

 

Batching of Multiple Parameters

When batching over multiple parameters, the values specified for each batch parameter are paired. Values are paired in the order in which they are provided on the submit job page, except when the values are directories.

Here is an example of batching over the chromosome and chromosome sequence file parameters in the Scripture module. In this example, two values (chr2 and chr8) are specified for the chromosome parameter. Because batch parameters are paired, two values must also be specified for the chromosome sequence file parameter: chr2.fasta and chr8.fasta.

There will be two batch jobs created with the following values set:

Parameter Name           | Batch Job #1 | Batch Job #2
chromosome               | chr2         | chr8
chromosome sequence file | chr2.fasta   | chr8.fasta
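
The order-based pairing shown in the table can be sketched in a few lines of Python. This is only an illustration, not GenePattern's implementation: the function name pair_batch_values is hypothetical, and raising an error when the value counts differ is an assumption, since this page does not say how a mismatch is reported.

```python
def pair_batch_values(batch_params):
    """Pair the i-th value of every batch parameter into the i-th job.

    batch_params maps a parameter name to its list of values, in the
    order they were entered on the submit job page.
    """
    lengths = {len(values) for values in batch_params.values()}
    if len(lengths) > 1:
        # Assumption: mismatched counts are rejected outright.
        raise ValueError("every batch parameter needs the same number of values")
    names = list(batch_params)
    # zip() walks all value lists in parallel, one job per position.
    return [dict(zip(names, combo)) for combo in zip(*batch_params.values())]


jobs = pair_batch_values({
    "chromosome": ["chr2", "chr8"],
    "chromosome sequence file": ["chr2.fasta", "chr8.fasta"],
})
for number, job in enumerate(jobs, start=1):
    print(f"Batch Job #{number}: {job}")
```

With a single batch parameter the same function returns one job per value, which matches the single-parameter case described earlier.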

 

Batching of Directories

For directories, files are paired by their common base name (the file name without the extension).

For example, given the following:

A. Four directories, gct_files_set1/, gct_files_set2/, cls_files_set1/, and cls_files_set2/, containing the following files:

  • gct_files_set1/all_aml_test.gct
  • gct_files_set1/all_aml_train.gct
  • gct_files_set2/tumor.gct
  • gct_files_set2/normal.gct
  • cls_files_set1/all_aml_test.cls
  • cls_files_set1/all_aml_train.cls
  • cls_files_set2/tumor.cls
  • cls_files_set2/new.cls

B. The input file and cls file parameters in the ComparativeMarkerSelection module are batched with directory inputs.


There will be three batch jobs created with the following values set:

Parameter Name | Batch Job #1                    | Batch Job #2                     | Batch Job #3
input file     | gct_files_set1/all_aml_test.gct | gct_files_set1/all_aml_train.gct | gct_files_set2/tumor.gct
cls file       | cls_files_set1/all_aml_test.cls | cls_files_set1/all_aml_train.cls | cls_files_set2/tumor.cls

Notice that the normal.gct and new.cls files were ignored because no other files had matching base names (i.e., normal.cls or new.gct).
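
The base-name matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not GenePattern's code: pair_by_base_name is a hypothetical helper, the directories are assumed to be already expanded into flat lists of file paths, and base names are assumed to be unique within each parameter's files.

```python
from pathlib import PurePosixPath


def pair_by_base_name(files_by_param):
    """Build one job per base name that occurs for every parameter.

    files_by_param maps a parameter name to the file paths found in
    the directories given for that parameter.
    """
    if not files_by_param:
        return []
    # Index each parameter's files by base name (file name without extension).
    index = {
        name: {PurePosixPath(path).stem: path for path in paths}
        for name, paths in files_by_param.items()
    }
    # Keep only the base names that every parameter has a file for.
    common = set.intersection(*(set(stems) for stems in index.values()))
    # Emit jobs in the order of the first parameter's files.
    first = next(iter(index))
    return [
        {name: index[name][stem] for name in index}
        for stem in index[first]
        if stem in common
    ]


jobs = pair_by_base_name({
    "input file": [
        "gct_files_set1/all_aml_test.gct",
        "gct_files_set1/all_aml_train.gct",
        "gct_files_set2/tumor.gct",
        "gct_files_set2/normal.gct",
    ],
    "cls file": [
        "cls_files_set1/all_aml_test.cls",
        "cls_files_set1/all_aml_train.cls",
        "cls_files_set2/tumor.cls",
        "cls_files_set2/new.cls",
    ],
})
for number, job in enumerate(jobs, start=1):
    print(f"Batch Job #{number}: {job}")
```

Files whose base name appears for only one parameter never enter the common set, so they produce no job.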

 

NOTE: Batching non-file parameters together with file parameters is not supported when any of the specified values is a directory. Batching only file parameters, where one of the values is a directory, is supported.
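
The rule in the note can be expressed as a small check. This is a sketch only: the BatchParam class, its fields, and the is_supported function are hypothetical names introduced for illustration, and the example parameter names are taken from the modules mentioned on this page purely as sample data.

```python
from dataclasses import dataclass


@dataclass
class BatchParam:
    name: str
    is_file: bool                # True for a file parameter
    has_directory: bool = False  # True if any supplied value is a directory


def is_supported(batch_params):
    """Reject a batch that mixes non-file parameters with directory values."""
    any_directory = any(p.has_directory for p in batch_params)
    any_non_file = any(not p.is_file for p in batch_params)
    return not (any_directory and any_non_file)


# Only file parameters, with directory values: supported.
files_only = [
    BatchParam("input file", is_file=True, has_directory=True),
    BatchParam("cls file", is_file=True, has_directory=True),
]
# A non-file parameter batched alongside a directory value: not supported.
mixed = [
    BatchParam("column distance measure", is_file=False),
    BatchParam("input file", is_file=True, has_directory=True),
]
print(is_supported(files_only), is_supported(mixed))
```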