Interactive GUI-based K-Means clustering of gene expression profiles.
• Input: GCT (can add RES support if needed).
• Input: CLS (optional) – if provided, clustering is performed on the class-mean values instead of the individual samples.
• User can select which samples within a GCT they would like to cluster on, and drag/drop them to rearrange their ordering.
• User can pre-filter probesets based on minimum differentiation and/or minimum expression. This is carried out prior to clustering so that the user knows how many probes they are starting with (allowing for more appropriate determination of K).
• Multiple parameter settings for distance metric and signal transformation.
• Real-time convergence plots: users can see how quickly their clusters have converged (very useful for identifying over-fitting and determining the best starting parameters).
• Ability to specify desired centroids (in addition to random) via standard probe ID list (useful for clustering around genes of interest). A priori centroids are fixed and are not recomputed over successive iterations.
• Clusters can be ordered using a number of different criteria: number of probes, mean correlation to centroids, cluster name, variance of the centroid, or based on how similar they are to a particular cluster. The default is variance.
• Ability to merge clusters or split them (deterministically). Modified clusters appear in a different color.
• Interactive highlighting of specific probes within a cluster.
• Alternate “relative”/“global” scaling for expression profiles.
• Dynamic heatmaps for each cluster. Color scale reflects either the relative or global transformed signal intensity, or the original expression values as found in the GCT. Several pre-defined color schemes to choose from.
• Search – can search for a probe ID or gene symbol (if the symbol appears in the ‘description’ column of the GCT). Any clusters containing a search term are automatically highlighted, and the specific matching probes within each cluster are automatically selected.
• Saving images – Users can copy any cluster plot or heatmap to the clipboard, or save to disk.
• Batch export – Users can batch export cluster plots, probe ID lists, GCTs, or a PDF containing a matrix of plots for the selected clusters.
• Multiple GCT export options. For each cluster, users can export the clustered samples, the clustered class means, all samples, or all class means (the class-means options are only available if a CLS has been specified).
• Users can optionally compress all exported files to a single ZIP file.
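
The a priori centroid and convergence features listed above can be illustrated with a minimal sketch. This is not the tool's actual implementation: the function name, the parameter names, the choice of Euclidean distance, and the use of "number of probes reassigned per iteration" as the convergence measure are all assumptions made for illustration. It shows user-specified probes acting as fixed centroids that are never recomputed, while randomly seeded centroids are updated to the mean of their members on each iteration.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kmeans_with_fixed_centroids(profiles, fixed_ids, n_random, max_iter=100, seed=0):
    """Hypothetical sketch of K-Means with fixed a priori centroids.

    profiles  : dict mapping probe ID -> list of expression values
    fixed_ids : probe IDs whose profiles serve as fixed centroids
    n_random  : number of additional, randomly seeded centroids
    Returns (assignment, centroids, moved_per_iter).
    """
    rng = random.Random(seed)
    ids = sorted(profiles)
    candidates = [p for p in ids if p not in fixed_ids]
    random_seeds = rng.sample(candidates, n_random)

    # Fixed centroids come first, so indices 0..n_fixed-1 are never updated.
    centroids = [list(profiles[p]) for p in fixed_ids]
    centroids += [list(profiles[p]) for p in random_seeds]
    n_fixed = len(fixed_ids)

    assignment = {}
    moved_per_iter = []  # data for a convergence plot (assumed measure)
    for _ in range(max_iter):
        # Assign every probe to its nearest centroid.
        new_assignment = {
            p: min(range(len(centroids)),
                   key=lambda k: euclidean(profiles[p], centroids[k]))
            for p in ids
        }
        moved = sum(1 for p in ids if assignment.get(p) != new_assignment[p])
        moved_per_iter.append(moved)
        assignment = new_assignment
        if moved == 0:
            break  # no probe changed cluster: converged

        # Recompute only the non-fixed centroids as the mean of their members.
        for k in range(n_fixed, len(centroids)):
            members = [profiles[p] for p in ids if assignment[p] == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]

    return assignment, centroids, moved_per_iter
```

Plotting `moved_per_iter` against the iteration number gives one simple form of convergence curve; a run that takes many iterations to reach zero may indicate a poorly chosen K or starting centroids.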