Flow Cytometry Gating and Clustering
Description
Gating is an inherent component of flow cytometry (FCM) data analysis. It is the process of subsetting particles (i.e., cells) according to their physical and fluorescence characteristics, which are reflected in the parameter values of the events stored in FCS files. In practice, gating corresponds to assigning classes (labels) to these events, either manually or automatically. While manual gating still dominates traditional FCM, automated gating methods are becoming more important in contemporary, high-throughput approaches. This suite supports applying manually created gates saved in Gating-ML as well as a variety of clustering algorithms developed for use with flow cytometry data.
Manual gating

ApplyGatingML applies a Gating-ML file to FCS data to gate (filter) and/or transform it. Each gate in the Gating-ML file creates a population that is saved in a CSV file.
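Conceptually, applying a gate means testing each event's parameter values against the gate's geometry and collecting the events that fall inside. The sketch below assumes events are dicts keyed by parameter name and models two gate shapes that Gating-ML defines, rectangles and polygons. The function names, the CSV layout, the inclusive-lower/exclusive-upper rectangle bounds, and the example parameters FSC-A and SSC-A are assumptions of this sketch; it does not reflect ApplyGatingML's actual code or output format.

```python
import csv
import io


def in_rectangle(event, bounds):
    """bounds maps parameter -> (low, high); None means open-ended.

    Assumption of this sketch: low is inclusive, high is exclusive.
    """
    for param, (low, high) in bounds.items():
        value = event[param]
        if low is not None and value < low:
            return False
        if high is not None and value >= high:
            return False
    return True


def in_polygon(event, x_param, y_param, vertices):
    """Even-odd (ray casting) point-in-polygon test on two parameters."""
    x, y = event[x_param], event[y_param]
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def apply_gates(events, gates):
    """gates maps a population name to a predicate over one event."""
    return {name: [e for e in events if test(e)] for name, test in gates.items()}


def population_to_csv(population, params):
    """Serialize one population as CSV text with one column per parameter."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=params, lineterminator="\n")
    writer.writeheader()
    for event in population:
        writer.writerow({p: event[p] for p in params})
    return buffer.getvalue()


# Hypothetical events and gates for illustration
events = [
    {"FSC-A": 120.0, "SSC-A": 40.0},
    {"FSC-A": 300.0, "SSC-A": 90.0},
    {"FSC-A": 50.0, "SSC-A": 200.0},
]
gates = {
    "lymphocytes": lambda e: in_rectangle(e, {"FSC-A": (100.0, 250.0), "SSC-A": (None, 100.0)}),
    "triangle": lambda e: in_polygon(e, "FSC-A", "SSC-A", [(0, 0), (400, 0), (400, 150)]),
}
populations = apply_gates(events, gates)
print(population_to_csv(populations["lymphocytes"], ["FSC-A", "SSC-A"]))
```

Because each gate is just a predicate, a hierarchical gate (a gate applied within a parent population) can be expressed by combining predicates with `and`.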
Clustering

FlowClustClassifyFCS uses the model-based FlowClust algorithm to find populations in an FCS file.

FlowMeansCluster clusters flow cytometry data using the FlowMeans algorithm, a nonparametric approach to the automated gating of cell populations. It first counts the number of modes in each individual dimension and then performs a multidimensional clustering. Adjacent clusters, in terms of Euclidean or Mahalanobis distance, are then merged, and the number of clusters is determined with a change point detection algorithm based on piecewise linear regression. Because several clusters can be merged to represent the same population, the algorithm can find nonspherical cell populations.
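The change point idea can be illustrated with a two-segment piecewise linear regression: given a sequence of values (for example, the distance between the two clusters merged at each step), fit one straight line to the points before a candidate breakpoint and another to the points after it, and keep the breakpoint with the smallest total squared error. This is a minimal sketch under that assumption; the function names are hypothetical, and the exact quantity FlowMeans regresses, and how it turns the breakpoint into a cluster count, may differ.

```python
def line_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def change_point(ys):
    """Breakpoint index minimizing the total SSE of a two-segment linear fit.

    Each segment gets at least two points; ys[k] starts the second segment.
    Requires len(ys) >= 4.
    """
    xs = list(range(len(ys)))
    best_index, best_sse = None, float("inf")
    for k in range(2, len(ys) - 1):
        sse = line_sse(xs[:k], ys[:k]) + line_sse(xs[k:], ys[k:])
        if sse < best_sse:
            best_index, best_sse = k, sse
    return best_index


# Illustrative sequence: steady decline, then a sharp drop to a plateau
print(change_point([10, 9, 8, 7, 1, 1, 1, 1]))  # 4
```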

FlowMergeCluster uses the FlowMerge algorithm to cluster flow cytometry data. The max BIC model-fitting criterion for mixture models generally overestimates the number of cell populations in flow cytometry data, because accurately modeling a distribution usually requires more mixture components than there are distinct cell populations. Entropy-based model-fitting criteria, such as the ICL, provide better estimates of the number of clusters but tend to fit the underlying distribution poorly. FlowMerge combines the two approaches by merging mixture components of the max BIC fit according to an entropy criterion, which allows multiple mixture components to represent the same cell subpopulation. Merged clusters are themselves mixtures and are summarized by a weighted combination of their component model parameters. The result is a mixture model that retains the good fit of the max BIC solution while its number of components more closely reflects the true number of distinct cell subpopulations.
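The entropy criterion can be made concrete. Given posterior probabilities z[i][k] that event i belongs to mixture component k, the classification entropy is -sum over i and k of z_ik * log(z_ik). Merging two components replaces their two columns with their sum, and a greedy procedure merges whichever pair lowers the entropy most. The sketch below assumes a posterior matrix from an already fitted mixture model (fitting is not shown), uses hypothetical function names, and stops at a caller-chosen number of components rather than reproducing FlowMerge's own rule for deciding how many merges to perform.

```python
import math


def entropy(posteriors):
    """Classification entropy -sum_i sum_k z_ik * log(z_ik), skipping zeros."""
    return -sum(z * math.log(z) for row in posteriors for z in row if z > 0)


def merge_columns(posteriors, a, b):
    """Return a copy with component b folded into component a (requires a < b)."""
    merged = []
    for row in posteriors:
        new_row = [z for k, z in enumerate(row) if k != b]
        new_row[a] = row[a] + row[b]  # index a is unaffected by removing b > a
        merged.append(new_row)
    return merged


def best_merge(posteriors):
    """Find the pair (a, b) whose merge yields the lowest entropy."""
    n_components = len(posteriors[0])
    best = None
    for a in range(n_components):
        for b in range(a + 1, n_components):
            value = entropy(merge_columns(posteriors, a, b))
            if best is None or value < best[0]:
                best = (value, a, b)
    return best[1], best[2]


def greedy_merge(posteriors, target_components):
    """Merge pairs greedily until target_components columns remain."""
    history = []
    while len(posteriors[0]) > target_components:
        a, b = best_merge(posteriors)
        posteriors = merge_columns(posteriors, a, b)
        history.append((a, b, entropy(posteriors)))
    return posteriors, history


# Illustrative posteriors: components 0 and 1 overlap heavily
posteriors = [[0.5, 0.5, 0.0], [0.45, 0.55, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
print(best_merge(posteriors))  # (0, 1)
merged, history = greedy_merge(posteriors, 2)
```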

ImmPortFLOCK is a GenePattern implementation of version 1 of the FLOCK method from the Immunology Database and Analysis Portal (ImmPort). FLOCK is a grid-based density clustering method for automated population identification in multidimensional FCM data. It has been used to objectively identify seventeen distinct B-cell subsets in a human peripheral blood sample, and to identify and quantify novel plasmablast subsets in peripheral blood that respond transiently to tetanus and other vaccinations. Algorithms like FLOCK reduce the need for subjective, labor-intensive manual gating to identify and quantify cell subsets, and novel populations identified by these computational approaches can serve as hypotheses for further experimental study.
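A grid-based density approach can be sketched in a few steps: divide each dimension into equal-width bins, count the events in each resulting hyper-region, keep the regions whose count reaches a density threshold, and join dense regions that share a face into clusters. This is a simplified illustration with assumed parameters (`bins_per_dim`, `min_count`) and hypothetical function names; it is not the FLOCK implementation.

```python
from collections import defaultdict, deque


def grid_cells(events, bins_per_dim):
    """Map each event (a tuple of floats) to a tuple of per-dimension bin indices."""
    dims = len(events[0])
    lows = [min(e[d] for e in events) for d in range(dims)]
    widths = []
    for d in range(dims):
        span = max(e[d] for e in events) - lows[d]
        widths.append(span / bins_per_dim if span > 0 else 1.0)
    return [
        tuple(
            min(int((e[d] - lows[d]) / widths[d]), bins_per_dim - 1)  # top edge joins last bin
            for d in range(dims)
        )
        for e in events
    ]


def grid_density_clusters(events, bins_per_dim=10, min_count=3):
    """Label events by connected dense grid cells; -1 marks events in sparse cells."""
    cells = grid_cells(events, bins_per_dim)
    counts = defaultdict(int)
    for cell in cells:
        counts[cell] += 1
    dense = {cell for cell, n in counts.items() if n >= min_count}

    cell_label = {}
    next_label = 0
    for start in sorted(dense):  # sorted for deterministic label numbering
        if start in cell_label:
            continue
        # Breadth-first search over face-adjacent dense cells
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for d in range(len(cell)):
                for step in (-1, 1):
                    neighbor = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if neighbor in dense and neighbor not in cell_label:
                        cell_label[neighbor] = next_label
                        queue.append(neighbor)
        next_label += 1
    return [cell_label.get(cell, -1) for cell in cells]


# Two illustrative blobs plus one isolated event
events = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (0.1, 0.2),
          (10.0, 10.0), (10.1, 9.9), (9.9, 10.1), (10.0, 10.2), (5.0, 0.0)]
print(grid_density_clusters(events, bins_per_dim=5, min_count=2))
```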

KMeansClassifyFCS uses the KMeans algorithm to find populations in an FCS file.
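For reference, the standard k-means (Lloyd's) iteration alternates between assigning each event to its nearest centroid and moving each centroid to the mean of its assigned events. The pure-Python sketch below uses seeded random initialization and hypothetical function names; it illustrates the algorithm only and makes no claim about KMeansClassifyFCS's initialization, distance metric, or stopping rule.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(events, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of coordinate tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct events chosen at random
    centroids = [list(e) for e in rng.sample(events, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each event goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: squared_distance(e, centroids[j]))
            for e in events
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [e for e, label in zip(events, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


events = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(events, k=2)
```

Lloyd's algorithm only finds a local optimum, so in practice it is usually restarted from several initializations and the run with the lowest within-cluster sum of squares is kept.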

SamSPECTRALClusterFCS uses the SamSPECTRAL algorithm, a modification of spectral clustering, to cluster flow cytometry data. Spectral clustering is a nonparametric clustering method that has proved useful in many pattern recognition areas. It makes no a priori assumptions about the size, shape, or distribution of clusters, and it is relatively insensitive to outliers and noise. However, standard spectral clustering is impractical for large data sets such as those common in flow cytometry, because it relies on a pairwise similarity matrix whose size grows with the square of the number of events. SamSPECTRAL addresses this limitation by reducing the data to a smaller set of representative points before applying spectral clustering.
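Spectral clustering works on a pairwise similarity matrix, so n events require on the order of n² entries. The sketch below shows one simple way to shrink that problem: greedily choose representative events so that every event lies within a radius `h` of some representative, after which a clustering step (not shown) could operate on the representatives only. It is written in the spirit of a sampling-based data reduction, but the greedy covering rule, the radius `h`, and the function name are assumptions of this sketch rather than SamSPECTRAL's exact procedure.

```python
import math


def sample_representatives(events, h):
    """Greedy covering: every event ends up within distance h of a representative.

    Returns (representatives, assignment), where assignment[i] is the index of
    the representative that covers events[i].
    """
    representatives = []
    assignment = []
    for event in events:
        for r, rep in enumerate(representatives):
            if math.dist(event, rep) <= h:
                assignment.append(r)
                break
        else:
            # No existing representative is close enough: promote this event
            representatives.append(event)
            assignment.append(len(representatives) - 1)
    return representatives, assignment


events = [(0, 0), (0.5, 0), (0, 0.5), (5, 5), (5.2, 5.1), (20, 20)]
reps, assignment = sample_representatives(events, h=1.0)
# The similarity matrix shrinks from len(events)**2 to len(reps)**2 entries
print(len(events) ** 2, len(reps) ** 2)  # 36 9
```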

SuggestNumberOfPopulationsFCS suggests the number of clusters (the K value) for a clustering algorithm. It tests a range of K values and suggests a number of populations using the Bayesian Information Criterion (BIC) in combination with the FlowClust algorithm. In addition, it outputs the estimated BIC and the Integrated Completed Likelihood (ICL) score for each K value.
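The selection logic can be sketched with the usual formulas, written here in the convention where larger values are better: BIC = 2 log L - p log n, where log L is the maximized log-likelihood, p the number of free parameters, and n the number of events. ICL adds the penalty 2 * sum over i of log z_i,c(i), where c(i) is the most probable component for event i; since each log term is at most 0, ICL never exceeds BIC and drops further when clusters overlap. Model fitting is not shown: the log-likelihoods, parameter counts, and posteriors are assumed inputs, the parameter count is for a plain Gaussian mixture (FlowClust's own models have additional parameters), and all function names and numbers are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_events):
    """BIC with the larger-is-better sign convention."""
    return 2.0 * log_likelihood - n_params * math.log(n_events)


def icl(bic_value, posteriors):
    """Penalize BIC by the log posterior of each event's most likely component."""
    return bic_value + 2.0 * sum(math.log(max(row)) for row in posteriors)


def gaussian_mixture_params(k, d):
    """Free parameters of a k-component, d-dimensional full-covariance Gaussian mixture."""
    means = k * d
    covariances = k * d * (d + 1) // 2
    weights = k - 1
    return means + covariances + weights


def suggest_k(fits, n_events):
    """fits maps K -> (log_likelihood, n_params); return (best K, all BIC scores)."""
    scores = {k: bic(ll, p, n_events) for k, (ll, p) in fits.items()}
    return max(scores, key=scores.get), scores


# Illustrative fits for K = 1..3 on 200 events (numbers are made up)
fits = {1: (-500.0, 5), 2: (-420.0, 11), 3: (-415.0, 17)}
best_k, scores = suggest_k(fits, n_events=200)
print(best_k)  # 2
```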
Metaclustering

ImmPortCrossSample applies the centroids of an FCM clustering result to a new FCM data file (a TXT file generated from an FCS file) to identify populations and map them between the two samples. The module is a GenePattern implementation of the cross-sample comparison method from the Immunology Database and Analysis Portal (ImmPort).
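At its simplest, applying an existing clustering to a new sample means assigning each new event to the population whose reference centroid is nearest. The sketch below does this with Euclidean distance and an optional cutoff for events that match no population well. The cutoff, the function name, and the population names are hypothetical, and ImmPort's cross-sample method may use a different distance or assignment rule.

```python
import math


def map_to_centroids(events, centroids, max_distance=None):
    """Label each event with the name of the nearest reference centroid.

    centroids maps population name -> coordinate tuple. Events farther than
    max_distance from every centroid are labeled None.
    """
    labels = []
    for event in events:
        name, dist = min(
            ((name, math.dist(event, c)) for name, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        labels.append(name if max_distance is None or dist <= max_distance else None)
    return labels


reference_centroids = {"pop1": (1.0, 1.0), "pop2": (8.0, 8.0)}
new_events = [(1.2, 0.9), (7.5, 8.4), (30.0, 30.0)]
print(map_to_centroids(new_events, reference_centroids, max_distance=5.0))
# ['pop1', 'pop2', None]
```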

MClustClusterLabel performs model-based labeling (matching) of clustered flow cytometry data. Clustering several flow cytometry samples independently (e.g., blood from different patients) typically divides each FCS file into several groups corresponding to the cell subpopulations in that particular sample. This module matches these clustering results across samples.
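To see what matching means, consider pairing the clusters of one sample with those of another so that the total distance between paired centroids is as small as possible. The sketch below does this by brute force over permutations, which is practical only for a handful of clusters. This is a swapped-in technique for illustration: MClustClusterLabel performs model-based labeling, which the sketch does not reproduce, and the function name is hypothetical.

```python
import math
from itertools import permutations


def match_clusters(reference, other):
    """Pair each cluster in `other` with a distinct cluster in `reference`.

    Both arguments are equal-length lists of centroid tuples. Returns a list
    where result[j] is the reference index assigned to other[j], chosen to
    minimize the summed Euclidean distance between paired centroids.
    """
    if len(reference) != len(other):
        raise ValueError("this sketch assumes equal cluster counts")
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(len(reference))):
        cost = sum(math.dist(other[j], reference[perm[j]]) for j in range(len(other)))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm)


sample_a = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]
sample_b = [(9.2, 1.1), (0.8, 1.3), (5.1, 4.7)]
print(match_clusters(sample_a, sample_b))  # [2, 0, 1]
```

For larger cluster counts, the same optimal one-to-one assignment is normally computed with the Hungarian algorithm instead of enumerating permutations.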

MClustClusterLabelBIC uses the Bayesian Information Criterion (BIC) for model-based labeling of clusters, i.e., it estimates the number of labels for MClustClusterLabel.
Feature extraction

FCMFeatureExtraction extracts features from gated flow cytometry data. These features typically include the number of events (cells) in each subpopulation, the proportion (percentage) of events in each subpopulation, and/or the mean value of each (or selected) parameter in each subpopulation (e.g., the Mean Fluorescence Intensity [MFI]).
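Given per-event population labels, these features are straightforward to compute. The sketch below returns, for each population, the event count, the percentage of all events, and the arithmetic mean of each parameter (for a fluorescence channel, that mean corresponds to the MFI). It assumes events are dicts keyed by parameter name; the function name and the marker name CD4 are illustrative, and some analyses report the median or geometric mean as the MFI instead.

```python
from collections import defaultdict


def extract_features(events, labels, params):
    """Count, percentage, and per-parameter mean for each labeled population."""
    groups = defaultdict(list)
    for event, label in zip(events, labels):
        groups[label].append(event)
    total = len(events)
    features = {}
    for label, members in groups.items():
        features[label] = {
            "count": len(members),
            "percent": 100.0 * len(members) / total,
            "mean": {p: sum(e[p] for e in members) / len(members) for p in params},
        }
    return features


events = [{"CD4": 100.0}, {"CD4": 300.0}, {"CD4": 50.0}, {"CD4": 70.0}]
labels = ["T4", "T4", "other", "other"]
features = extract_features(events, labels, ["CD4"])
print(features["T4"])  # {'count': 2, 'percent': 50.0, 'mean': {'CD4': 200.0}}
```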
References

FCS: Spidlen J, Moore W, Parks D, Goldberg M, Bray C, Bierre P, Gorombey P, Hyun B, Hubbard M, Lange S, Lefebvre R, Leif RR, Novo D, Ostruszka L, Treister A, Wood J, Murphy RF, Roederer M, Sudar D, Zigon R, Brinkman RR. Data File Standard for Flow Cytometry, Version FCS 3.1. Cytometry A. 2010;77:97–100.

Gating-ML: Spidlen J, Leif RC, Moore W, Roederer M, International Society for the Advancement of Cytometry Data Standards Task Force, Brinkman RR. Gating-ML: XML-based gating descriptions in flow cytometry. Cytometry A. 2008;73A(12):1151–1157.

FlowClust: Lo K, Brinkman RR, Gottardo R. Automated gating of flow cytometry data via robust model-based clustering. Cytometry A. 2008;73(4):321–332.

FlowMeans: Aghaeepour N, Nikolic R, Hoos HH, Brinkman RR. Rapid cell population identification in flow cytometry data. Cytometry A. 2011;79(1):6–13.

FlowMerge: Finak G, Bashashati A, Brinkman R, Gottardo R. Merging mixture components for cell population identification in flow cytometry. Adv Bioinformatics. 2009;247646. Epub 2009 Nov 12.

ImmPortFLOCK: Qian Y, Wei C, Eun-Hyung Lee F, Campbell J, Halliley J, Lee JA, Cai J, Kong YM, Sadat E, Thomson E, Dunn P, Seegmiller AC, Karandikar NJ, Tipton CM, Mosmann T, Sanz I, Scheuermann RH. Elucidation of seventeen human peripheral blood B-cell subsets and quantification of the tetanus response using a density-based method for the automated identification of cell populations in multidimensional flow cytometry data. Cytometry B Clin Cytom. 2010;78 Suppl 1:S69–82.

SamSPECTRAL: Zare H, Shooshtari P, Gupta A, Brinkman RR. Data reduction for spectral clustering to analyze high throughput flow cytometry data. BMC Bioinformatics. 2010 Jul 28;11:403.

Updated on October 16, 2012 14:04