Multiple reaction monitoring-mass spectrometry (MRM-MS) of peptides with stable isotope-labeled internal standards (SIS) is a quantitative assay for measuring proteins in complex biological matrices. These assays can be highly precise, but the frequent occurrence of interferences requires that MRM-MS data be manually reviewed by an expert. The AuDIT module implements an algorithm that automatically identifies inaccurate transition data based on the presence of interfering signal or inconsistent recovery between replicate samples. AuDIT reduces the time required for manual, subjective inspection of data, improves the overall accuracy of data analysis, and integrates easily into the standard data analysis workflow.
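To make the idea of "inconsistent recovery between replicate samples" concrete, the following is a minimal sketch and not the AuDIT algorithm itself. It assumes each transition has an analyte/SIS peak-area ratio per replicate and flags transitions whose coefficient of variation exceeds a cutoff. The function name, the transition labels, the 0.2 cutoff, and the ratio values are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def flag_inconsistent_transitions(ratios_by_transition, max_cv=0.2):
    """Return {transition: CV} for transitions that vary too much.

    ratios_by_transition maps a transition name to the analyte/SIS
    peak-area ratios measured in replicate runs of the same sample.
    max_cv is an assumed cutoff, not a value taken from AuDIT.
    """
    flagged = {}
    for transition, ratios in ratios_by_transition.items():
        if len(ratios) < 2:
            continue  # the CV is undefined for a single replicate
        average = mean(ratios)
        if average == 0:
            continue  # avoid dividing by zero when there is no signal
        cv = stdev(ratios) / average
        if cv > max_cv:
            flagged[transition] = cv
    return flagged


# Made-up ratios for three transitions of one peptide.
replicates = {
    "y7": [1.00, 1.02, 0.98],
    "y8": [1.01, 0.99, 1.00],
    "b5": [1.00, 1.60, 0.70],  # unstable across replicates
}
flagged = flag_inconsistent_transitions(replicates)
print(sorted(flagged))
```

Only the unstable transition is reported here. A real review would also need a test for interfering signal, which this sketch does not attempt.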
Development of targeted MS-based assays for detecting and quantifying changes in protein levels requires identification of peptides to use as quantitative surrogates for each candidate protein. The enhanced signature peptide predictor (ESPPredictor module) provides a means of predicting such 'signature peptides' from sequence alone.
For the analysis of LC-MS data, GenePattern provides support for the algorithms defined by PEPPeR, a Platform for Experimental Proteomic Pattern Recognition:
Landmark matching propagates previously identified peptides onto accurate-mass LC-MS features so as to maximize the total number of identified peptides across disparate data acquisition methods. Using a combination of accurate mass and local retention time information, it is possible to determine the likely identity of an unknown peak from its location relative to known peaks.
Peak matching attempts to group similar features (or peaks) across multiple LC-MS sample runs by incorporating m/z and retention time (RT) variation. Although peak matching can be performed on virtually any type of LC-MS data, it is typically performed after landmark matching.
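The grouping step in peak matching can be illustrated with a small sketch. This is a simplified greedy approach under assumed tolerances, not the PEPPeR implementation: a feature joins the first group whose reference m/z is within a ppm tolerance and whose reference RT is within a fixed window. The function name, the tolerance defaults, and the feature values are hypothetical.

```python
def match_peaks(features, mz_tol_ppm=10.0, rt_tol=0.5):
    """Greedily group (run, mz, rt) features that agree in m/z and RT.

    Each group keeps the m/z and RT of its first member as the reference.
    The tolerances are assumed values for illustration only.
    """
    groups = []
    # Visit features in order of increasing m/z so grouping is deterministic.
    for run, mz, rt in sorted(features, key=lambda f: f[1]):
        for group in groups:
            ppm_error = abs(mz - group["mz"]) / group["mz"] * 1e6
            if ppm_error <= mz_tol_ppm and abs(rt - group["rt"]) <= rt_tol:
                group["members"].append((run, mz, rt))
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append({"mz": mz, "rt": rt, "members": [(run, mz, rt)]})
    return groups


# Made-up features from two runs: (run name, m/z, retention time in minutes).
features = [
    ("run1", 500.2500, 20.1),
    ("run2", 500.2520, 20.3),
    ("run1", 612.3300, 35.0),
    ("run2", 612.3310, 35.2),
    ("run2", 500.2510, 48.0),  # same mass, but elutes much later
]
groups = match_peaks(features)
for group in groups:
    print(round(group["mz"], 4), group["rt"], len(group["members"]))
```

The last feature shares its mass with the first group but stays separate because its retention time differs, which is the reason both dimensions are used. A greedy pass like this is order dependent; a production method would model m/z and RT variation more carefully.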
Supplemental materials for the PEPPeR modules include:
GenePattern's ProteoArray module provides the following support for the analysis of LC-MS data:
For a series of LC-MS experiments in mzXML format, GenePattern provides the ability to detect and align features across runs. This module is provided by Brian Piening of the Fred Hutchinson Cancer Research Center.
GenePattern provides the following support for the analysis of SELDI/MALDI data:
Quality assessment of the input spectrum as a function of the area under the spectrum and the area under the spectrum after removing the noise component of the signal.
Peak detection using digital convolution (moving window) filters, which apply smoothing, background correction, and peak enhancement to the spectrum before identifying final peak locations.
Spectra comparison, which filters the noise from two spectra and then compares the spectra using a cross correlation function.
A proteomics pipeline provides automated processing of SELDI/MALDI data. In addition to quality assessment and peak detection, the pipeline incorporates a range of normalization methods and sophisticated peak alignment algorithms for matching peaks across multiple samples. Starting with spectra from a set of samples, the pipeline outputs matched peaks as features, and normalized intensities of these peaks for each sample. Several aspects of the pipeline are fully customizable.
Integration with other GenePattern analysis modules. By representing peaks as features, the peak detection and proteomics pipeline modules create output files similar to those used as input for the modules that support gene expression analysis. Analyses such as clustering, classification, and differential marker selection are based on pattern recognition and are therefore applicable to both proteomic data and gene expression data.
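The moving-window peak detection listed above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the module's implementation: a moving average stands in for the smoothing filter, a moving minimum stands in for background estimation, and peaks are local maxima above an assumed height. The function names, the window sizes, the height cutoff, and the spectrum values are hypothetical, and the peak enhancement step is omitted.

```python
def moving_average(signal, window=3):
    """Smooth a signal with a centered moving average (shorter at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def moving_minimum(signal, window=11):
    """Estimate the background as the minimum within a centered window."""
    half = window // 2
    return [min(signal[max(0, i - half):i + half + 1])
            for i in range(len(signal))]


def detect_peaks(intensities, smooth_window=3, baseline_window=11,
                 min_height=5.0):
    """Return indices of local maxima after smoothing and background removal."""
    smoothed = moving_average(intensities, smooth_window)
    baseline = moving_minimum(smoothed, baseline_window)
    corrected = [s - b for s, b in zip(smoothed, baseline)]
    peaks = []
    for i in range(1, len(corrected) - 1):
        is_local_max = (corrected[i] > corrected[i - 1]
                        and corrected[i] >= corrected[i + 1])
        if is_local_max and corrected[i] >= min_height:
            peaks.append(i)
    return peaks


# Made-up spectrum: a flat background of 10 with two peaks.
spectrum = [10, 10, 10, 11, 20, 40, 20, 11, 10, 10,
            10, 10, 12, 25, 50, 25, 12, 10, 10, 10]
peaks = detect_peaks(spectrum)
print(peaks)
```

The returned indices would be mapped back to m/z values in a real spectrum. Window sizes need to be tuned to the peak width of the instrument, since a background window narrower than a peak would subtract part of the peak itself.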
The modules for the analysis of SELDI/MALDI data are based on work published by Mani and Gillette in "Proteomic Data Analysis: Pattern Recognition for Medical Diagnosis and Biomarker Discovery" (in Mehmed Kantardzic and Jozef Zurada (Eds.), Next Generation of Data Mining Applications, Wiley-IEEE Press).