Spectral unmixing for next-generation mass spectrometry proteomics

Ryan Peckner
Proteomics Platform
Spectral unmixing for next-generation mass spectrometry proteomics

Abstract: Mass spectrometry proteomics is the method of choice for large-scale quantitation of proteins in biological samples, allowing rapid measurement of the concentrations of thousands of proteins in various modified forms. However, this technique still faces fundamental challenges in reproducibility, bias, and comprehensiveness of proteome coverage. Next-generation mass spectrometry, also known as data-independent acquisition, is a promising new approach with the potential to measure the proteome far more comprehensively and reproducibly than existing methods, but it has lacked a computational framework suited to the highly convoluted spectra it inherently produces. I will present Specter, an algorithm that employs linear unmixing to disambiguate the signals of individual proteins and peptides in next-generation mass spectra. In addition to describing the linear algebra underlying Specter, I will discuss its implementation in Spark with Python and show several real datasets to which it has been applied.
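To give a flavor of what linear unmixing means here, the following is a minimal sketch, not Specter's actual implementation. It assumes that a mixed spectrum can be modeled as a non-negative combination of known library spectra and recovers the coefficients with a simple projected gradient descent. The function name `unmix`, the toy library, and all intensities are hypothetical and chosen only for illustration.

```python
# Toy linear unmixing: express an observed (mixed) spectrum as a non-negative
# combination of library spectra. All numbers are made up for illustration.

def unmix(library, observed, iterations=1000):
    """Projected gradient descent for min ||A x - b||^2 subject to x >= 0.

    library  -- list of library spectra (intensities per m/z bin); these are
                the columns of A
    observed -- the mixed spectrum b over the same m/z bins
    """
    n = len(library)
    # Gram matrix G = A^T A and vector h = A^T b.
    gram = [[sum(p * q for p, q in zip(library[i], library[j]))
             for j in range(n)] for i in range(n)]
    h = [sum(p * q for p, q in zip(library[i], observed)) for i in range(n)]
    # trace(G) bounds the largest eigenvalue of G, so 1/trace is a safe step.
    step = 1.0 / sum(gram[i][i] for i in range(n))
    x = [0.0] * n
    for _ in range(iterations):
        grad = [sum(gram[i][j] * x[j] for j in range(n)) - h[i]
                for i in range(n)]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, xi - step * gi) for xi, gi in zip(x, grad)]
    return x


# Three hypothetical peptide spectra over five m/z bins.
library = [
    [1.0, 0.0, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.5, 0.0, 0.2],
    [0.0, 0.0, 0.5, 1.0, 0.0],
]
# Mixed spectrum built as 2.0 * peptide 0 + 0.5 * peptide 2 (peptide 1 absent).
observed = [2.0, 0.0, 1.25, 0.5, 0.0]

coefficients = unmix(library, observed)
print([round(c, 4) for c in coefficients])
```

The recovered coefficients estimate how much each library peptide contributes to the mixed spectrum. At realistic scale one would more likely call a library solver such as `scipy.optimize.nnls` than hand-roll the iteration.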

 

Karsten Krug
Proteomics Platform
Primer: Mass spectrometry-based proteomics

Abstract: Mass spectrometry is the workhorse technology for studying the abundance and composition of proteins, the key players in every living cell. Within the last decade, the technology has undergone a revolution in novel instrumentation and optimized sample-handling protocols, resulting in ever-growing numbers of proteins and post-translational modifications that can be routinely studied on a system-wide scale. Briefly, proteins are extracted from cells or tissues and enzymatically digested into smaller peptides. This extremely complex peptide mixture is subjected to liquid chromatography separation and subsequent tandem mass spectrometry analysis, in which mass-to-charge ratios of intact peptides and peptide fragments are recorded. The resulting mass spectra are matched to sequence databases or spectral libraries to read out the amino acid sequences and thereby identify the corresponding proteins.
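As a small illustration of the first steps of this workflow, the sketch below breaks a protein sequence into peptides and computes the mass-to-charge ratio an instrument would record for an intact peptide. It assumes digestion with trypsin (a common choice, simplified here to cleavage after K or R unless the next residue is P) and uses standard monoisotopic residue masses. The protein sequence and the function names `tryptic_digest` and `mz` are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the summed residues
PROTON = 1.007276   # mass of each charge-carrying proton


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def mz(peptide, charge):
    """Mass-to-charge ratio of an intact, unmodified peptide at a given charge."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


# Hypothetical protein sequence, for illustration only.
protein = "MKWVTFISLLRPAGKSAYER"
peptides = tryptic_digest(protein)
for pep in peptides:
    print(pep, round(mz(pep, 2), 4))
```

Real digests also involve missed cleavages and modifications, which this sketch deliberately ignores.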

The technology is fundamentally different from sequencing-based genomics technology and faces different problems, such as the tremendous dynamic range of protein expression. The instruments can be operated in different acquisition modes for different applications. I will briefly introduce the basics behind discovery or ‘shotgun’ proteomics, targeted proteomics, data-dependent acquisition, and data-independent acquisition; the latter is a recent and promising development in the proteomics community but poses novel and only partly solved challenges in data analysis. Ryan Peckner will talk about Specter, an approach that tackles this problem using linear algebra.