Mol Cell Proteomics DOI:10.1074/mcp.M111.015768

Shotgun protein sequencing with meta-contig assembly.

Publication Type: Journal Article
Year of Publication: 2012
Authors: Guthals, A, Clauser, KR, Bandeira, N
Journal: Mol Cell Proteomics
Date Published: 2012 Oct
Keywords: Algorithms, Amino Acid Sequence, Animals, Armoracia, Cattle, Computational Biology, Escherichia coli, Horses, Humans, Mice, Molecular Sequence Data, Peptide Fragments, Proteins, Reproducibility of Results, Sensitivity and Specificity, Sequence Analysis, Protein, Tandem Mass Spectrometry

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal-to-noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low- and high-resolution CID and high-resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
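
The idea of overlapping shorter contig sequences and assembling them into longer ones can be illustrated with a toy sketch. This is not the Meta-SPS algorithm: the abstract does not say how overlaps are detected or scored, and the published method works from MS/MS spectra, not from clean letter strings. The sketch below assumes error-free amino-acid strings and exact suffix-prefix matches, and the names `overlap_length`, `assemble`, and `min_overlap` are hypothetical, chosen only for illustration.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `left` that is also a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(contigs: list[str], min_overlap: int = 3) -> list[str]:
    """Greedily merge the pair of sequences with the largest exact overlap
    until no pair overlaps by at least `min_overlap` residues.

    Illustrative assumption: inputs are error-free strings, so exact
    matching is enough. Real contig sequences would need error tolerance.
    """
    sequences = list(dict.fromkeys(contigs))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(sequences):
            for j, right in enumerate(sequences):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return sequences
        # Join the two sequences, writing the shared residues only once.
        merged = sequences[best_i] + sequences[best_j][best_size:]
        sequences = [s for k, s in enumerate(sequences) if k not in (best_i, best_j)]
        sequences.append(merged)


if __name__ == "__main__":
    # Made-up toy sequences, not data from the paper.
    print(assemble(["MKTAYIAK", "YIAKQRQI", "QRQISFVK", "GGGHHH"]))
```

In this toy input, three fragments chain into one longer sequence while the unrelated fragment is left on its own. The `min_overlap` threshold is the main design choice: a smaller value merges more aggressively but raises the risk of joining fragments that share a few residues by chance.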


Alternate Journal: Mol. Cell Proteomics
PubMed ID: 22798278
PubMed Central ID: PMC3494147
Grant List: P41 GM103484 / GM / NIGMS NIH HHS / United States
1-P41-RR024851 / RR / NCRR NIH HHS / United States
U24 CA126476 / CA / NCI NIH HHS / United States
P41 RR024851 / RR / NCRR NIH HHS / United States
1U24 CA126476-02 / CA / NCI NIH HHS / United States