Spotlight: Sampling the proteome, a spoonful at a time

With a focused approach, Broad scientists have taken a major step toward identifying all of the proteins that are present in mitochondria.
Photo by Maria Nemchuk

While proteins are the end products of our genes, they are not simply translations of raw genetic information. Proteins are dynamic and are almost always modified "off the genetic record," in a manner that is largely independent of their original DNA templates. And because a gene’s corresponding protein can assume several distinct forms, the full suite of proteins (known as the "proteome") far outnumbers its DNA counterpart. Thus, to understand proteins on a global scale — their amounts, their biochemical modifications, their locations in the cell, and how these features change under different physiological conditions or stresses, such as disease or drug treatment — the proteome itself must be queried, not just the genome. This proteomic view underpins several scientific efforts that are now underway in the Broad’s Proteomics Platform.

The wholesale availability of the genetic blueprints of many organisms, together with major technical advances, has shifted our focus from piecemeal studies of individual proteins to global analyses of the proteome. Recent progress, particularly in mass spectrometry (MS) and proteomic data analysis methods, has transformed the kinds of questions that we can address using proteomic approaches. With tandem MS (MS/MS), for example, we can pick a single protein fragment, or "peptide," from a mixture of thousands of other peptides, decipher its amino acid sequence, define functionally important chemical modifications that may be present, and then map the peptide to the full-length protein from which it originated.

Betty Chang
Photo by Maria Nemchuk

To accomplish this formidable task, we first expose a soup of different proteins to an enzyme that cuts them at specific amino acids into smaller peptides. This tastier soup, owing to the much greater number of ingredients, is converted to electrically charged molecules ("ions") in the gaseous phase and streamed into the mass spectrometer. The instrument, under the direction of on-board computers, first measures and selects peptide ions based on their molecular mass and electrical charge, and then collides them with inert atoms, causing the selected peptide ions to break into smaller bits inside the instrument. The resulting profile reflects the mass of the intact peptide as well as the masses of its constituent parts. Because proteins are written in a unique but limited biochemical language (a lexis of twenty different amino acids), the amino acid sequence of the peptide can be deduced from these data. Then, by matching this information against a database of full-length protein sequences, we can identify the protein that was present in the original mixture. After several iterations of the MS/MS method, we can take a biological sample of interest and, with little prior knowledge of its contents, identify most of the proteins it contains.
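As a rough illustration of the digestion and database-matching steps described above, here is a minimal Python sketch. It rests on several assumptions that are not stated in the article: the enzyme is taken to be trypsin (cutting after lysine or arginine unless proline follows), the protein sequences in `TOY_DATABASE` are made up, and the function names are hypothetical. Real search software scores the full fragment spectrum rather than looking up an exact peptide string.

```python
# Monoisotopic residue masses in daltons for the twenty standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide (H on N-term, OH on C-term)

# Made-up sequences, for illustration only.
TOY_DATABASE = {
    "protein_1": "MKAGKLLR",
    "protein_2": "MSTRAGKW",
}


def digest(protein):
    """Cut after K or R unless the next residue is P (assumed trypsin-like rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # trailing piece with no cut site
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def proteins_containing(peptide, database):
    """Names of database proteins whose digest yields this exact peptide."""
    return [name for name, sequence in database.items()
            if peptide in digest(sequence)]
```

For example, `digest("MKAGKLLR")` yields the peptides `MK`, `AGK`, and `LLR`, and `proteins_containing("LLR", TOY_DATABASE)` points back to `protein_1` alone, whereas `AGK` occurs in both toy proteins and so cannot distinguish them.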

Together with Broad associate member Vamsi Mootha and his research colleagues, we applied this approach to systematically catalogue the proteins that function in mitochondria. These microscopic powerhouses supply eukaryotic cells with the energy needed for cellular function, and disturbances in mitochondrial activity can precipitate a host of human diseases. A thorough census of the mitochondria’s proteins, however, is no straightforward task. Though mitochondria contain their own DNA, this miniature genome supplies only a small fraction of the organelle’s proteins. The rest of its proteome is encoded by genes in the nucleus, yet these imported proteins have been difficult for researchers to recognize based solely on molecular characteristics, such as DNA sequence. In fact, prior to our findings, which were recently described in Nature Genetics, only half of the ~1,500 proteins predicted to function in mitochondria had been identified.

Sarah Calvo devised an integrated approach to this problem by combining eight genome-wide computational methods under a comprehensive program called "Maestro." Each method scans sequence and expression databases, including previously compiled MS/MS data, to map out the full complement of human mitochondrial proteins. With each of its eight "arms" working together, Maestro assigned nearly 800 additional proteins to mitochondria compared with previous proteomic studies. About half of these predictions were novel, meaning the proteins had not been previously linked to mitochondria. With the proteomics efforts here at the Broad, we extended the catalogue of predicted mitochondrial proteins to well over 1,000, but many of these newly predicted proteins required verification of their mitochondrial residence.
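The article does not say how Maestro combines its eight methods. One common way to integrate several independent lines of evidence is a naive Bayes score, in which each method contributes a log-likelihood ratio and the contributions are summed. The sketch below illustrates that general idea only; the method names and likelihood ratios are invented for the example and are not taken from Maestro.

```python
import math

# Hypothetical likelihood ratios: how much more often each outcome is seen
# for known mitochondrial proteins than for non-mitochondrial ones.
LIKELIHOOD_RATIOS = {
    "targeting_signal": {True: 8.0, False: 0.5},
    "coexpression": {True: 4.0, False: 0.5},
}


def combined_score(evidence, likelihood_ratios):
    """Sum of log2 likelihood ratios over the methods that reported a result."""
    return sum(
        math.log2(likelihood_ratios[method][outcome])
        for method, outcome in evidence.items()
    )


# A protein with a predicted targeting signal but no coexpression support.
score = combined_score(
    {"targeting_signal": True, "coexpression": False},
    LIKELIHOOD_RATIOS,
)
```

A positive score means the combined evidence favors a mitochondrial location, and a negative score means it argues against one. Summing the terms treats the methods as independent, which is a simplification.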

Tandem mass spectrometry enabled Broad researchers to confirm their predictions about the proteins in mitochondria (stained red in the cells at upper right).
Image designed by Maria Nemchuk

Our next step was to validate a subset of these new predictions, that is, to determine experimentally whether these proteins are actually found in mitochondria. The task is akin to looking for a needle in a haystack, but novel mass spectrometry methods developed in the Proteomics group made it possible. We began with a master list of ten new candidate proteins, as well as ten positive controls (proteins known to be in mitochondria) and ten negative controls (proteins found exclusively outside of mitochondria). From this list of full-length proteins, we determined the peptide fragments that should be formed upon enzyme treatment. This told us, before even beginning our experiments, what the "needles" buried amidst all of the mitochondria’s proteins should look like: their molecular masses and their amino acid sequences. With these focused search criteria, and taking advantage of the very high specificity and sensitivity of our MS systems, we could look through all of the mitochondria’s proteins using mass spectrometry, but analyze only those parts that resembled the needles, or peptides, we were most interested in finding. We were able to verify all ten of the novel proteins that were predicted by Maestro to reside in mitochondria. This targeted approach enabled us to bore deeply into the mitochondrial proteome, sampling even the peptides and proteins that are in rare supply, which are often the most difficult to detect using MS/MS.
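The targeted idea can be sketched in a few lines of Python: compute in advance the mass-to-charge ratio (m/z) expected for each peptide of interest, then keep only the observed ions that fall within a small tolerance of a target. This is a simplified illustration, not the method used in the study; the peptide name, its mass, the observed values, and the 10 ppm tolerance are all assumptions made for the example.

```python
PROTON = 1.007276  # mass of a proton in daltons


def mz(neutral_mass, charge):
    """m/z of a peptide ion carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON) / charge


def within_ppm(observed, expected, tol_ppm):
    """True if observed is within tol_ppm parts per million of expected."""
    return abs(observed - expected) / expected * 1e6 <= tol_ppm


def select_targets(observed_mz, targets, tol_ppm=10.0):
    """Return (observed m/z, target name) pairs for ions matching a target."""
    hits = []
    for observed in observed_mz:
        for name, expected in targets.items():
            if within_ppm(observed, expected, tol_ppm):
                hits.append((observed, name))
    return hits


# Hypothetical target: a doubly charged peptide with neutral mass 1000.0 Da.
targets = {"candidate_peptide_A": mz(1000.0, 2)}
hits = select_targets([501.0073, 650.2], targets)
```

Here only the first observed ion is kept, because 501.0073 lies within 10 ppm of the expected m/z of about 501.0073 for the target, while 650.2 matches nothing on the list and is ignored.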

There are several other collaborations now underway in Proteomics that span the Broad’s many scientific programs and platforms. For example, we are working to identify and quantify both proteins and small molecules that appear in conjunction with human diseases, such as cancer or cardiovascular disease, in the hopes that they can serve as sensitive "biomarkers" for disease diagnosis or for monitoring treatment. In addition, there are ongoing efforts to catalogue the proteins present in other subcellular compartments, similar to our work with mitochondria. Another topic of active study is the profiling of partner proteins that go hand-in-hand with specific molecules, including proteins as well as DNA. We are also working to identify the proteins that are targeted by a variety of small molecules, which exert known biological effects but are otherwise uncharacterized. With these efforts, as well as others, we hope to shed light on the inner workings of the human genome’s often neglected, but nevertheless vital, cousin — the proteome.

Paper(s) cited

Calvo S, Jain M, Xie X, Sheth SA, Chang B, Goldberger OA, Spinazzola A,
Zeviani M, Carr SA, Mootha VK. Systematic identification of human
mitochondrial disease genes through integrative genomics.
Nature Genetics; doi:10.1038/ng1776