Phosphoproteomics is a branch of proteomics that identifies, catalogs, and characterizes proteins containing a phosphate group as a posttranslational modification. Phosphorylation is a key reversible modification that regulates protein function, subcellular localization, complex formation, and protein degradation, and therefore cell signaling networks. It is estimated that 30% to 65% of all proteins may be phosphorylated, some multiple times.[1][2] Based on statistical estimates from many datasets, 230,000, 156,000 and 40,000 phosphorylation sites should exist in human, mouse, and yeast, respectively.
Compared to expression analysis, phosphoproteomics provides two additional layers of information. First, it provides clues about which protein or pathway might be activated, because a change in phosphorylation status almost always reflects a change in protein activity. Second, it indicates which proteins might be drug targets, as exemplified by the kinase inhibitor Gleevec. While phosphoproteomics will greatly expand knowledge about the numbers and types of phosphoproteins, its greatest promise is the rapid analysis of entire phosphorylation-based signalling networks.[3]
A sample large-scale phosphoproteomic analysis proceeds as follows: cultured cells undergo SILAC encoding; the cells are stimulated with a factor of interest (e.g. growth factor, hormone), optionally for various lengths of time to allow temporal analysis; the cells are lysed and enzymatically digested; peptides are separated using ion exchange chromatography; phosphopeptides are enriched using phosphospecific antibodies, immobilized metal affinity chromatography or titanium dioxide (TiO2) chromatography; the phosphopeptides are analyzed using mass spectrometry; and the peptides are sequenced and analyzed.[4]
Analysis of the entire complement of phosphorylated proteins in a cell is feasible. This is due to the optimization of enrichment protocols for phosphoproteins and phosphopeptides, better fractionation techniques using chromatography, and improved methods to selectively visualize phosphorylated residues using mass spectrometry. Although current procedures for phosphoproteomic analysis are greatly improved, sample loss and inconsistencies remain with regard to sample preparation, enrichment, and instrumentation. Bioinformatics tools and biological sequence databases are also necessary for high-throughput phosphoproteomic studies.[5]
Previous procedures to isolate phosphorylated proteins included radioactive labeling with 32P-labeled ATP followed by SDS polyacrylamide gel electrophoresis or thin layer chromatography. These traditional methods are inefficient because they cannot provide the large amounts of protein required for phosphorylation analysis. The current, simpler methods to enrich phosphoproteins are affinity purification using phosphospecific antibodies, immobilized metal affinity chromatography (IMAC), strong cation exchange (SCX) chromatography, and titanium dioxide chromatography. Antiphosphotyrosine antibodies have proven very successful in purification, but fewer reports have been published using antibodies against phosphoserine- or phosphothreonine-containing proteins. IMAC enrichment is based on the affinity of phosphate for a metal ion chelated to the resin. SCX separates phosphorylated from non-phosphorylated peptides based on the negatively charged phosphate group. Titanium dioxide chromatography is a newer technique that requires significantly less column preparation time. Many phosphoproteomic studies use a combination of these enrichment strategies to obtain the purest sample possible.
Mass spectrometry is currently the best method to adequately compare pairs of protein samples. The two main procedures for this task use isotope-coded affinity tags (ICAT) and stable isotope labeling by amino acids in cell culture (SILAC). In the ICAT procedure, samples are labeled individually after isolation with mass-coded reagents that modify cysteine residues. In SILAC, cells are cultured separately in the presence of different isotopically labeled amino acids for several cell divisions, allowing cellular proteins to incorporate the label. Mass spectrometry is subsequently used to identify phosphoserine-, phosphothreonine-, and phosphotyrosine-containing peptides.[6]
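As a rough illustration of why light and heavy SILAC samples can be told apart in a single spectrum, the sketch below computes the expected m/z of a phosphopeptide and of its heavy-labeled partner. It is a minimal sketch under stated assumptions: the heavy labels (13C6,15N2-lysine and 13C6,15N4-arginine), the function names, and the example peptide are illustrative choices and are not taken from the studies cited here.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056     # added once per peptide for the free termini
PROTON = 1.007276    # mass of the charge carrier
PHOSPHO = 79.96633   # HPO3 added per phosphorylated S, T, or Y

# Assumption: heavy medium contains 13C6,15N2-lysine and 13C6,15N4-arginine.
HEAVY_SHIFT = {"K": 8.014199, "R": 10.008269}


def neutral_mass(sequence, n_phospho=0, heavy=False):
    """Monoisotopic neutral mass of a peptide with n_phospho phosphate groups."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    mass += n_phospho * PHOSPHO
    if heavy:
        # Every lysine and arginine in the peptide carries the heavy label.
        mass += sum(HEAVY_SHIFT.get(aa, 0.0) for aa in sequence)
    return mass


def mz(mass, charge):
    """Mass-to-charge ratio of a positively charged, protonated ion."""
    return (mass + charge * PROTON) / charge


def silac_pair_mz(sequence, n_phospho, charge):
    """Return (light m/z, heavy m/z) for one phosphopeptide."""
    light = mz(neutral_mass(sequence, n_phospho), charge)
    heavy = mz(neutral_mass(sequence, n_phospho, heavy=True), charge)
    return light, heavy


# Hypothetical singly phosphorylated peptide, observed at charge 2+.
light_mz, heavy_mz = silac_pair_mz("AGSPEK", n_phospho=1, charge=2)
print(round(heavy_mz - light_mz, 4))
```

For a doubly charged peptide containing one lysine, the heavy and light peaks are separated by about 4.007 m/z units (8.014 Da divided by the charge), and each phosphate group adds about 79.966 Da to the neutral mass. The intensity ratio of the two peaks then gives the relative abundance of the phosphopeptide in the two cultures.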
Intracellular signal transduction is primarily mediated by the reversible phosphorylation of various signalling molecules by enzymes dubbed kinases. Kinases transfer phosphate groups from ATP to specific serine, threonine or tyrosine residues of target molecules. The resultant phosphorylated protein may have altered activity level, subcellular localization or tertiary structure.
Phosphoproteomic analyses are ideal for the study of the dynamics of signalling networks. In one study design, cells are exposed to SILAC labelling and then stimulated by a specific growth factor. The cells are collected at various timepoints, and the lysates are combined for analysis by tandem MS. This allows experimenters to track the phosphorylation state of many phosphoproteins in the cell over time. The ability to measure the global phosphorylation state of many proteins at various time points makes this approach much more powerful than traditional biochemical methods for analyzing signalling network behavior.[7]
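The time-course design described above can be sketched in a few lines. The example below converts stimulated-to-unstimulated intensity ratios for one phosphorylation site into log2 fold changes and labels the site by its largest change. It is only a sketch: the two-fold cutoff, the function names, and the ratio values are hypothetical and are not drawn from the cited study.

```python
import math


def log2_fold_changes(ratios_by_time):
    """Convert stimulated/unstimulated ratios into log2 fold changes."""
    return {t: math.log2(r) for t, r in ratios_by_time.items()}


def classify(ratios_by_time, threshold=2.0):
    """Label a site by its largest change across the time course.

    The two-fold default is an assumed cutoff, not a standard value.
    A site that crosses the cutoff in both directions is reported
    as "increased" because that test runs first.
    """
    cutoff = math.log2(threshold)
    changes = log2_fold_changes(ratios_by_time).values()
    if max(changes) >= cutoff:
        return "increased"
    if min(changes) <= -cutoff:
        return "decreased"
    return "unchanged"


# Hypothetical ratios for one site at 0, 5, and 30 minutes after stimulation.
site_ratios = {0: 1.0, 5: 3.2, 30: 1.4}
print(classify(site_ratios))
```

With these made-up ratios the site is labeled "increased", because the 5-minute ratio of 3.2 exceeds the two-fold cutoff. Applying the same function to every quantified site gives a coarse, global view of how the network responds over time.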
One study was able to simultaneously measure the fold-change in phosphorylation state of 127 proteins between unstimulated and EphrinB1-stimulated cells.[8] Of these 127 proteins, 40 showed increased phosphorylation with stimulation by EphrinB1. The researchers were able to use this information in combination with previously published data to construct a signal transduction network for the proteins downstream of the EphB2 receptor.
Another recent phosphoproteomic study included large-scale identification and quantification of phosphorylation events triggered by the anti-diuretic hormone vasopressin in the kidney collecting duct.[9] A total of 714 phosphorylation sites on 223 unique phosphoproteins were identified, including three novel phosphorylation sites in the vasopressin-sensitive water channel aquaporin-2 (AQP2).
Since the inception of phosphoproteomics, cancer research has focused on changes to the phosphoproteome during tumor development. Phosphoproteins could be cancer markers useful to cancer diagnostics and therapeutics. In fact, research has shown that there are distinct phosphotyrosine proteomes of breast and liver tumors. There is also evidence of hyperphosphorylation at tyrosine residues in breast tumors but not in normal tissues. Findings like these suggest that it is possible to mine the tumor phosphoproteome for potential biomarkers.
Increasing amounts of data suggest that distinctive phosphoproteins exist in various tumors and that phosphorylation profiling could be used to fingerprint cancers of different origins. In addition, systematic cataloguing of tumor-specific phosphoproteins in individual patients could reveal multiple causative players during cancer formation. By correlating these experimental data with clinical data such as drug response and disease outcome, potential cancer markers could be identified for diagnosis, prognosis, and prediction of drug response, along with potential drug targets.
While phosphoproteomics has greatly expanded knowledge about the numbers and types of phosphoproteins, along with their role in signaling networks, these techniques still have several limitations. To begin with, isolation methods such as anti-phosphotyrosine antibodies do not distinguish between tyrosine-phosphorylated proteins and proteins associated with tyrosine-phosphorylated proteins. Therefore, even though phosphorylation-dependent protein-protein interactions are very important, a protein detected by this method is not necessarily a direct substrate of any tyrosine kinase. Only by digesting the samples before immunoprecipitation can phosphoproteins alone be isolated and temporal profiles of individual phosphorylation sites be produced. Another limitation is that some relevant proteins will likely be missed, since no extraction condition is all-encompassing. Proteins with a low stoichiometry of phosphorylation, proteins in very low abundance, and proteins phosphorylated as a target for rapid degradation may be lost.[10] Bioinformatics analyses of low-throughput phosphorylation data together with high-throughput phosphoproteomics data (based mostly on MS/MS) estimate that current high-throughput protocols, after several repetitions, are capable of capturing 70% to 95% of total phosphoproteins, but only 40% to 60% of total phosphorylation sites.
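To put the site-level coverage figure in perspective, the short calculation below applies the 40% to 60% range to the statistical estimate of 230,000 human phosphorylation sites given earlier in the article. This is a back-of-the-envelope sketch: it assumes the two figures refer to the same total, which the sources do not state, and the function name is illustrative.

```python
def site_range(total_sites, low_pct, high_pct):
    """Return the (low, high) number of sites implied by a percentage range."""
    # Integer arithmetic avoids floating-point rounding in the result.
    return total_sites * low_pct // 100, total_sites * high_pct // 100


HUMAN_SITES = 230_000  # statistical estimate quoted earlier in the article

captured = site_range(HUMAN_SITES, 40, 60)
# The most sites are missed when the fewest are captured, and vice versa.
missed = (HUMAN_SITES - captured[1], HUMAN_SITES - captured[0])

print(captured)  # (92000, 138000)
print(missed)    # (92000, 138000)
```

Under that assumption, roughly 92,000 to 138,000 human sites would be captured, and a similar number would remain undetected even after repeated experiments.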