Research Article: Proteomics

Reproducible Quantification of Cancer-Associated Proteins in Body Fluids Using Targeted Proteomics

Science Translational Medicine  11 Jul 2012:
Vol. 4, Issue 142, pp. 142ra94
DOI: 10.1126/scitranslmed.3003989

Abstract

The rigorous testing of hypotheses on suitable sample cohorts is a major limitation in translational research. This is particularly the case for the validation of protein biomarkers; the lack of accurate, reproducible, and sensitive assays for most proteins has precluded the systematic assessment of hundreds of potential marker proteins described in the literature. Here, we describe a high-throughput method for the development and refinement of selected reaction monitoring (SRM) assays for human proteins. The method was applied to generate such assays for more than 1000 cancer-associated proteins, which are functionally related to candidate cancer driver mutations. We used the assays to determine the detectability of the target proteins in two clinically relevant samples: plasma and urine. One hundred eighty-two proteins were detected in depleted plasma, spanning five orders of magnitude in abundance and reaching below a concentration of 10 ng/ml. The narrower concentration range of proteins in urine allowed the detection of 408 proteins. Moreover, we demonstrate that these SRM assays allow reproducible quantification by monitoring 34 biomarker candidates across 83 patient plasma samples. Through public access to the entire assay library, researchers will be able to target their cancer-associated proteins of interest in any sample type using the detectability information in plasma and urine as a guide. The generated expandable reference map of SRM assays for cancer-associated proteins will be a valuable resource for accelerating and planning biomarker verification studies.

Introduction

The identification of validated disease biomarkers for diverse clinical needs such as prognosis, diagnosis, and patient stratification to follow and guide therapies is a predominant line of inquiry in translational research (1, 2). To establish the value of a protein biomarker, it is imperative to reliably identify and reproducibly quantify the protein of interest over multiple samples (3). Because of advances in proteomic and genomic technologies, long lists of biomarker candidates have been generated that contain proteins hypothesized to change in abundance relative to specific diseases or disease states. However, subsequent hypothesis testing in large cohorts of patient specimens needs to be performed to verify the clinical use of such biomarker candidates (4).

The preferred specimens for biomarker testing are easily sampled body fluids like plasma and urine (2, 4–6). However, the highly complex proteomes of plasma and urine pose technical challenges for analysis (7, 8). This applies particularly to the verification of biomarker candidates, a process that requires the accurate, sensitive, and reproducible quantification of multiple proteins in complex backgrounds over large cohorts of patients’ specimens (9).

Traditionally, such hypothesis testing has been accomplished using affinity-based assays, such as enzyme-linked immunosorbent assays (ELISAs). Major constraints of this approach are the limited availability of validated ELISAs for most human proteins, the expensive and time-consuming development of de novo assays, and the difficulty of assay multiplexing (10). These limitations preclude the timely verification of the rapidly increasing number of candidate proteins that are derived from high-throughput proteomic and genomic screens (11). Additionally, computational approaches are emerging that use data integration and network inference to generate sets of biomarker candidates (12–17) that further increase the number of candidate proteins that need to be experimentally verified. Therefore, it is anticipated that the rate of hypothesis generation for biomarker research will further increase, and consequently, a well-matched analytical platform that allows rapid and high-throughput hypothesis testing is needed.

Targeted mass spectrometry (MS) through selected reaction monitoring (SRM) has emerged as an alternative to affinity-based measurements of defined protein sets (5, 10, 18–22). The main advantage of SRM is the capacity for faster and cost-efficient assay development (23). SRM can also quantify multiple proteins in parallel (multiplexing) at a low limit of detection (LOD) and with high accuracy. Additionally, it has been shown that protein quantification by SRM in complex samples using predefined assay coordinates is reproducible across different laboratories and instrument platforms (24). Consequently, SRM-based hypothesis testing is ideally matched with high-throughput hypothesis generation and has the potential to bridge the gap between generating lists of candidates and evaluating their clinical use (9).

There are two major challenges facing the implementation of SRM in a biomarker verification pipeline. The first is the generation of high-quality SRM assays for sets of biomarker candidates, which can include several hundred proteins. Picotti et al. developed a high-throughput method for SRM assay development that is based on the use of crude synthetic peptide libraries as a reference for validating SRM assays (23). Once such an assay has been developed, it becomes universally applicable. Therefore, publicly accessible repositories have been generated that contain SRM assays (25). The second challenge is the reliable identification and quantification of several hundred proteins in the sample of interest, usually complex body fluids (10, 20). Recently, two data analysis tools, mProphet (26) and SRMstats (27), have been developed to assist in the identification and quantification of peptides and proteins measured by SRM. mProphet is an automated tool that allows the objective and reliable association of groups of SRM signals with their corresponding peptide sequences in complex samples, with and without isotope-labeled internal standards (26). SRMstats is a statistical modeling framework for protein significance analysis based on linear mixed-effects models specifically adapted for the SRM data structure (27). Therefore, major bottlenecks impeding the wide use of SRM technology have recently been alleviated.

Here, we generated a resource of SRM assays for more than 1000 proposed protein biomarker candidates that have been previously associated with cancer. Using a functional protein interaction network, we demonstrated that these proteins are enriched among the interaction partners of genes mutated in cancer. The SRM assays developed through high-throughput peptide synthesis and MS were subsequently applied to determine the detectability of the targeted peptides in clinically relevant samples. Furthermore, we demonstrated the applicability of the assays to reproducibly and accurately quantify biomarker candidates across a large number of patient specimens. This resource of definitive SRM assays for cancer-associated proteins (CAPs) enables and accelerates clinical SRM-based biomarker verification studies.

Results

A comprehensive list of CAPs

Our goal was to select a comprehensive list of proteins that have been previously implicated in cancer. Polanski and Anderson compiled an evidence-based list of 1261 proteins that are differentially expressed in cancer (CAPs) (28). To compile this list, the authors found reported abundance changes observed at the protein level in human plasma or tissue. They also accepted reported cases with changes at the DNA (ploidy changes) or RNA level in affected tissue. Furthermore, the selection was derived from various sample types and technology platforms. Of the 1261 proteins, we selected 1130 that could be unambiguously associated with UniProt identifiers (http://www.uniprot.org/). We added U.S. Food and Drug Administration (FDA)–approved protein markers if they were not already included in the target list of CAPs (8). In total, the list of CAPs selected for SRM assay development consisted of 1172 proteins (table S1).

Relationship of CAPs to candidate cancer driver mutations

It has been demonstrated that cancer origin and progression are driven not by single gene mutations or expression changes but by coordinated changes in variable subsets of genes (29). Although diverse genomic alterations are observed in different individuals with tumors of the same clinical type, sets of mutated genes can function in the same signaling pathway or subnetwork, leading to similar or identical phenotypes (12). Proteins that are functionally related to the mutated genes in a subnetwork or pathway can potentially be used as biomarkers for the state of pathways or subnetworks perturbed by gene mutations. Therefore, we investigated the functional relationship of CAPs to candidate cancer driver mutations (CDMs), which have been discovered by unbiased whole-exome sequencing of multiple human cancers. We compiled a list of 379 CDMs from the whole-exome sequencing data of seven different cancer types (30–35). Their subnetwork in the Reactome Functional Protein Interaction Network (RFIN) (36) was examined for the presence of CAPs. Further, we investigated the enrichment of CAPs among the functional interaction partners of CDMs. Forty-three of the 1172 CAPs have also been discovered as CDMs by unbiased whole-exome sequencing (Table 1). Including the functional interaction partners of CDMs, we obtained 608 CAPs that have a direct functional relationship to CDMs (Table 1). We determined that the CAPs are significantly enriched in the subnetwork of CDMs compared to random networks of the same size and degree distribution (P = 4.3 × 10^−11) (Table 1) (37). These results demonstrate that the selected target proteins are not only interesting as potential cancer biomarkers but also relevant for studying protein networks perturbed by the genetic mutations that drive cancer development.
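The notion of a "direct functional relationship" used above can be illustrated with a minimal Python sketch. It assumes the interaction network is available as a list of undirected edges and counts a CAP as related if it is itself a CDM or a direct neighbor of one. The function name and all identifiers in the toy data are hypothetical placeholders, not entries from the RFIN, and the significance test against degree-preserving random networks is not reproduced here.

```python
def caps_related_to_cdms(edges, cdms, caps):
    """Return CAPs that are CDMs themselves or direct neighbors of a CDM."""
    neighbors = {}
    for a, b in edges:
        # Treat every interaction as undirected.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    reachable = set(cdms)
    for gene in cdms:
        reachable |= neighbors.get(gene, set())
    return set(caps) & reachable


# Toy network with placeholder identifiers (not real interaction data).
edges = [("CDM_A", "CAP_1"), ("CDM_B", "CAP_2"), ("CAP_2", "CAP_3")]
cdms = {"CDM_A", "CDM_B", "CAP_4"}           # CAP_4 is both a CDM and a CAP
caps = {"CAP_1", "CAP_2", "CAP_3", "CAP_4", "CAP_5"}

related = caps_related_to_cdms(edges, cdms, caps)
print(sorted(related))
```

In this toy example, CAP_3 is excluded because it is two steps away from the nearest CDM, and CAP_5 is excluded because it is not in the network at all.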

Table 1

Intersection of the candidate CDMs and their interaction partners in the RFIN with the CAPs. It is assumed that a CDM can be monitored by measuring either the protein encoded by the CDM or a protein in the subnetwork of CDMs.

Selection of representative peptides for CAPs

The selection of peptides that uniquely represent the target proteins in the proteome [proteotypic peptides (PTPs)] is an important step in the development of SRM assays (38, 39). For each protein, we aimed for five tryptic peptides with favorable MS properties and unique occurrence within the human proteome. We primarily used empirical evidence of MS detectability for PTP selection. Therefore, we prioritized peptides for each target protein that have been previously detected in large-scale shotgun MS data collections, namely, the Human Peptide Atlas (PA), Human Plasma PA (40, 41), and an extensive data set produced by in-depth fractionation and MS sequencing of human cell lysates (42). If no or insufficient empirical evidence was available, we computationally predicted PTPs using criteria that have been shown to favor detection by MS (39, 43). In total, 5426 peptides were selected to represent the 1172 CAPs (table S2). Of 5426 peptides, 2948 (54%) have been previously observed in shotgun proteomic data sets, and 2478 peptides (46%) were predicted (fig. S1). Most of the protein targets, numbering 1002, were covered by five peptides. Ninety-three percent of the selected peptides were unique to their respective protein, indicating a high likelihood of developing a specific assay for most of the proteins.
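The uniqueness criterion for PTPs can be sketched in Python as an in silico tryptic digest followed by a check that each peptide maps to exactly one protein. This is only an illustration of the idea: the cleavage rule (after K or R, not before P, no missed cleavages), the length limits, and the toy sequences are assumptions, and the empirical and predicted MS-detectability criteria used in the study are not modeled.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_len=7, max_len=25):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def unique_peptides(proteome):
    """Map each peptide that occurs in exactly one protein to that protein."""
    owners = defaultdict(set)
    for protein_id, sequence in proteome.items():
        for peptide in tryptic_peptides(sequence):
            owners[peptide].add(protein_id)
    return {pep: next(iter(ids)) for pep, ids in owners.items() if len(ids) == 1}


# Toy sequences (hypothetical, not real proteins).
proteome = {
    "PROT_A": "AAAGGGKCCCDDDRPEEEFFFK",
    "PROT_B": "AAAGGGKHHHIIILLLR",
}
unique = unique_peptides(proteome)
print(unique)
```

Here the peptide AAAGGGK occurs in both toy proteins and is therefore rejected, whereas the remaining peptides each identify a single protein.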

Development of SRM assays for CAPs

We developed SRM assays intended for the identification and quantification of CAPs in various sample types and protein backgrounds. Such assays consist of the mass, charge state distribution, and chromatographic retention time of the precursor ion as well as the mass, charge state distribution, and relative intensity of the fragment ion signals (25). To establish the assays, we used a multistep process. The first step was the acquisition of full fragment ion spectra for the target peptides using chemically synthesized peptide libraries as described by Picotti et al. (23) (Fig. 1A). Overall, 6787 fragment spectra were confidently assigned to 4821 of the 5426 synthesized peptides (89%) and used to extract the SRM assay coordinates. In the second step, SRM assay coordinates were refined by determining the precise relative fragment ion signal intensities and the indexed retention time (iRT) (44) in SRM acquisition mode (Fig. 1B), thereby providing important information for scheduling SRM measurements as well as for scoring the SRM data (26). Finally, SRM assays showing a low quality in SRM mode were eliminated by manual inspection. The overall success rate of assay generation at the peptide level was 74% (3996 peptides) and 99% (1157 proteins) at the protein level. For most of the CAPs (80%), refined assays are available to target at least three peptides per protein (Fig. 2). Each refined SRM assay was defined by the iRT of the peptide and the relative intensity ratios of its five most intense transitions (table S3). Thus, the generated SRM assays constitute a high-quality map for CAPs that can be applied to any biological sample of interest.
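The assay coordinates described above lend themselves to a simple record structure. The following Python sketch is one possible representation, not the format of the published library: the class and field names are assumptions, and the peptide, m/z, iRT, and intensity values are placeholders chosen only to make the example run.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    ion_label: str             # fragment ion name, e.g. "y7" (placeholder)
    fragment_mz: float
    fragment_charge: int
    relative_intensity: float  # relative to the most intense fragment


@dataclass
class SrmAssay:
    peptide: str
    precursor_mz: float
    precursor_charge: int
    irt: float                 # indexed retention time
    transitions: list = field(default_factory=list)

    def top_transitions(self, n=5):
        """Return the n most intense transitions, strongest first."""
        ranked = sorted(self.transitions,
                        key=lambda t: t.relative_intensity, reverse=True)
        return ranked[:n]


# Placeholder values for illustration only.
assay = SrmAssay(
    peptide="EXAMPLEPEPTIDEK", precursor_mz=800.4, precursor_charge=2, irt=42.0,
    transitions=[
        Transition("b2", 250.1, 1, 0.10),
        Transition("y7", 830.4, 1, 1.00),
        Transition("y4", 490.3, 1, 0.30),
        Transition("y6", 720.4, 1, 0.80),
        Transition("b3", 360.2, 1, 0.40),
        Transition("y5", 600.3, 1, 0.55),
    ],
)
top = assay.top_transitions()
print([t.ion_label for t in top])
```

Keeping all measured transitions and selecting the five most intense on demand mirrors the refinement step, in which relative intensities are re-measured in SRM mode before the final transitions are fixed.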

Fig. 1

Workflow outlining SRM assay generation, refinement, and application to detect target proteins in plasma and urine. (A) In the first step, a crude synthetic peptide library was used to generate QQQ full-fragment ion spectra for the extraction of the preliminary coordinates for SRM assays. (B) In the second step, SRM assays were refined by measuring the crude synthetic peptides in SRM mode using the coordinates established in full-scan mode. This step refined the relative transition intensities specific for the SRM acquisition mode and the iRT in the chromatographic gradient to be used for endogenous peptide detection. (C) The final SRM assay library was then used to detect the CAPs in complex samples (depleted plasma and urine). Decoy transition groups and positive controls were included in the SRM measurements to allow for objective data analysis using the mProphet software tool (26).

Fig. 2

Number of peptides per protein in the SRM assay library for CAPs.

Detection of CAPs in depleted plasma and urine using SRM

To generate a resource for accelerating the verification of CAPs as potential biomarkers, we determined the detectability of the target proteins in widely used clinical samples using the SRM coordinates determined above. We applied the generated SRM assays to human urine and depleted plasma using label-free SRM (without addition of internal standards) (Fig. 1C and Supplementary Result 1). Such measurements in complex samples record numerous interfering transition signals, and it is challenging to distinguish between true and false assignments by manual inspection of the data (45, 46). To minimize the number of false-positive peptide identifications, we evaluated the resulting data using the mProphet software tool, which was modified compared to the original publication to account for the iRT deviation as an additional score in the combined scoring function (26, 44).
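The role of decoy transition groups in controlling false positives can be illustrated with a generic target-decoy estimate. The Python sketch below is a deliberate simplification and not the mProphet algorithm, which combines several subscores into a learned discriminant score; here a single score per transition group is assumed, and all scores are invented for illustration.

```python
def decoy_fdr(target_scores, decoy_scores, cutoff):
    """Estimate the FDR among targets scoring at or above the cutoff.

    Decoys passing the cutoff approximate the number of false targets;
    the count is rescaled if the two lists differ in size.
    """
    if not decoy_scores:
        raise ValueError("at least one decoy score is required")
    targets_passing = sum(s >= cutoff for s in target_scores)
    decoys_passing = sum(s >= cutoff for s in decoy_scores)
    if targets_passing == 0:
        return 0.0
    scale = len(target_scores) / len(decoy_scores)
    return min(1.0, decoys_passing * scale / targets_passing)


# Invented scores for illustration only.
target_scores = [5.1, 4.2, 3.9, 3.0, 2.5, 1.0, 0.4, 0.2]
decoy_scores = [2.6, 0.9, 0.5, 0.3, 0.1, -0.2, -0.5, -1.0]

fdr = decoy_fdr(target_scores, decoy_scores, cutoff=2.0)
print(f"estimated FDR at cutoff 2.0: {fdr:.2f}")
```

Raising the cutoff lowers the estimated FDR at the expense of sensitivity, which is the trade-off behind reporting an FDR together with a sensitivity value.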

Plasma results

In the depleted plasma sample, 302 peptides corresponding to 182 proteins were detected with a false discovery rate (FDR) of 2% and a sensitivity of 70% calculated by mProphet at the level of the SRM assay (fig. S2, A and B, and table S4). We investigated the concentration range of the detected proteins using estimated concentrations based on spectral counts derived from Human Plasma PA (41) (table S1). The detected proteins span five orders of magnitude in plasma, reaching an LOD below 10 ng/ml (Fig. 3A). The distribution of detected CAPs over the concentration range confirms the limited detectability by MS for low-abundance proteins in plasma. However, using SRM, we made 83 previously unreported protein observations compared to the 187 CAPs previously observed in the Human Plasma PA by data-dependent analysis in crude or depleted plasma (Fig. 3B). In contrast, 88 of 187 CAPs listed in the Human Plasma PA were not detected by SRM, of which 19 have an estimated concentration above 100 ng/ml. To study the effect of depletion on the detectability of CAPs in plasma, we applied the SRM assays to a crude plasma digest. Only 73 proteins were detected in the crude digest, and as expected, these proteins were mostly high-abundance plasma proteins (fig. S3 and table S5). The abundance of the 73 proteins ranged from 1.6 mg/ml to 35 ng/ml (fig. S3). These results demonstrate the increased sensitivity of detecting low-abundance proteins in plasma using a simple sample preparation such as depletion of the high-abundance proteins. However, for detecting a higher number of proteins in the low nanogram per milliliter concentration range, additional sample preparation steps are needed to decrease sample complexity. The SRM assays developed in this study are equally applicable to fractionated or enriched plasma samples.

Fig. 3

Detectability results for depleted plasma and urine. (A) The plotted concentration range shows detected CAPs (blue) and CAPs that could not be detected (gray) in depleted plasma. Estimated protein concentrations for the CAPs in plasma were extracted from Human Plasma PA (41). (B) Proteins detected by SRM were compared to proteins previously observed by large-scale proteomic experiments derived from Human Plasma PA (including measurements in unfractionated, crude, and depleted plasma). (C) The plotted concentration range shows detected CAPs (blue) and CAPs that could not be detected (gray) in urine. Estimated protein concentrations for the CAPs in urine were extracted from Urine PA (41). (D) Proteins detected by SRM were compared to proteins previously observed by large-scale proteomic experiments derived from Urine PA combined with protein observations from Adachi et al. (49).

To further investigate the functional characteristics of the detectable CAPs in depleted plasma compared to all targeted CAPs, we conducted an enrichment analysis using the functional annotation tool Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/) (47, 48). The CAPs detectable in depleted plasma are enriched for extracellular region proteins (117 proteins found; P = 4.7 × 10^−11). They are mainly involved in acute inflammatory response (29; P = 3.1 × 10^−10), complement activation (16; P = 7.1 × 10^−8), and response to wounding (59; P = 7.5 × 10^−7). Most of the detected CAPs are annotated in UniProt (http://www.uniprot.org/) as either plasma proteins (69; P = 8.7 × 10^−34) or proteins highly expressed in the liver (86; P = 9.1 × 10^−13) and are thus among high-abundance proteins in plasma.

Urine results

In urine, we detected 661 peptides corresponding to 408 proteins with an FDR of 3% and a sensitivity of 70% (fig. S2, C and D, and table S6). Different FDR cutoffs were chosen to report detectable proteins for depleted plasma and urine on the basis of a consistent sensitivity for both sample types. We also investigated the detected concentration range in urine by extracting estimated concentrations from the Urine PA (table S1). The narrower concentration range of proteins in urine allowed us to detect a larger number of CAPs in the low nanogram per milliliter range (Fig. 3C). For many of the CAPs detected in urine, no estimated concentrations are available (fig. S3). Nevertheless, we expect that their concentration is in the subnanogram per milliliter range, which would translate into a dynamic concentration range of detected proteins similar to that of plasma. It is also expected that the distribution of detected proteins thins out toward lower picogram per milliliter concentrations, reflecting the trend observed in plasma. Compared to proteomic data sets contained in Human Urine PA and the data set of Adachi et al. (49), we detected 169 previously undetected proteins using our SRM assays (Fig. 3D). Similar to depleted plasma, the detectable CAPs in urine are enriched for extracellular region proteins (189 proteins found; P = 5.4 × 10^−5). However, in comparison to depleted plasma, a larger number of plasma membrane (137 versus 65 proteins) and cytosolic proteins (67 versus 16) could be detected in urine. These proteins could be derived from cells that are shed into urine. Additionally, it has been suggested that plasma membrane proteins are excreted in urine through exosome formation (49). Furthermore, the narrower concentration range of proteins in urine allows the detection of proteins that are usually masked in plasma by the highly abundant classical plasma proteins.

Characteristics of CAPs detected in body fluids

The final list of 473 CAPs detected in urine and depleted plasma demonstrates the power of SRM in targeting proteins over a large dynamic range in minimally processed complex body fluids. Next, for all detectable peptides, we determined the theoretical specificity of the SRM assays. This analysis was based on the uniqueness of the transition mass/charge ratios (m/z) that define the assay in a background of all human peptides identified in the Human PA, assuming that this represents the MS-detectable fraction of the human proteome. We used SRMCollider (50), a tool based on the unique ion signature approach described by Sherman et al. (51). The analysis revealed high theoretical assay specificity for the detected peptides in urine and plasma in the background of the Human PA. Ninety-four percent of the peptides were monitored by a unique combination of transitions (Supplementary Result 2 and fig. S4).
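The idea of a unique combination of transitions can be sketched in Python as follows. This is a simplified illustration of the concept rather than the SRMCollider implementation: a target is treated as uniquely monitored if no single background peptide within the precursor window interferes with all of its transitions. The tolerance values, the dictionary layout, and all m/z values are assumptions, and the sketch ignores retention time and any other constraints a real tool may apply.

```python
def shared_transitions(target, other, q1_tol=0.35, q3_tol=0.35):
    """Return the target fragments that `other` could interfere with."""
    if abs(target["precursor_mz"] - other["precursor_mz"]) > q1_tol:
        return set()  # precursor falls outside the isolation window
    return {
        frag for frag in target["fragments"]
        if any(abs(frag - o) <= q3_tol for o in other["fragments"])
    }


def is_unique_combination(target, background, **tolerances):
    """True if no background peptide interferes with every target transition."""
    wanted = set(target["fragments"])
    return all(
        shared_transitions(target, other, **tolerances) != wanted
        for other in background
    )


# Invented m/z values for illustration only.
target = {"precursor_mz": 500.30, "fragments": [300.1, 450.2, 600.3]}
partial = {"precursor_mz": 500.40, "fragments": [300.2, 700.0]}   # shares one
distant = {"precursor_mz": 620.00, "fragments": [300.1, 450.2, 600.3]}
collider = {"precursor_mz": 500.20, "fragments": [300.0, 450.3, 600.5]}

print(is_unique_combination(target, [partial, distant]))
print(is_unique_combination(target, [partial, distant, collider]))
```

Individual transitions may be shared with background peptides without compromising the assay, as long as the full set of transitions is not reproduced by any one of them.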

Furthermore, we investigated how many of the detectable CAPs in urine and plasma are functionally related to CDMs. Of the 43 CDMs that are also reported as CAPs, 19 can be directly monitored in at least one of the body fluids (Table 1). Two hundred thirty-two detectable CAPs represent functional interaction partners of the CDMs (Table 1). Assuming that the status of the candidate cancer driver can be deduced either by the direct measurement of the protein encoded in the CDM or by a functionally related protein, the detectable CAPs allow monitoring the status of 143 CDMs in either urine or plasma (Table 1). For all the source cancer types that we used to derive the CDMs, we generated individual subnetworks consisting of the CDMs identified in the respective cancer types and the functionally interacting CAPs (figs. S5 to S11), thereby demonstrating that they are highly interconnected. Figure 4 shows the subnetwork for pancreatic cancer, highlighting the detectable CAPs in urine and plasma. Thirty of the 39 CDMs (77%) that were discovered for pancreatic cancer by unbiased whole-exome sequencing are connected to detectable CAPs. Although well-studied CDMs such as TP53, KRAS, and SMAD4 have many connected CAPs that are also detectable in one of the body fluids, other CDMs are poorly connected in the subnetwork or not even part of the subnetwork. We propose that the subnetwork of these CDMs should be studied as new potential cancer protein biomarkers.

Fig. 4

Functional protein interaction network for pancreatic cancer. The diagram depicts functional interactions of the identified candidate CDMs for pancreatic cancer and the detectable CAPs. Nodes represent CAPs (circles), CDMs (squares), and CDMs that are also reported as cancer-associated at the protein level (triangles). Colors denote the detectability of the proteins in plasma or urine: blue, detectable; pink, not detectable; gray, not targeted in plasma or urine. Functional interactions between the proteins are marked as edges. The figure was generated using Cytoscape (68).

Monitoring biomarkers by SRM in a large patient cohort

Finally, we aimed to demonstrate the capacity of the SRM assays developed in this study to monitor known biomarkers in a larger cohort of patient specimens. In our initial list of CAPs, we included proteins for which FDA-approved assays exist, for example, those in the OVA1 biomarker panel. This panel assesses the ovarian cancer (OC) risk in women diagnosed with an ovarian tumor before planned surgery (52, 53). The OVA1 panel analyzes five protein biomarkers—cancer antigen 125 (CA125), β2-microglobulin (B2MG), apolipoprotein A1 (APOA1), transthyretin (TTHY), and transferrin (TRFE)—using antibody-based assays and combines the results of each test to classify patients into high or low risk for ovarian malignancy. In the detectability test, we demonstrated that B2MG, APOA1, TTHY, and TRFE are accessible by SRM in an unfractionated tryptic digest of the plasma proteome. We chose a cohort of plasma samples derived from OC patients (n = 67) and patients with benign ovarian tumors (BOTs) (n = 16) to confirm that the SRM assays quantify the proteins reproducibly across the patient plasma samples and detect abundance differences between the two patient groups. Plasma samples derived from healthy individuals were not included because the OVA1 panel specifically detects OC in women already diagnosed with a pelvic mass. To explore the capacity of SRM to multiplex protein measurements, we added 30 target proteins to the OVA1 panel. These proteins were selected either because they were proposed as biomarker candidates for OC before or because we predicted them to be functionally related to mutated or epigenetically silenced genes in OC (54) (Table 2). In total, we monitored 34 proteins (62 peptides) across plasma derived from 67 OC patients and 16 patients with BOTs. mProphet (26) identified the peptides and proteins that were confidently detected across the samples, and SRMstats (27) was used for protein significance analysis.

Table 2

Quantification of selected proteins measured by SRM in plasma of OC patients and patients with BOTs. Statistical analysis was performed using a linear mixed-effects model implemented in SRMstats (27). The proteins were selected because they either are part of the OVA1 biomarker panel (OVA1), have been proposed as biomarker candidates for OC (literature), or functionally interact with epigenetically silenced or mutated genes in OC (network).

Of the consistently quantified proteins, 19 showed a significant fold change comparing plasma samples from OC patients and patients with BOTs (Fig. 5 and Table 2). The protein significance analysis confirmed a significant abundance difference for the proteins of the FDA-approved OVA1 panel—APOA1, TRFE, B2MG, and TTHY—for the two patient groups. Furthermore, the direction of abundance change for these proteins was consistent with the results described previously that were generated by immunoassays (55, 56). Of the other 15 proteins with significant abundance change, 3 have been previously suggested as biomarker candidates for OC in the literature, 9 proteins were derived from the network analysis of mutated and epigenetically silenced genes in OC, and 3 proteins were selected on the basis of both literature and network analysis (Table 2). These results demonstrate that SRM allows the reproducible and accurate quantification of proteins across larger patient cohorts and confirm the capacity of SRM to complement and extend antibody-based assays for expedient verification of biomarker candidates. Furthermore, the significant abundance difference of the proteins predicted by network analysis suggests that this approach could be explored for biomarker discovery.

Fig. 5

Quantification of selected proteins in plasma of OC patients and patients with BOTs. All proteins with a P value below 0.01 and a fold change (FC) below 0.9 or above 1.1 were considered significant.
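The significance criteria stated in the caption can be written as a small Python filter. The thresholds are taken from the caption; the function name and the example P values and fold changes are hypothetical and are not results from Table 2.

```python
def is_significant(p_value, fold_change, p_max=0.01, fc_low=0.9, fc_high=1.1):
    """Apply the caption's rule: P below 0.01 and FC below 0.9 or above 1.1."""
    return p_value < p_max and (fold_change < fc_low or fold_change > fc_high)


# Hypothetical (P value, fold change) pairs for illustration only.
examples = {
    "strong_decrease": (0.0005, 0.70),
    "strong_increase": (0.0020, 1.45),
    "small_effect": (0.0010, 1.05),      # significant P but FC within 0.9-1.1
    "weak_evidence": (0.0400, 1.60),     # large FC but P above the cutoff
}
calls = {name: is_significant(p, fc) for name, (p, fc) in examples.items()}
print(calls)
```

Both conditions must hold, so a protein with a very small P value is still not reported as significant if its fold change lies between 0.9 and 1.1.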

Discussion

Over the last several years, long lists of proteins have been proposed as potential biomarkers for various cancer types without further evaluation of their clinical use. The lack of follow-up is due, to a large extent, to the lack of a technology for the expedient, reproducible, and accurate verification of the proteins as biomarkers. Recent developments in SRM-based targeted proteomics show promise for accelerating the hypothesis testing of multiple biomarker candidates in large cohorts of patient specimens. The aim of this study was the generation of a resource of high-quality SRM assays for the detection and quantification of cancer-associated proteins to assist and accelerate the verification of cancer biomarker candidates in clinical specimens.

We developed definitive SRM assays for 1157 proteins, which have been previously reported to change abundance in various human cancers and which were found to be functionally linked with genetic mutations driving cancer development. Of the 1157 CAPs for which we generated assays, we detected 182 proteins in depleted plasma and 408 in urine using a label-free SRM strategy. The data sets have been submitted to the PeptideAtlas SRM Experiment Library (PASSEL, http://www.peptideatlas.org/passel/) (57), which allows researchers to extract the SRM assay coordinates and to check detectability information for proteins of interest (http://www.peptideatlas.org/PASS/PASS00004 for depleted plasma, http://www.peptideatlas.org/PASS/PASS00006 for crude plasma, http://www.peptideatlas.org/PASS/PASS00007 for urine, and http://www.peptideatlas.org/PASS/PASS00041 for the quantification of proteins in plasma of OC patients and patients with BOTs). We demonstrated the use of this library to accurately and reproducibly quantify 34 biomarker candidates across a larger cohort of patient plasma samples. The quantified proteins included APOA1, TRFE, B2MG, and TTHY, which are all part of the FDA-approved OVA1 biomarker panel and for which expected abundance changes were confirmed comparing OC patients and patients with BOTs. Furthermore, a subset of the proteins tested that were not included in the OVA1 panel also achieved highly significant separation of OC and BOT plasma samples, thus raising the possibility that the performance of the OVA1 test could be further improved by the inclusion of additional proteins.

The SRM assays for the CAPs have been generated in a multistep process, in which the SRM coordinates were first extracted from fragment spectra of synthetic peptides and subsequently refined by measuring the synthetic peptides in SRM acquisition mode. Although sample-specific interferences may compromise some transitions, we demonstrated the quality of the refined assay library by applying it to two of the most complex proteomes commonly analyzed. Moreover, we calculated a high theoretical specificity for the detected targets in plasma and urine by simulating the instances of interfering transitions assuming a complex proteomic background. This shows that the SRM assays enable researchers to directly target and consistently detect CAPs in any sample type and to thus test their potential as biomarkers. However, the resource does not provide LODs and LOQs (limits of quantification) for the SRM assays. These properties cannot be defined generally because they are dependent on the sample preparation and the instrument platform. They should be determined locally and preferentially using isotope-labeled internal standards before the verification of the proteins in a large cohort of clinical samples.

The initial list of CAPs was assembled from studies of different sample types and various technologies applied at the protein and nucleic acid level (28). Only a subset of the proteins has been observed by MS, and an even smaller subset has been detected previously in plasma and urine, usually with low reproducibility in extensively fractionated samples. Therefore, it was anticipated that many of the proteins would not be detectable in plasma or urine by SRM. Nevertheless, in comparison to data extracted from the large-scale shotgun MS data sets in Human Plasma PA, Urine PA, and the urine study of Adachi et al. (49), we obtained a high number of novel observations using our targeted proteomics approach, 83 and 169 CAPs for plasma and urine, respectively. However, 88 and 103 CAPs previously detected by shotgun MS in plasma and urine, respectively, were not detected with our method, likely because of alternative sample preparation strategies, different types of MS instruments, or lower abundance in the samples used in this study. The proteome complexity of plasma and urine is the major limitation for the detectability of target proteins. We demonstrate that the depletion of the 14 highest abundance plasma proteins increases the number of detected CAPs in plasma from 73 to 182, especially increasing the detectability of proteins in the nanogram per milliliter concentration range. The results obtained are similar to previously reported studies combining depletion of high-abundance plasma proteins and shotgun proteomics (58, 59). Although some high-abundance proteins have proven to be clinically useful, such as those in the OVA1 panel, cancer biomarker studies should still reach the low nanogram per milliliter concentration range in plasma routinely (4, 11) because tissue-derived proteins are expected and many current clinically used biomarkers are located in this concentration range (2). The combination of SRM and depletion of the highest abundance proteins in plasma achieves the required sensitivity only for a subset of the CAPs.

It has been previously shown that the sensitivity for detecting low-abundance proteins in body fluids can be further improved using other sample preparation regimens that reduce complexity. These include the selective isolation of N-glycosylated peptides (22, 60), fractionation by strong cation exchange chromatography (61), peptide enrichment using Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA) (10, 62), and the enrichment of low–molecular weight and low-abundance proteins using nanoparticles (63, 64). The main disadvantage of fractionation is the reduction in throughput because the number of samples is multiplied by the number of fractions made. The enrichment of low–molecular weight proteins has a high potential to detect peptidomic-sized fragments shed or secreted from tissue, but it also neglects a large part of the proteome. Additional costs for achieving higher sensitivity with enrichment strategies include increasing technical variability with additional handling steps, a dependency on suitable affinity reagents, a focus on only a subset of proteins, and limited multiplexing capabilities. Therefore, in this study, we used the depletion of highly abundant plasma proteins, a simple sample preparation for detectability testing that does not adversely affect throughput and is commonly used for plasma proteomics. However, the disadvantage of the depletion strategy is that it also removes additional proteins that are noncovalently associated with the depleted peptides or proteins (such as albumin), which can potentially disturb the observed protein patterns of the target proteins (65). Furthermore, plasma depletion reaches the required sensitivity for biomarker studies only for a subset of proteins.
However, the SRM assays of this resource are not limited to one sample preparation strategy but can be combined with all the above-mentioned methods, except N-glycopeptide enrichment, to lower sample complexity and gain sensitivity in detecting low-abundance proteins. Additionally, the resource can be easily expanded to include detectability information of CAPs for the other sample preparations.

We demonstrated that the CAPs, even though they were compiled from various sources, are highly enriched in the subnetworks of CDMs discovered by unbiased whole-exome sequencing. This not only shows the potential clinical importance of the compiled CAPs but also suggests a systematic approach for discovering new biomarker candidates. Current efforts in unbiased whole-exome or full genomic sequencing give more insights into the molecular development of different cancer types. Additionally, a wealth of data is publicly available from transcriptomic and proteomic screens. Integrating the available large-scale data sets with rapidly growing protein-protein or functional protein interaction networks for humans can potentially lead to the identification of pathways and subnetworks that are perturbed in specific cancers. Proteins that are part of the identified pathways or subnetworks can then be tested as potential biomarkers. Such network-based biomarker candidate prediction is supported by the results that we obtained in this study. We predicted biomarker candidates by combining genes mutated in OC with a functional interaction network and, using our SRM assay library, confirmed significant abundance changes for 12 of the proteins when comparing plasma samples derived from OC patients and patients with BOTs. Such computational network approaches reduce the time and resources spent on generating new lists of potential biomarkers so that more effort can be invested in the verification of their clinical use.

The experimental procedures used in previous studies to generate lists of biomarker candidates were time- and resource-consuming, and most of the studies concluded after the discovery of CAPs without verifying their clinical use. Recent developments in SRM showed that it is a valuable alternative technology for protein quantification compared to the “gold standard” ELISA. High-throughput, cost-efficient, and fast SRM assay development as well as multiplexed analyte measurements allow for faster verification of the CAPs in complex samples. To translate the research findings into clinical practice, research efforts need to concentrate on hypothesis testing in large clinical cohorts because it is anticipated that only a few of the proposed markers will have an impact in the current clinical practice. So far, it has been shown that most biomedical research has focused on only a few well-studied proteins precisely because the costly analytical tools for their study are already available (66). In contrast, the resource described here has the capacity to facilitate exploratory biomedical research and to drive it toward unstudied parts of the human proteome. For example, on top of the assays for more than 1000 proteins contained in this library, we estimate that the expansion of our SRM resource by an additional 100 proteins could be accomplished in less than 1 month. Therefore, we envision this SRM resource as a tool well matched with computational candidate prediction.

The SRM reference map generated here provides high-quality SRM assays, enabling direct measurement of the proteins associated with cancer and functionally related to genes frequently mutated in cancer in any sample type. The detectability information indicates where proteins are accessible and thus guides the selection of targets and experimental procedures involved in biomarker verification. Using the outlined workflow, the resource of assays can be rapidly expanded to include more proteins and sample types. A strategy based on this assay library in combination with the software tools mProphet and SRMstats for the analysis of large-scale SRM data sets provides the foundation for clinical SRM-based cancer biomarker verification studies. Once candidate cancer biomarker verification is accelerated, validation of clinical use should follow.

Materials and Methods

Protein selection

One thousand one hundred seventy-two protein targets were selected on the basis of two previously published lists of proteins: one enumerating those associated with cancer (28) and the other containing protein analytes that have FDA-approved clinical assays (8). Proteins were identified by their unique UniProt accession number (table S1). If a protein did not have a UniProt accession number given in the source list, the protein’s name was searched in the UniProt database (http://www.uniprot.org/) or the protein’s gene symbol was searched in the Gene/Protein Synonyms finder (http://expasy.org/cgi-bin/gpsdb/form) to assign a UniProt identifier.

Network analysis

Functional relationships between the CAPs and the CDMs were investigated with the RFIN (16, 67). CDMs were obtained from whole-exome sequencing and resequencing studies of cancer genomes for seven human tissue types (30–35). CDMs were projected on the functional protein interaction network, and their interaction partners were explored for the presence of CAPs. One hundred random protein networks of the same size and degree distribution as RFIN were generated with the “switching algorithm” (37) implemented in the Random Network Plugin for Cytoscape, which enabled assessing the statistical significance of enrichment of CAPs among interaction partners of CDMs. The interactions were obtained with the Reactome Functional Interaction Cytoscape plug-in (67), and the graphs were visualized in Cytoscape (68).
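The randomization step can be illustrated with a minimal Python sketch of degree-preserving edge switching. This is not the code of the Random Network Plugin; the function names, the number of swaps, and the +1 correction in the empirical P value are assumptions made for illustration only.

```python
import random

def switch_randomize(edges, n_swaps, seed=0):
    """Degree-preserving randomization of an undirected graph by edge switching.

    Two edges (a, b) and (c, d) are rewired to (a, d) and (c, b). Swaps that
    would create a self-loop or a duplicate edge are rejected, so every node
    keeps its original degree.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = attempts = 0
    # Cap the attempts so the loop ends even if few swaps are possible.
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared node: the swap would create a self-loop or no change
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # the swap would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def empirical_p(observed, null_values):
    """One-sided empirical P value (assumed +1 correction)."""
    hits = sum(v >= observed for v in null_values)
    return (hits + 1) / (len(null_values) + 1)
```

In such a scheme, the number of CAPs among CDM interaction partners would be counted in each randomized network and passed as `null_values`, with the count in the real network as `observed`.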

Peptide selection

For each protein, a set of PTPs was selected on the basis of the following criteria: Only fully tryptic peptides, with no missed cleavages, unique to a particular protein, and with a length between 6 and 20 amino acids were considered. For proteins listed in large-scale proteomic repositories, like the Human PA (http://www.peptideatlas.org/), the Human Plasma PA (40, 41), and a human cell line MS deep sequencing data set (42), the five PTPs most frequently observed and fulfilling the selection criteria were chosen. For proteins that were observed in the proteomic repositories by fewer than five PTPs or that were not previously observed, additional PTPs with good MS properties were selected by bioinformatic prediction. The main criterion for prediction was peptide hydrophobicity estimated with the SSRCalc algorithm (69, 70). Only peptides with an SSRCalc value between 10 and 40 were considered. If fewer than five uniquely mapping peptides could be selected for a protein, peptides mapping to a maximum of three proteins in the UniProt database were also considered.
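As an illustration of the sequence-based criteria above, the following Python sketch performs an in silico tryptic digest and keeps peptides that are unique to one protein and 6 to 20 residues long. The function names and the rule of not cleaving before proline are assumptions, not taken from the study. The ranking by observation frequency and the SSRCalc hydrophobicity window are not reproduced.

```python
import re

def tryptic_peptides(sequence):
    """Fully tryptic peptides with no missed cleavages.

    Cleaves C-terminal to K or R, except before P (assumed convention).
    Requires Python 3.7+, where re.split accepts zero-width matches.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def select_candidates(proteins, min_len=6, max_len=20):
    """Return {protein_id: [peptides]} with peptides unique to that protein.

    `proteins` maps a protein identifier to its amino acid sequence.
    """
    owners = {}
    for pid, seq in proteins.items():
        for pep in set(tryptic_peptides(seq)):
            owners.setdefault(pep, set()).add(pid)
    selected = {pid: [] for pid in proteins}
    for pep, pids in owners.items():
        if len(pids) == 1 and min_len <= len(pep) <= max_len:
            selected[next(iter(pids))].append(pep)
    return {pid: sorted(peps) for pid, peps in selected.items()}
```

A peptide shared by two proteins is dropped for both. A relaxed variant that accepts peptides mapping to up to three proteins would change the test `len(pids) == 1` to `len(pids) <= 3`.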

Crude peptide library generation

Selected peptides were synthesized by the SPOT-synthesis technology (71, 72) (JPT Peptide Technologies), recovered from the solid support and used in an unpurified form. Synthesized peptides were lyophilized in 96-well plates with about 50 nmol of unpurified peptide material per well. Aliquots of the peptides contained in each well were combined to generate mixes of about 100 peptides. Before liquid chromatography–tandem MS (LC-MS/MS) analysis, the peptide mixes were desalted and concentrated with Vydac C18 Silica MicroSpin columns (The Nest Group Inc.). A set of eight synthetic peptides (AAVYHHFISDGVR, HIQNIDIQHLAGK, GGQEHFAHLLILR, TEVSSNHVLIYLDK, TEHPFTVEEFVLPK, NQGNTWLTAFVLK, LVAYYTLIGASGQR, and TTNIQGINLLFSSR) with elution times spanning the solvent gradient was spiked into each mixture to enable the correlation of relative retention times between LC-MS/MS runs.

SRM assay library generation

The fragment ion spectral library was assembled with a hybrid triple quadrupole/ion trap mass spectrometer (5500QTRAP, AB Sciex) by triggering the acquisition of a full fragment ion spectrum upon threshold detection of an SRM trace corresponding to the first fragment ion of the y-series with an m/z above the precursor m/z + 20 thomson, for the doubly and triply charged peptide precursors. The instrument setup and parameters are described in the Supplementary Methods. The resulting MS/MS spectra were assigned to peptide sequences with Mascot (Matrix Science, version 2.3.0). The search results were validated with a cutoff for the Mascot ion score corresponding to an FDR <1%. All the peptide-spectrum matches taken together constituted the spectral library for target peptides. For SRM assay refinement, the crude peptide mixtures were analyzed on a TSQ Vantage triple quadrupole mass spectrometer (Thermo Fisher) in scheduled SRM acquisition mode. The QQQ spectral library was used to extract the optimal coordinates for the SRM assays, that is, the most intense fragments, relative intensities of fragments, and peptide elution times. Instrument-specific parameters and further method details can be found in the Supplementary Methods.
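The choice of the triggering transition can be sketched as follows. The code computes singly charged y-ion m/z values from standard monoisotopic residue masses and returns the smallest y ion above the precursor m/z plus 20. It assumes unmodified residues (including cysteine) and singly charged fragments. The function names are hypothetical and do not describe the instrument software.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

def precursor_mz(peptide, charge):
    """m/z of the protonated peptide at the given charge state."""
    mass = sum(MONO[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge

def y_ions(peptide):
    """Singly charged y ions, y1 to y(n-1), as (name, m/z) pairs."""
    ions, mz = [], WATER + PROTON
    for i, aa in enumerate(reversed(peptide[1:]), start=1):
        mz += MONO[aa]
        ions.append((f"y{i}", mz))
    return ions

def trigger_fragment(peptide, charge, offset=20.0):
    """First y ion with m/z above precursor m/z + offset, or None."""
    limit = precursor_mz(peptide, charge) + offset
    for name, mz in y_ions(peptide):
        if mz > limit:
            return name, mz
    return None
```

For a doubly charged precursor, `trigger_fragment("AAVYHHFISDGVR", 2)` returns the first y ion above the precursor m/z, which keeps the triggering transition in a mass range less affected by low-mass background.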

Plasma handling

Collection, handling, and shipping of the plasma sample for the detectability test were performed by Sera Laboratories International Ltd. Blood was collected from two healthy individuals, one male and one female, with EDTA as an anticoagulant. Plasma was obtained from each sample of blood by centrifuging at 2000g for 10 min at room temperature. After pooling of the two samples, the resulting plasma was filtered through a 0.2-μm filter, aliquoted, and frozen at −80°C for shipping. Upon thawing, Complete Protease Inhibitor Cocktail (Roche) was added.

For the collection of the patient plasma samples, all patients signed an informed consent document. Blood was drawn before surgery and collected into tubes processed with EDTA to prevent coagulation. Within 30 min, blood was centrifuged at 2000g for 10 min to separate the red blood cells, buffy coat, and plasma. The plasma was removed, aliquoted in 300-μl amounts, and stored at −80°C. The blood sample handling, from drawing to storage, was done within 2 hours.

Plasma protein depletion

Plasma was depleted of the 14 most abundant plasma proteins with the multiple affinity removal system (MARS Hu-14 spin cartridge; Agilent Technologies) according to the manufacturer’s protocol. Depleted samples were buffer-exchanged with Vivaspin 500 concentrators with a 5000–molecular weight cutoff (Sartorius Stedim Biotech) and denatured in 6 M urea and 0.1 M ammonium bicarbonate before digestion with trypsin and LC-MS analysis.

Urine handling

Second-morning urine samples were collected from four healthy individuals, two males and two females, in 50-ml conical tubes (Greiner) and spiked with Complete Protease Inhibitor Cocktail (Roche). Urine was centrifuged at 2000g for 10 min at room temperature. The supernatants were transferred to a fresh tube, and the urinary protein concentration was estimated by pyrogallol assay (Sigma). A single pooled urine sample with a final concentration of 120 μg/ml was prepared from the four healthy individuals. Protein precipitation was achieved by adding trichloroacetic acid (Sigma-Aldrich) to 10 ml of urine to a final concentration of 6%. The sample was mixed and incubated at 4°C for 2 hours followed by centrifugation at 14,000g for 15 min. The supernatant was removed, and the pellet was washed twice with 100% ice-cold acetone (Sigma-Aldrich) to remove all interfering compounds. The supernatant was removed, and the pellet was air-dried and resuspended in 300 μl of denaturing buffer containing 8 M urea (Sigma-Aldrich) and 0.1 M ammonium bicarbonate (Sigma-Aldrich).

Plasma and urine protein digestion

After reduction and alkylation with dithiothreitol (Sigma-Aldrich) and iodoacetamide (Sigma-Aldrich), the proteins were digested with sequencing-grade porcine trypsin (Promega) at a protease/protein ratio of 1:50 for plasma and 1:100 for urine. Digests were desalted and concentrated with Vydac C18 Silica MicroSpin columns (The Nest Group Inc.) and Sep-Pak C18 cartridges (Waters), for plasma and urine, respectively, before LC-MS analysis. The crude plasma digest was prepared in the same way as depleted samples. An aliquot of retention time calibration peptides from RT-Kit-WR (Biognosys) was spiked into each sample to allow for the correlation of relative retention times between LC-MS runs. The extracted elution times of the RT peptides were used to calculate an iRT value relative to the RT peptides for each SRM assay according to the vendors’ instructions (44).
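One common way to express elution times relative to calibration peptides is a linear fit between the measured retention times of the RT peptides and their reference values on a dimensionless scale. The sketch below assumes such a linear mapping. The vendor procedure may differ, and the reference values in the usage example are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def to_irt(retention_time, slope, intercept):
    """Map a measured retention time onto the calibrated iRT scale."""
    return slope * retention_time + intercept

# Usage with hypothetical calibrants: measured RT (min) and reference iRT.
slope, intercept = fit_line([10.0, 20.0, 30.0], [0.0, 50.0, 100.0])
target_irt = to_irt(25.0, slope, intercept)
```

Because the fit is recomputed for each run from the spiked RT peptides, the resulting iRT value of a target is comparable between LC-MS runs even when the gradient shifts.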

Target detection in plasma and urine

For target detection in depleted plasma, crude plasma, and urine, peptide preparations were analyzed on a TSQ Vantage with the instrument setup and parameters as described in the Supplementary Methods. The refined SRM assays from the library, constituted by the relative intensities of the five most intense fragments and the peptide elution time of each target, were used to detect endogenous peptides. Additionally, five assays were selected as positive controls for both plasma and urine, to be monitored in each MS run, and decoy transition groups were equally distributed over all runs for the subsequent estimation of the FDR (26). Decoy transition groups were generated by subtracting or adding a random integer to Q1 and Q3 m/z values as described by Reiter et al. (26). In each run, about 400 target transitions were monitored, resulting in a total number of 60 MS runs per sample to test the detectability of all refined SRM assays. SRM acquisition was performed with Q1 and Q3 operated at a resolution of 0.7 m/z half-maximum peak width, with a retention time window of 240 s and a cycle time of 2.0 s. To better cover the isotopic envelope, we set the Q1 values for triply charged precursors to average molecular masses. Resulting SRM data were analyzed with mProphet (26). The following subscores of each assay were considered for the calculation of the discriminant score for the detected peak groups: shape score, delta iRT, intensity-correlation-with-assay-score, transition-coelution-score, and total-intensity-score. The top-ranked peak group for each target and decoy transition group was used for the FDR estimation as described by Reiter et al. (26).
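The decoy generation described above can be sketched in Python. The range of the random integer, the choice of one shared Q1 shift per transition group, and the function name are assumptions. The exact procedure is the one given by Reiter et al. (26).

```python
import random

def make_decoy(transitions, max_shift=10, seed=None):
    """Build a decoy transition group from (Q1, Q3) m/z pairs.

    A random nonzero integer is added to or subtracted from each value.
    All transitions of one group share a precursor, so one Q1 shift is
    drawn per group, whereas each Q3 value receives its own shift.
    """
    rng = random.Random(seed)

    def shift():
        return rng.choice((-1, 1)) * rng.randint(1, max_shift)

    q1_shift = shift()
    return [(q1 + q1_shift, q3 + shift()) for q1, q3 in transitions]
```

Decoy groups produced this way no longer correspond to the target peptide, so the scores they receive approximate the null distribution used for FDR estimation.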

Monitoring biomarkers in patient plasma

The plasma peptide preparations were analyzed on a 5500QTRAP (AB Sciex) with the instrument setup and parameters as described in the Supplementary Methods. For each target peptide, a heavy isotope–labeled internal standard (JPT Peptide Technologies and Thermo Fisher) was spiked in the plasma peptide mixture for accurate quantification. For each peptide, three transitions were monitored for the heavy and light version. Resulting SRM data were analyzed with mProphet (26). The following subscores of each assay were considered for the calculation of the discriminant score for the detected peak groups: shape score, intensity-correlation-with-assay-score, transition-coelution-score, total-intensity-score, light_heavy_correlation, var_light_heavy_shape_score, and light_heavy_coelution_score. The top-ranked peak group for each target and decoy transition group was used for the FDR estimation as described by Reiter et al. (26). Protein significance analysis was performed with SRMstats (27). In the first step, data preprocessing was performed by transforming all transition intensities into log2 values. Then, a constant normalization was conducted on the basis of reference transitions for all proteins, which equalized the median peak intensities of reference transitions from all proteins across all MS runs and adjusted the bias to both reference and endogenous signals. Protein-level quantification and testing for differential abundance in the different patient groups were performed with the linear mixed-effects model implemented in SRMstats. Each protein is tested for abundance differences between OC patients and patients with BOTs. The P values were adjusted to control the FDR at a cutoff of 0.05 (27). All proteins with a P value below 0.01 and a fold change larger than 1.1 were considered significant.
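The preprocessing steps can be illustrated with a short Python sketch: log2 transformation, constant normalization that equalizes the median of the reference (heavy) transitions across runs, and adjustment of P values. This is not the SRMstats implementation. The data layout and function names are hypothetical, and the Benjamini-Hochberg procedure is assumed because the text states only that P values were adjusted to control the FDR.

```python
import math
from statistics import median

def constant_normalize(runs):
    """Log2-transform intensities and equalize reference medians across runs.

    `runs` maps run_id -> {"reference": [...], "endogenous": [...]}.
    The per-run shift is derived from the reference transitions only and is
    applied to both reference and endogenous signals.
    """
    logged = {
        run: {kind: [math.log2(v) for v in values] for kind, values in data.items()}
        for run, data in runs.items()
    }
    run_medians = {run: median(data["reference"]) for run, data in logged.items()}
    target = median(run_medians.values())
    return {
        run: {kind: [v - run_medians[run] + target for v in values]
              for kind, values in data.items()}
        for run, data in logged.items()
    }

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

After normalization, every run has the same median log2 intensity for its reference transitions, so remaining differences in the endogenous signals reflect abundance rather than run-to-run bias.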

Supplementary Materials

www.sciencetranslationalmedicine.org/cgi/content/full/4/142/142ra94/DC1

Materials and Methods

Results

Fig. S1. Success rate of the assay generation and detectability depending on the peptide selection source.

Fig. S2. mProphet results for the detectability test for cancer-associated proteins in depleted plasma and urine.

Fig. S3. Distribution of estimated concentrations extracted from Human Plasma and Urine PA of detected proteins in crude plasma, depleted plasma, and urine.

Fig. S4. Transition specificity of detected peptides in the background of the Human PA.

Fig. S5. Functional protein interaction network for ovarian clear cell carcinoma.

Fig. S6. Functional protein interaction network for pancreatic cancer.

Fig. S7. Functional protein interaction network for pancreatic neuroendocrine cancer.

Fig. S8. Functional protein interaction network for childhood medulloblastoma.

Fig. S9. Functional protein interaction network for glioblastoma.

Fig. S10. Functional protein interaction network for breast cancer.

Fig. S11. Functional protein interaction network for colorectal cancer.

Fig. S12. mProphet results for the first round of the detectability test for cancer-associated proteins in depleted plasma and urine.

Fig. S13. Transition specificity of detected peptides in the background of the complete human proteome.

Table S1. Cancer-associated proteins (Excel file).

Table S2. Proteotypic peptides for SRM assay generation (Excel file).

Table S3. SRM assays for cancer-associated proteins (Excel file).

Table S4. Detected peptides and proteins of the cancer-associated proteins in depleted plasma (Excel file).

Table S5. Detected peptides and proteins of the cancer-associated proteins in plasma (Excel file).

Table S6. Detected peptides and proteins of the cancer-associated proteins in urine (Excel file).

References and Notes

  1. R. Milo, R. Kashtan, S. Itzkowitz, M. E. J. Newman, U. Alon, On the uniform generation of random graphs with prescribed degree sequences. arXiv:cond-mat/0312028v2 [cond-mat.stat-mech] (2004).
  2. Acknowledgments: We thank L. Reiter and A. Bensimon for helpful discussions, C.-Y. Chang for support with the SRMstats analysis, and R. Ossola for help with measurement. Funding: The project was supported in part by the Swiss NSF (grant 3100A0-107679) and by the European Research Council (grant ERC-2008-AdG 233226). M.S. is the recipient of a Natural Sciences and Engineering Research Council of Canada Postgraduate Scholarship M award. T.F. and E.W.D. are supported by the NIH–National Human Genome Research Institute (grant HG005805 to R.L.M.) and the Duchy of Luxembourg Systems Biology initiative (to R.L.M.). Author contributions: R.H., M.S., O.R., and R.A. designed the experiments; E.N.-M. provided the patient plasma samples; R.H., M.S., N.S., U.K., and C.C. performed the experiments; R.H., M.S., N.S., O.R., H.R., and A.S. analyzed the data; E.N.-M. provided the patient specimen; T.F., E.W.D., and R.L.M. developed the platform for the accessibility of the data; R.H., M.S., and R.A. wrote the paper. Competing interests: O.R. is an employee of Biognosys AG. This company funded parts of the work. The other authors declare that they have no competing interests. Data and materials availability: All data sets have been submitted to the PeptideAtlas SRM Experiment Library (PASSEL, http://www.peptideatlas.org/passel/) under the experiment names “Detectability information in depleted plasma: Human CAP depleted plasma; Detectability information in crude plasma: Human CAP crude plasma; Detectability information in urine: Human CAP urine; Protein quantification in patient plasma specimens: Human CAP ovarian cancer plasma.”