Research Article: Lung Disease

A Blood-Based Proteomic Classifier for the Molecular Characterization of Pulmonary Nodules

Science Translational Medicine  16 Oct 2013:
Vol. 5, Issue 207, pp. 207ra142
DOI: 10.1126/scitranslmed.3007013

Abstract

Each year, millions of pulmonary nodules are discovered by computed tomography and subsequently biopsied. Because most of these nodules are benign, many patients undergo unnecessary and costly invasive procedures. We present a 13-protein blood-based classifier that differentiates malignant and benign nodules with high confidence, thereby providing a diagnostic tool to avoid invasive biopsy on benign nodules. Using a systems biology strategy, we identified 371 protein candidates and developed a multiple reaction monitoring (MRM) assay for each. The MRM assays were applied in a three-site discovery study (n = 143) on plasma samples from patients with benign and stage IA lung cancer matched for nodule size, age, gender, and clinical site, producing a 13-protein classifier. The classifier was validated on an independent set of plasma samples (n = 104), exhibiting a negative predictive value (NPV) of 90%. Validation performance on samples from a nondiscovery clinical site showed an NPV of 94%, indicating the general effectiveness of the classifier. A pathway analysis demonstrated that the classifier proteins are likely modulated by a few transcription regulators (NF2L2, AHR, MYC, and FOS) that are associated with lung cancer, lung inflammation, and oxidative stress networks. The classifier score was independent of patient nodule size, smoking history, and age, which are risk factors used for clinical management of pulmonary nodules. Thus, this molecular test provides a potential complementary tool to help physicians in lung cancer diagnosis.

INTRODUCTION

Computed tomography identifies millions of pulmonary nodules annually, many of which remain undiagnosed as either malignant or benign (1–3). In many cases, histopathological diagnosis by biopsy techniques such as fine-needle aspiration is impossible (due to nodule location) or inconclusive (due to small nodule size). The vast majority of these nodules are benign, but nevertheless, many patients with benign nodules undergo unnecessary procedures. It is estimated that only 20% of patients with lung nodules undergoing biopsy or surgery actually have a malignant lung nodule (4). Consequently, there is an unmet need for a noninvasive clinical test that can discriminate between benign and malignant nodules (5, 6).

The performance and development requirements for a diagnostic test to mitigate the use of invasive and costly medical procedures for lung nodule evaluations are as follows.

First, physicians require a negative test result (that is, “benign”) to be correct with high probability (more than 90%) to ensure that malignant nodules are not accidentally eliminated, that is, a high negative predictive value (NPV), which is the percentage of correct negative test results. An NPV of 90% reduces the posttest probability of cancer to 10% or lower, a twofold reduction in cancer risk from the 20% pretest probability of cancer among patients selected for invasive procedures.

Second, the diagnostic test must frequently provide actionable results for clinical utility and economic benefit. This corresponds to the specificity of the test, that is, the percentage of benign nodules correctly called benign (that is, negative) by the test. Specificity indicates the fraction of patients with benign tumors that can be identified confidently by the test. High-impact tests such as Oncotype DX for treatment stratification of breast cancers have reported actionable results in about 34% of cases (7).

Third, the diagnostic test must be developed and validated on intended-use samples from multiple independent sites without demographic bias on key clinical parameters such as age, nodule size, and gender. Intended-use samples are defined to be radiologically discovered and pathologically confirmed malignant or benign nodules with a diameter of less than 30 mm (stage IA cancers). The intended-use population has a high occurrence of current and former smokers because this is a significant risk factor for lung cancer.

Fourth, development and validation studies should conform to rigorous guidelines for test development such as those recently provided by the Institute of Medicine (IOM) (8).

Previous biomarker studies on lung cancer (9–15) have not achieved optimal development and performance requirements, in particular, the requirement of achieving an NPV of 90% on a multisite validation study with only stage IA samples.

We present here a 13-protein plasma test, or classifier, meeting the performance and development requirements stated above. We used multiple reaction monitoring (MRM; also known as selected reaction monitoring) mass spectrometry (MS) to measure the concentrations of candidate proteins in plasma (16, 17). The benefits of MRM assays include high protein specificity, large multiplexing capacity, high sensitivity (mid-attomole), significant dynamic range (10^6), and both rapid and reliable assay development and deployment. MRM has been used for clinical testing of small-molecule analytes for many years and, recently, in the development of biologically relevant assays (18–23).

RESULTS

Selection of biomarker candidates for assay development

To identify lung cancer biomarkers in blood that are shed or secreted from lung tumor cells, proteins overexpressed on the cell surface or oversecreted from lung cancer tumor cells relative to normal lung cells were identified from freshly resected lung tumors using organelle isolation techniques combined with MS (24, 25) (table S1). In addition, an extensive literature search for lung cancer biomarkers was performed with public and private resources (table S2). Both the tissue- and literature-sourced biomarkers were required to have evidence of presence in blood (table S3 and fig. S1). Table 1 presents the steps taken in the refinement of these initial 388 protein candidates down to the set of 13 classifier proteins used for validation and performance assessment. The results are presented in the same sequence.

Table 1 Steps in refining the 388 candidates down to the 13-protein classifier.

Development of MRM assays

Standard synthetic peptide techniques were used to develop a 371-protein multiplexed MRM assay from the 388 protein candidates. On average, more than four MRM transitions, that is, predetermined and highly specific mass products (17), per protein were developed (tables S4 and S5). Statistical correlation techniques were used to establish a transition-to-protein error rate below 5%, ensuring high-quality MRM assay development (see the Supplementary Materials and fig. S2). Synthetic peptides could not be developed or confidently identified for 17 candidates. The 371-protein MRM assay was applied to plasma samples from patients with pathologically confirmed benign or malignant nodules to determine how many of the 371 proteins could be detected in plasma. A total of 190 MRM assays were able to detect their target proteins in plasma (51% success rate; see also table S4 and fig. S2). This success rate compared favorably to similar efforts (16%) to develop large-scale MRM assays for the detection of diverse cancer markers in blood (26).

Classifier discovery

A total of 143 samples were obtained from three clinical sites (Table 2). All samples were selected to be consistent with intended use, specifically having nodule size between 4 and 30 mm. Cancer and benign samples were pathologically confirmed. After clinical data monitoring, six samples were later found to have nodule size outside the 4- to 30-mm range. Benign and cancer samples were pairwise-matched on age, gender, nodule size, and clinical site to avoid bias during MRM analysis. One benign sample was lost due to experimental deviations. See table S6 for the experimental design.

Table 2 Clinical characteristics of subjects and nodules in the discovery and validation studies.

IUCPQ, Institute Universitaire de Cardiologie et de Pneumologie de Quebec; BAC, bronchioloalveolar carcinoma.

The 371-protein MRM assay was applied to the 143 discovery samples, and the resulting transition data were analyzed to derive a 13-protein classifier fit to a logistic regression model (Table 3). The steps in the refinement from the 371 proteins to the 13 proteins participating in the classifier are summarized in Table 1. The key step in this refinement was the identification of 36 cooperative proteins. A protein was deemed cooperative if found more frequently on best-performing panels than expected by chance alone. This strategy was motivated by the intent to capture the integrated behavior of proteins within lung cancer–perturbed networks. This was a defining step in the discovery of the classifier because the most cooperative proteins were often not the proteins with best individual performance. Full details of the estimation procedure and discovery process appear in Materials and Methods; sample classifier scores are listed in table S6; and the 36 cooperative proteins are provided in table S7. The applicability of logistic regression modeling to the discovery data set is provided in table S8.

Table 3 The 13-protein logistic regression classifier.

Classifier performance in discovery study

We assessed the performance of the classifier in the discovery study in terms of NPV and specificity (Fig. 1). When the classifier predicts that a patient from the intended-use population has a benign tumor, the NPV is the probability that this prediction is true. NPV can be calculated from the classifier’s sensitivity, specificity, and the estimated cancer prevalence in the intended-use population (27). Specificity is the fraction of the benign nodules that the classifier can detect with high confidence. Thus, the higher the specificity at a given high NPV, the more patients with benign nodules can be rescued from unnecessary invasive procedures.
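For reference, the dependence of NPV on sensitivity, specificity, and prevalence follows from Bayes' rule. The symbols below (Se, Sp, and π for sensitivity, specificity, and cancer prevalence) are shorthand introduced here and are not notation used elsewhere in this article:

```latex
\mathrm{NPV} = \frac{\mathrm{Sp}\,(1-\pi)}{\mathrm{Sp}\,(1-\pi) + (1-\mathrm{Se})\,\pi}
```

At a fixed prevalence, NPV rises with sensitivity, whereas specificity determines how many benign nodules receive a negative result.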

Fig. 1 Performance of the classifier on the discovery samples (n = 143) and validation samples (n = 104).

NPV and specificity (SPC) are presented in terms of classifier score. A cancer prevalence of 15% was assumed.

We assumed a cancer prevalence of 15% based on estimates from the literature (4). Note that this prevalence is larger than the prevalence for the much smaller nodules studied in the National Lung Screening Trial study (28). The classifier generated a classifier score, ranging from 0 to 1. Any reference value in this range can be defined so that a sample was predicted to be benign if its classifier score was not above the reference value, or malignant otherwise. We plotted in Fig. 1 the NPV and the specificity of the classifier on the discovery samples as a function of the reference value.
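The relationship between the reference value and the two metrics plotted in Fig. 1 can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the function name, the input layout, and the toy scores are hypothetical, and NPV is obtained from Bayes' rule at an assumed prevalence.

```python
def npv_and_specificity(scores, is_cancer, reference_value, prevalence=0.15):
    """Return (NPV, specificity) when scores <= reference_value are called benign."""
    benign = [s for s, c in zip(scores, is_cancer) if not c]
    cancer = [s for s, c in zip(scores, is_cancer) if c]
    # Specificity: fraction of benign samples correctly called benign.
    specificity = sum(s <= reference_value for s in benign) / len(benign)
    # Sensitivity: fraction of cancer samples correctly called malignant.
    sensitivity = sum(s > reference_value for s in cancer) / len(cancer)
    # Bayes' rule at the assumed prevalence of the intended-use population.
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    called_benign = true_neg + false_neg
    npv = true_neg / called_benign if called_benign > 0 else float("nan")
    return npv, specificity


# Toy scores (hypothetical, not study data): three benign, then three cancer.
toy_scores = [0.2, 0.5, 0.8, 0.4, 0.7, 0.9]
toy_labels = [False, False, False, True, True, True]
curve = {r: npv_and_specificity(toy_scores, toy_labels, r) for r in (0.3, 0.6, 0.9)}
```

Raising the reference value calls more samples benign, so specificity increases while NPV tends to fall; the toy curve shows this trade-off on a very small scale.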

Table 4 reports the classifier’s performance on the discovery samples for multiple lung cancer prevalences with selected reference values. For each prevalence, the reference value was selected that corresponded to a discovery NPV of 95%. For a cancer prevalence of 15%, the reference value of 0.60 was selected and the classifier had an NPV of 95 ± 2% and a specificity of 66 ± 11% on the discovery samples, where 95% confidence intervals (CIs) were reported (see the Supplementary Materials). The receiver operating characteristic (ROC) curve and the corresponding area under the curve (AUC) of the classifier on the discovery samples are reported in fig. S3.

Table 4 Performance of the classifier in discovery and validation at three cancer prevalences.

The AUC was 0.82, 0.60, and 0.74 in discovery, in validation, and for Vanderbilt samples, respectively. Note that partial AUC is the preferred metric for tests maximizing NPV (27). PPV, positive predictive value.

Classifier validation

The 13-protein classifier was fully defined before validation was performed, including the identity of the proteins, the logistic regression model, and the reference value used to classify a nodule as benign or malignant. We followed precisely the bright-line demarcation of a locked-down omics test as defined in the IOM omics report (8).

A total of 52 cancer and 52 benign samples (Table 2) were used to validate the performance of the 13-protein classifier. All validation samples were from different patients than the discovery samples. In addition, 36% of the validation samples were sourced from a new fourth clinical site, Vanderbilt University. The remaining validation samples were selected randomly from the discovery sites. Samples were consistent with intended use and matched as in the discovery study. The classifier was applied to the validation samples and analyzed (see Materials and Methods). See table S9 for the experimental design and classifier scores of individual samples.

The performance of the classifier on the validation samples was summarized in Fig. 1, Table 4, and fig. S3, along with the corresponding performance on the discovery samples. All reference values in Table 4 were selected in discovery and applied directly in validation. For cancer prevalence of 15% and reference value of 0.60, the classifier had NPV of 90 ± 5% and specificity of 44 ± 13% on the validation samples. The NPV and specificity on the Vanderbilt samples were 94 and 56%, respectively, providing a strong sign that the classifier was not overfit to the discovery sites.

Figure 2 presents the application of the classifier to all 247 discovery and validation samples. We compared the clinical risk factors of smoking (measured in pack-years) and nodule size (proportional to the diameter of each circle) to the classifier score assigned to each sample. Nodule size did not appear to increase with the classifier score. Indeed, both large and small nodules were spread across the classifier score spectrum. To quantify this observation, we calculated the Pearson correlation (R) between the classifier score and nodule size, smoking history (pack-years), and age (Table 5). The largest R^2 was 0.05, indicating that all correlations were either nonexistent or very weak. The implication of this observation is important. The classifier provides information on the disease status of pulmonary nodules that is independent of the three currently used risk factors for malignancy (age, smoking history, and nodule size) (29, 30), and thus provides incremental molecular information of added clinical value. For a similar plot of nodule size versus classifier score, see fig. S4. See table S10 for more details illustrating the impact of clinical characteristics on classifier score.
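The correlation check described above can be illustrated with a short sketch. The helper name and the toy values are hypothetical and are not study data; the function computes the standard Pearson coefficient, whose square is the quantity compared against 0.05 in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy example (hypothetical): classifier scores versus nodule sizes in mm.
toy_scores = [0.2, 0.4, 0.6]
toy_sizes_mm = [10.0, 30.0, 20.0]
r_squared = pearson_r(toy_scores, toy_sizes_mm) ** 2
```

An R^2 near zero means that the clinical variable explains almost none of the variance in the classifier score.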

Fig. 2 Multivariate analysis of clinical (smoking and nodule size) and molecular (classifier score) factors as they relate to cancer and benign samples (n = 247) in the discovery and validation studies.

Smoking is measured by pack-years on the vertical axis. Nodule size is represented by circle diameter.

Table 5 Pearson correlation between classifier score and clinical risk factors.

The molecular foundations of the classifier

By design, the 13 classifier proteins were selected from a candidate list of 388 proteins, each with either empirical evidence or literature support of differential expression in lung cancer tissue, or both (Table 3). The only exception was the protein ISLR, whose function is currently not well characterized in the literature even though it was identified as a lung cancer biomarker in the automatic literature search.

To better understand the specific role that each of these 13 proteins has in lung cancer, and their relationship to each other, they were submitted for pathway analysis using Ingenuity Pathway Analysis (IPA; Ingenuity Systems). First, the transcription regulators most likely to cause a modulation of these 13 proteins were identified. With standard IPA analysis parameters, the four most significant nuclear transcription regulators were FOS (proto-oncogene c-Fos), NF2L2 (nuclear factor erythroid 2–related factor 2), AHR (aryl hydrocarbon receptor), and MYC (myc proto-oncogene protein); see Materials and Methods for details. These proteins regulate 12 of the 13 classifier proteins, with ISLR being the exception (see below).

FOS is common to many forms of cancer. NF2L2 and AHR are associated with lung cancer, oxidative stress response, and lung inflammation. MYC is associated with lung cancer and oxidative stress response. These four transcription regulators and the 13 classifier proteins, collectively, are also highly associated (P = 1.0 × 10^−7) with the same three biological networks, namely, lung cancer, lung inflammation, and oxidative stress response. This is summarized in Fig. 3, where the classifier proteins (green), transcription regulators (blue), and three merged networks (orange) are depicted. Only ISLR is not connected through these three networks to other classifier proteins, although it is connected through cancer networks not specific to lung. In summary, the modulation of the 13 classifier proteins can be linked back to a few transcription regulators highly associated with lung cancer, lung inflammation, and oxidative stress response networks, three biological processes reflecting aspects of lung cancer.

Fig. 3 The 13 classifier proteins (green), 4 transcription regulators (blue), and 3 networks (orange lines) of lung cancer, oxidative stress response, and lung inflammation.

All references are human UniProt identifiers.

Derivation of the classifier

MRM technology enabled the simultaneous exploration of a large number of lung cancer–relevant proteins. The definitive step in the derivation of the classifier was the identification of the most cooperative protein biomarkers. Typically, proteins are shortlisted in the discovery process by filtering on individual diagnostic performance. To contrast the difference between filtering proteins based on strong individual performance as opposed to frequency on high-performance panels, we calculated a P value for each of the 21 MRM-robust cooperative proteins using the Mann-Whitney nonparametric test (31). Only 1 of these 21 proteins had a P value less than 0.10 (table S7). In addition, the cooperative protein score for each of these 21 proteins was calculated. The cooperative protein score measured the frequency at which each protein appeared on high-performance panels (see Materials and Methods). The P values and cooperative scores of the 21 MRM-robust cooperative proteins were not correlated (Pearson correlation = −0.24, P = 0.30).

Most informative proteins

Which proteins in the classifier were most informative? To answer this question, we constructed all possible classifiers from the set of 21 MRM-robust cooperative proteins (table S7) and measured their performance. The frequency of each protein among the 100 best-performing panels was determined (see “Frequency” column of table S7). Four proteins (LRP1, COIA1, ALDOA, and LG3BP) were highly enriched, with 95% of the 100 best classifiers having at least three of these four proteins (P < 1.2 × 10^−41). The conclusion was that high-performance panels of cooperative proteins for pulmonary nodule characterization were similar in composition to one another with a preference for a set of particularly informative (cooperative) proteins. It is expected that optimizations of the classifier on different technological platforms (for example, sample processing pipelines and MS instrumentation) may result in changes in optimal classifier parameters such as the logistic regression protein coefficients.

DISCUSSION

Classifier score independent of clinical risk factors

Recent work has demonstrated that there is frequently no correlation between an individual protein’s blood concentration and tumor size (32), as is the case for our signature. Even if individual protein concentrations increased or decreased with tumor size, our classifier score is a combination of these 13 protein concentrations and, so, would not necessarily correlate with tumor size. Of clinical importance is that the classifier provides a new metric, independent of current diagnostic risk factors (29, 30), for assessing the molecular status of a lung nodule.

Limitations of work

The main limitation of the work presented is that both discovery and validation studies were conducted using retrospective samples. A prospective study on intended-use samples is required to further validate the utility of the classifier for clinical use.

A second limitation is that the molecular classifier presented here is not integrated with clinical risk factors such as nodule size, age, and smoking history for the purpose of a single classifier for lung nodule diagnosis. Although an integrated classifier would be ideal, in practice, pulmonologists vary broadly in the use of clinical risk factors, and so, it is actually preferable to have a molecular diagnostic test that produces a score independent of clinical risk factors.

We have presented the discovery and validation of a 13-protein blood-based classifier that effectively stratified benign and malignant lung nodules. We applied systems biology strategies to select protein candidates that had coordinated evidence as lung cancer biomarkers. The classifier proteins were identified from cooperative proteins that jointly outperformed the panel of “best” individual proteins. The classifier provided insightful assessment on the disease status of lung nodules beyond the clinical risk factors currently used by clinicians. By measuring protein abundance in blood samples of patients, the classifier can be used to prevent patients with benign lung nodules from undergoing unnecessary invasive procedures.

MATERIALS AND METHODS

Discovery study design

A retrospective, multicenter, case-control study was performed with K2-EDTA plasma aliquots previously obtained from subjects who provided informed consent and contributed biospecimens in studies approved by the Ethics Review Board (ERB) at the IUCPQ or the Institutional Review Boards (IRBs) at New York University and the University of Pennsylvania, respectively. In addition, plasma samples were provided by study investigators after review and approval of the sponsor’s study protocol by the respective institution’s ERB or IRB, as required. Sample eligibility for the proteomic analysis was based on the satisfaction of the study inclusion and exclusion criteria (Supplementary Materials). Each cancer-benign sample pair was matched, on a best-effort basis, by gender, nodule size (±10 mm), age (±10 years), smoking history pack-years (±20 pack-years), and center. Independent monitoring and verification of the clinical data associated with both the subject and lung nodule were performed in accordance with the guidance established by the Health Insurance Portability and Accountability Act (HIPAA) of 1996 to ensure subject privacy. The study was powered with a probability of 92% to detect 1.5-fold differences in protein abundance between malignant and benign lung nodules. See table S6 for more details.

Analysis of plasma samples using MRM-MS

The protocol for MRM-MS analysis of plasma aliquots included immunodepletion on IgY14-Supermix resin columns (Sigma), denaturation, trypsin digestion, and desalting, followed by reversed-phase liquid chromatography and MRM-MS analysis of the obtained peptide samples (see details in the Supplementary Materials).

Development of MRM assays

MRM assays for candidate proteins were developed on the basis of synthetic peptides (17, 26, 33). More details are provided in the Supplementary Materials.

Identification of endogenous normalizing proteins

The following criteria were used to identify a transition of a normalization protein: (i) highest median intensity of all transitions from the same protein; (ii) detected in all samples; (iii) ranking high, as a normalizer, in reducing median technical coefficient of variation (CV); (iv) ranking high in reducing median column drift (defined in the Supplementary Materials); and (v) possession of low median technical CV and low median biological CV, that is, median CV of transition intensities that were measured on clinical samples. Six endogenous normalizing proteins were identified (table S11 and fig. S5).

Normalization of raw MRM-MS data

Six normalization transitions (table S11) were used to normalize raw MRM-MS data to reduce sample-to-sample intensity variations within the same study. A scaling factor was calculated for each sample so that the intensities of the six normalization transitions of the sample were aligned with the corresponding median intensities of all human plasma standard (HPS) samples. Assuming that $N_{i,s}$ was the intensity of a normalization transition $i$ in sample $s$ and $\hat{N}_i$ is the corresponding median intensity of all HPS samples, then the scaling factor for sample $s$ was given by $\hat{S}/S_s$, where

$$S_s = \mathrm{median}\left(\frac{N_{1,s}}{\hat{N}_1}, \frac{N_{2,s}}{\hat{N}_2}, \ldots, \frac{N_{6,s}}{\hat{N}_6}\right) \tag{1}$$

was the median of the intensity ratios, and $\hat{S}$ was the median of $S_s$ over all samples in the study. For each transition of each sample, its normalized intensity was calculated as follows:

$$\tilde{I}_{i,s} = I_{i,s} \times \frac{\hat{S}}{S_s} \tag{2}$$

where $I_{i,s}$ was the raw intensity.
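The normalization just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions and not the study's code: it takes the scaling factor of a sample to be the study-wide median of the per-sample median ratios divided by that sample's own median ratio, and the function names, the dictionary layout, and the toy intensities are hypothetical.

```python
from statistics import median


def scaling_factors(normalizer_intensities, hps_medians):
    """Per-sample scaling factors from normalization-transition intensities.

    normalizer_intensities: {sample: {transition: intensity}}
    hps_medians: {transition: median intensity over all HPS samples}
    """
    # Median, per sample, of the ratios to the HPS median intensities.
    sample_ratio = {
        sample: median(values[t] / hps_medians[t] for t in hps_medians)
        for sample, values in normalizer_intensities.items()
    }
    # Median of those per-sample values over all samples in the study.
    study_median = median(sample_ratio.values())
    return {sample: study_median / r for sample, r in sample_ratio.items()}


def normalize(raw_intensity, factor):
    """Normalized intensity of one transition in one sample."""
    return raw_intensity * factor


# Toy data (hypothetical): two normalization transitions, three samples.
hps = {"t1": 100.0, "t2": 200.0}
samples = {
    "A": {"t1": 100.0, "t2": 200.0},
    "B": {"t1": 200.0, "t2": 400.0},
    "C": {"t1": 50.0, "t2": 100.0},
}
factors = scaling_factors(samples, hps)
```

In the toy data, sample B is twice as intense as the HPS reference and is scaled down by half, whereas sample C is scaled up by two.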

Logistic regression model

The logistic regression classification method (34, 35) was used to combine a panel of transitions into a classifier and to calculate a classification score between 0 and 1 for each sample. The score ($P_s$) of a sample was determined as follows:

$$P_s = \frac{1}{1 + \exp\left(-\alpha - \sum_{i=1}^{N} \beta_i \breve{I}_{i,s}\right)} \tag{3}$$

where $\breve{I}_{i,s}$ was the logarithmically transformed (base 2), normalized intensity of transition $i$ in sample $s$; $\beta_i$ was the corresponding logistic regression coefficient; $\alpha$ was a classifier-specific constant; and $N$ was the total number of transitions in the classifier. A sample was classified as benign if $P_s$ was less than or equal to a reference value or cancer otherwise.
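A minimal sketch of the scoring and classification rule is given below. It uses the standard logistic form with base-2 log-transformed normalized intensities; the function names and the toy coefficients are hypothetical and are not the coefficients of the 13-protein classifier in Table 3.

```python
import math


def classifier_score(normalized_intensities, coefficients, alpha):
    """Logistic score in (0, 1) from normalized transition intensities."""
    # Linear predictor: constant plus weighted log2 intensities.
    linear = alpha + sum(
        beta * math.log2(intensity)
        for beta, intensity in zip(coefficients, normalized_intensities)
    )
    return 1.0 / (1.0 + math.exp(-linear))


def classify(score, reference_value=0.60):
    """Benign if the score does not exceed the reference value, else cancer."""
    return "benign" if score <= reference_value else "cancer"


# Toy inputs (hypothetical): intensities of 1.0 have a log2 value of 0.
neutral = classifier_score([1.0, 1.0], [0.8, -0.3], alpha=0.0)
elevated = classifier_score([4.0], [1.0], alpha=0.0)
```

The default reference value of 0.60 mirrors the value reported for a cancer prevalence of 15%; any other value in the 0 to 1 range can be passed instead.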

Lung nodule classifier development

The classifier development included the following steps.

Normalization of raw MRM-MS data was performed to reduce sample-to-sample intensity variations using a panel of six endogenous proteins as described above. After normalization, MRM-MS data were filtered down to transitions having the highest intensities of the corresponding proteins and satisfying the criterion for detection in a minimum of 50% of the cancer or 50% of the benign samples. A total of 125 proteins satisfied these criteria of reproducible detection. Missing values were replaced by half the minimum detected values of the corresponding transitions in all samples.
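The detection filter and the replacement of missing values can be sketched as follows. This is an illustrative sketch only: the function name and the data layout (with `None` marking an undetected transition) are hypothetical, and the selection of the highest-intensity transition per protein is not shown.

```python
def filter_and_impute(data, is_cancer, min_rate=0.5):
    """Keep reproducibly detected transitions and fill in missing values.

    data: {transition: [intensity or None, one entry per sample]}
    is_cancer: list of booleans aligned with the sample order
    """
    n_cancer = sum(is_cancer)
    n_benign = len(is_cancer) - n_cancer
    kept = {}
    for transition, values in data.items():
        detected = [v for v in values if v is not None]
        if not detected:
            continue  # never detected, nothing to keep or impute from
        in_cancer = sum(v is not None for v, c in zip(values, is_cancer) if c)
        in_benign = sum(v is not None for v, c in zip(values, is_cancer) if not c)
        # Detected in at least min_rate of the cancer OR of the benign samples.
        if in_cancer >= min_rate * n_cancer or in_benign >= min_rate * n_benign:
            fill = 0.5 * min(detected)  # half the minimum detected value
            kept[transition] = [fill if v is None else v for v in values]
    return kept


# Toy data (hypothetical): three cancer samples followed by three benign.
labels = [True, True, True, False, False, False]
toy = {
    "kept": [10.0, 20.0, None, None, None, None],
    "dropped": [6.0, None, None, None, None, 4.0],
}
filtered = filter_and_impute(toy, labels)
```

The "kept" transition passes because it is detected in two of three cancer samples; the "dropped" transition is detected in only one sample per group and is removed.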

Remaining transitions were then used to identify proteins, defined as cooperative proteins, that occurred with high frequency on top-performing protein panels. The cooperative proteins were derived using the following estimation procedure because it was not computationally feasible to evaluate the performance of all possible protein panels.

Monte Carlo cross-validation (MCCV) (36) was performed on 1 × 10^6 panels; each panel was composed of 10 randomly selected proteins and fitted to a logistic regression model, as described above, using a 20% holdout rate and 10^2 sample permutations. The ROC curve of each panel was generated, and the area under the ROC curve (AUC) restricted to the region where sensitivity was at least 90%, defined as the partial AUC (27, 37), was used to assess the performance of the panel. By focusing on the performance of individual panels in the high-sensitivity region, the partial AUC allowed for the identification of panels with high and reliable performance on NPV. The candidate proteins that occurred in the top 100 performing panels with a frequency greater than that expected by chance were identified empirically as cooperative proteins. For each protein, the cooperative score was defined as its frequency on the 100 high-performance panels divided by the expected frequency. Highly cooperative proteins had a score of 1.75 or higher (one-sided P < 0.05), cooperative proteins had a score higher than 1, whereas noncooperative proteins had a score of 1 or less. Note that 1 million panels were sampled to ensure that the 100 top-performing panels were exceptional (empirical P ≤ 10^−4). In addition, panels of size 10 were used in this procedure based on empirical evidence that larger panels did not change the resulting list of cooperative proteins. This also avoided overfitting the logistic regression model. In total, 36 cooperative proteins were identified, including 15 highly cooperative proteins.
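The cooperative score can be illustrated with a short sketch. It assumes that the expected frequency of a protein is the frequency it would have if panels were drawn uniformly at random from the candidate pool (for example, 100 panels × 10 proteins / 125 candidates = 8 appearances); that reading, the function name, and the toy panels are assumptions made for illustration.

```python
from collections import Counter


def cooperative_scores(top_panels, n_candidates):
    """Observed frequency on the top panels divided by the chance expectation."""
    panel_size = len(top_panels[0])
    # Expected appearances of any one protein under uniform random selection.
    expected = len(top_panels) * panel_size / n_candidates
    observed = Counter(protein for panel in top_panels for protein in panel)
    return {protein: count / expected for protein, count in observed.items()}


# Toy example (hypothetical): four top panels of size 2 from 8 candidates,
# so each protein is expected to appear once by chance.
toy_panels = [["A", "B"], ["A", "C"], ["A", "D"], ["B", "C"]]
scores = cooperative_scores(toy_panels, n_candidates=8)
cooperative = sorted(p for p, s in scores.items() if s > 1)
```

Proteins absent from every top panel do not appear in the result and have an implicit score of 0.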

Raw chromatograms of all transitions of cooperative proteins were manually reviewed. Proteins with low signal-to-noise ratios and/or showing evidence of any interference were removed from further consideration. In total, 21 cooperative and robust proteins were identified.

Remaining candidate proteins were then evaluated in an iterative, stepwise procedure to derive the final classifier. In each step, MCCV was performed with a holdout rate of 20% and 10^4 sample permutations to train the remaining candidate proteins to a logistic regression model and to assess the variability, that is, stability, of the coefficient derived for each protein by the model. The performance of the model was assessed by partial AUC (27). The protein having the least stable coefficient was identified and removed. This procedure was repeated until no protein was left. During this procedure, a total of 21 logistic regression classifiers were developed, and their respective performances were obtained. Among the 21 classifiers, the 13-protein classifier had the optimal performance and was selected as the classifier for validation. Seven of the 13 proteins in the final classifier were highly cooperative.
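The stepwise elimination can be sketched as a simple loop. This is a sketch under explicit assumptions: the stability measure used here (the coefficient of variation of each protein's coefficient across resamples) is our choice for illustration because the text does not specify one, and `resample_coefficients` is a hypothetical callable standing in for the MCCV model fitting.

```python
from statistics import mean, stdev


def coefficient_instability(values):
    """Coefficient of variation of one protein's coefficient across resamples."""
    centre = abs(mean(values))
    return stdev(values) / centre if centre > 0 else float("inf")


def backward_elimination(proteins, resample_coefficients):
    """Return the sequence of panels visited, from the full set down to one.

    resample_coefficients(panel) must return {protein: [coefficient, ...]}
    with one coefficient per resample (assumed interface).
    """
    panel = list(proteins)
    visited = []
    while panel:
        visited.append(list(panel))
        coefs = resample_coefficients(panel)
        # Remove the protein whose coefficient varies most across resamples.
        least_stable = max(panel, key=lambda p: coefficient_instability(coefs[p]))
        panel.remove(least_stable)
    return visited


# Toy stand-in for model fitting (hypothetical fixed coefficient draws).
toy_draws = {"A": [1.0, 1.1, 0.9], "B": [1.0, 3.0, -1.0], "C": [2.0, 2.5, 1.5]}
panels = backward_elimination(["A", "B", "C"], lambda panel: {p: toy_draws[p] for p in panel})
```

Starting from n proteins, the loop visits n panels, which matches the 21 classifiers obtained from the 21 candidate proteins; the performance of each visited panel would then be compared to choose the final classifier.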

Proteins in the final classifier were further trained to a logistic regression model by MCCV with a holdout rate of 20% and 2 × 10^4 sample permutations. Results for the lung nodule classifier development are summarized in table S7.

Lung nodule classifier validation

The design of the validation study was identical to that of the discovery study but involved plasma samples associated with independent subjects not evaluated in the discovery study. Additional specimens were obtained from Vanderbilt University with similar requirements for patient consent, IRB approval, and satisfaction of HIPAA. Of the 104 total cancer and benign samples in the validation study, half were analyzed immediately after the discovery study, whereas the other half were analyzed later. The study was powered to observe the expected 95% CI of the NPV associated with reference value 0.60. See the Supplementary Materials and table S9 for more details.

The raw MRM-MS data set in the validation study was normalized in the same way as the discovery data set. Variability between the discovery and the validation studies was mitigated by using HPS samples in both studies as external calibrator. See the Supplementary Materials for details. Missing data in the validation study were then replaced by half the minimum detected values of the corresponding transitions in the discovery study. Normalized, calibrated transition intensities were applied to the logistic regression model of the final classifier learned previously in the training phase, from which classifier scores were assigned to individual samples. The performance of the lung nodule classifier on the validation samples was then assessed on the basis of the classifier scores.

IPA pathway analysis

Standard parameters were used. Specifically, in the search for nuclear transcription regulators, requirements were P < 0.01 with a minimum of three proteins modulated. Significance was determined with a right-tailed Fisher’s exact test using the IPA Knowledge Database as background.
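For readers unfamiliar with the test, a right-tailed Fisher's exact test for enrichment reduces to a hypergeometric tail probability. The sketch below is a generic illustration and does not reproduce IPA's implementation; the function and argument names and the toy numbers are hypothetical.

```python
from math import comb


def fisher_right_tail(overlap, n_selected, n_annotated, n_background):
    """P(X >= overlap) for a hypergeometric draw.

    n_background: proteins in the background set
    n_annotated:  background proteins linked to the regulator
    n_selected:   proteins in the query list (for example, the classifier)
    overlap:      query proteins linked to the regulator
    """
    upper = min(n_selected, n_annotated)
    # Count the ways to obtain each overlap size from `overlap` up to the maximum.
    favourable = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(n_background, n_selected)


# Toy numbers (hypothetical): 5 of 10 background proteins are annotated,
# and all 5 selected proteins fall among them.
p_value = fisher_right_tail(overlap=5, n_selected=5, n_annotated=5, n_background=10)
```

A small tail probability indicates that the observed overlap is unlikely under random selection from the background.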

Statistical analysis

All statistical analyses were performed with STATA, MATLAB, and/or R. Specific tests and analysis details are indicated at appropriate sections.

SUPPLEMENTARY MATERIALS

www.sciencetranslationalmedicine.org/cgi/content/full/5/207/207ra142/DC1

Materials and Methods

Fig. S1. Summary of 388 lung cancer biomarker candidates.

Fig. S2. Summary of the 371-protein MRM assay.

Fig. S3. ROC curves of the classifier in discovery and validation.

Fig. S4. Scatter plot of nodule size versus classifier score of all 247 patients.

Fig. S5. Performance of individual proteins as normalizers.

Table S1. Clinical information of lung cancer patients enrolled in tissue biomarker studies.

Table S2. Terminologies used for searching potential biomarkers in literature.

Table S3. Information of 388 protein candidates.

Table S4. Summary of protein MRM assay development and detection in blood.

Table S5. MRM assay of individual proteins.

Table S6. Experimental design of the discovery study.

Table S7. The list of 36 cooperative proteins.

Table S8. Results from logistic regression diagnostics.

Table S9. Experimental design of the validation study.

Table S10. Impact of clinical characteristics on classifier score.

Table S11. List of endogenous normalizing proteins.

Table S12. Protein and peptide detection in the discovery study.

References (50–71)

REFERENCES AND NOTES

  1. Acknowledgments: We thank the reviewers for many insightful and constructive comments. We also thank the subjects who contributed biospecimens during translational research studies, the research staff at each of the participating institutions, and also the research team at Caprion Proteomics. We also thank M. Laviolette and F. Maltais at the IUCPQ for their helpful discussions, and A. Callahan, S. Rogalski, E. Gonterman, and M.-Y. Brusniak at Integrated Diagnostics for their many contributions to this work. Funding: Supported by Integrated Diagnostics. Author contributions: Individual coauthor contributions, in addition to manuscript drafting, review, and approval, are as follows. Concept and design of overall study: X.-j.L., K.C.F., L.H., and P.K. Identification of tissue biomarkers: study design (D.C. and P.K.), data acquisition (M.S., O.G., J.L., R.A., D.C., and P.K.), and data analysis (X.-j.L. and P.K.). Selection of biomarker candidates: X.-j.L., M.D., and P.K. MRM assay development: study design (X.-j.L., M.S., D.C., and P.K.), data acquisition (H.B., M.S., O.G., J.L., R.A., and D.C.), and data analysis (X.-j.L., C.H., and P.K.). Classifier development including discovery and validation studies: study design (X.-j.L., P.-Y.F., N.D.P., S. Lam, P.P.M., H.P., W.N.R., A.V., K.C.F., and P.K.), data acquisition (S.W.H., S. Law, H.B., M.S., O.G., J.L., R.A., and D.C.), and data analysis (X.-j.L., C.H., P.-Y.F., L.W.L., M.M., and P.K.). Biological interpretation: X.-j.L., M.D., N.D.P., K.C.F., L.H., and P.K. Competing interests: X.-j.L., C.H., P.-Y.F., S.W.H., L.W.L., M.M., S. Law, K.C.F., and P.K. are current and/or past employees of and have equity interest in Integrated Diagnostics; M.D., H.B., M.S., O.G., J.L., R.A., and D.C. are consultants and/or performed contracted work for Integrated Diagnostics; L.H. is a board member with equity of Integrated Diagnostics. A.V. is on the scientific advisory board of Allegro Diagnostics. 
X.-j.L., C.H., M.D., K.C.F., and P.K. filed patent applications directed toward the composition, methods, kits, and processes described in the United States and in foreign jurisdictions: published as US20130217057 and US20130203096 pending. Data and materials availability: Raw data in mzML format can be downloaded from SRMAtlas (http://www.peptideatlas.org/PASS/PASS00261).