Research Article: Pulmonary fibrosis

Peripheral Blood Mononuclear Cell Gene Expression Profiles Predict Poor Outcome in Idiopathic Pulmonary Fibrosis

Science Translational Medicine  02 Oct 2013:
Vol. 5, Issue 205, pp. 205ra136
DOI: 10.1126/scitranslmed.3005964

Abstract

We aimed to identify peripheral blood mononuclear cell (PBMC) gene expression profiles predictive of poor outcomes in idiopathic pulmonary fibrosis (IPF) by performing microarray experiments of PBMCs in discovery and replication cohorts of IPF patients. Microarray analyses identified 52 genes associated with transplant-free survival (TFS) in the discovery cohort. Clustering the microarray samples of the replication cohort using the 52-gene outcome-predictive signature distinguished two patient groups with significant differences in TFS. We studied the pathways associated with TFS in each independent microarray cohort and identified decreased expression of “The costimulatory signal during T cell activation” Biocarta pathway and, in particular, the genes CD28, ICOS, LCK, and ITK, results confirmed by quantitative reverse transcription polymerase chain reaction (qRT-PCR). A proportional hazards model, including the qRT-PCR expression of CD28, ICOS, LCK, and ITK along with patient’s age, gender, and percent predicted forced vital capacity (FVC%), demonstrated an area under the receiver operating characteristic curve of 78.5% at 2.4 months for death and lung transplant prediction in the replication cohort. To evaluate the potential cellular source of CD28, ICOS, LCK, and ITK expression, we analyzed and found significant correlation of these genes with the PBMC percentage of CD4+CD28+ T cells in the replication cohort. Our results suggest that CD28, ICOS, LCK, and ITK are potential outcome biomarkers in IPF and should be further evaluated for patient prioritization for lung transplantation and stratification in drug studies.

INTRODUCTION

Idiopathic pulmonary fibrosis (IPF) is a chronic and progressive fibrosing interstitial lung disease with an unknown etiology. Diagnosis of IPF is based on clinical and radiological features and, when available, findings of usual interstitial pneumonia on lung biopsy. IPF patients have an overall median survival of 3 to 3.5 years (1). The disease is more prevalent and probably more lethal among males (2, 3). With the exception of lung transplantation, no therapy has been proven beneficial for IPF. The course of IPF is highly variable and largely unpredictable among individual patients. Disease progression in current clinical practice is monitored by pulmonary function tests, including forced vital capacity (FVC) and diffusion capacity for carbon monoxide (DLCO), high-resolution computed tomography scans, and measures of oxygenation. Previous studies have demonstrated that changes in dyspnea score, total lung capacity, and FVC over 12 months or scores calculated on the basis of age, gender, FVC, and DLCO at presentation seem to correlate with disease severity or outcome in IPF (2–6). Although these advances allow for staging of patients with IPF, they do not address the difficulty of predicting outcomes for patients with very similar clinical presentation or provide insight into molecular mechanisms of disease.

Current evidence suggests that plasma protein concentrations or changes in blood cells may be informative of disease presence, severity, and prognosis in IPF patients (7–12). Recently, a difference in the peripheral blood transcriptome was shown between IPF patients and healthy controls (13, 14); however, the ability of the transcriptome to predict outcome was not assessed. Given the evidence that peripheral blood mononuclear cell (PBMC) gene expression is informative of disease presence and outcomes in other clinical entities, such as multiple sclerosis (15, 16), heart transplant rejection (17), pulmonary hypertension associated with scleroderma (18), and lung cancer (19), among others, we hypothesized that PBMC gene expression patterns may be predictive of poor outcomes in IPF patients. For this purpose, we examined PBMC gene expression in two independent cohorts and identified a signature of 52 genes significantly associated with transplant-free survival (TFS) in both cohorts. Decreased expression of genes belonging to “The costimulatory signal during T cell activation” Biocarta pathway, in particular, CD28, ICOS, LCK, and ITK, was associated with shorter TFS, findings confirmed by quantitative reverse transcription polymerase chain reaction (qRT-PCR). The addition of these genes to an outcome prediction model improved its performance compared to a model that only included clinical parameters. Our findings suggest that PBMC gene expression may improve outcome prediction in IPF.

RESULTS

Patient population and clinicopathology

IPF patients included in this prospective cohort study were followed in clinics (at 3- to 4-month intervals) from blood draw until death or completion of the study. The time-to-event outcome analyzed was TFS; in this analysis, transplants and deaths were both counted as events. Figure 1 provides information regarding the cohorts and study design. The discovery (n = 45) and replication (n = 75) cohorts were similar with respect to age, smoking status, pulmonary function tests, diagnostic strategy, and use of immunosuppression with the exception of gender, race, and lung transplants (table S1). Although 75.5 and 68% of subjects in the discovery and replication cohort were Caucasian males, respectively, the discovery cohort patients had a more diverse ethnic background. Females were more represented in the replication than in the discovery cohort (30.7 versus 11.1%, respectively). The rate of lung transplants was higher in the replication cohort (20%) compared to the discovery cohort (4%).

Fig. 1 Study design and cohorts.

The outline summarizes the studied cohorts, the experiments performed in each cohort, and the statistical analyses used. The horizontal arrows represent the confirmation of microarray and qRT-PCR experiments in both cohorts.

Microarray analysis of the discovery cohort

RNA was isolated from the PBMCs of patients (n = 45), labeled, and hybridized to GeneChip Human 1.0 exon ST arrays at the University of Chicago. Using significance analysis of microarrays (SAM), we identified 52 genes that were significantly [false discovery rate (FDR) <5%, Cox score ≥2.5 or ≤−2.5] associated with TFS in this cohort. Increased expression of 7 genes (genes with a Cox score ≥2.5) and decreased expression of 45 genes (genes with a Cox score ≤−2.5) were correlated with shorter TFS times (Table 1).
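The sign convention of the Cox score can be illustrated with a minimal sketch. It assumes the standard univariate Cox score statistic (the sum over events of expression minus the risk-set mean, divided by the square root of the summed risk-set variances), with no tied event times and without SAM's fudge factor. The function name and the toy data are hypothetical and are not taken from the study.

```python
import math

def cox_score(expr, time, event):
    """Univariate Cox score for one gene (sketch: no ties, no SAM fudge factor).

    expr  : expression value per patient
    time  : follow-up time per patient
    event : 1 if death or transplant was observed, 0 if censored
    """
    numerator = 0.0
    variance = 0.0
    for i, (t_i, e_i) in enumerate(zip(time, event)):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: patients still under observation at this event time.
        risk = [x for x, t in zip(expr, time) if t >= t_i]
        mean = sum(risk) / len(risk)
        numerator += expr[i] - mean
        variance += sum((x - mean) ** 2 for x in risk) / len(risk)
    return numerator / math.sqrt(variance) if variance > 0 else 0.0

# Hypothetical toy data: the highest-expressing patient has the earliest event.
times = [1, 2, 3, 4, 5]
events = [1, 1, 1, 1, 1]
print(cox_score([5, 4, 3, 2, 1], times, events))  # positive: high expression, shorter TFS
print(cox_score([1, 2, 3, 4, 5], times, events))  # negative: high expression, longer TFS
```

Under these assumptions, a gene would enter the signature when its score is at least 2.5 in absolute value and its FDR is below 5%.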

Table 1 A 52-gene signature associated with TFS in the discovery cohort.

Expression data were collected for genes with a Cox score ≥2.5 or ≤−2.5 (FDR <5%). A positive Cox score indicates that higher expression correlates with shorter TFS time, whereas lower expression correlates with longer TFS time. A negative Cox score indicates that higher expression correlates with longer TFS time, whereas lower expression correlates with shorter TFS time.

To determine the pathways associated with TFS, we performed a survival gene set analysis (GSA) in the discovery cohort. GSA identified 18 pathways (FDR <5%) associated with TFS (table S2). Among them, “The costimulatory signal during T cell activation” Biocarta pathway (Table 2 and table S2) was the top-ranked pathway with a maxmean score of −1.91, indicating that lower expression of most genes in this pathway was correlated with shorter TFS. CD28, ICOS, LCK, and ITK were the genes of this pathway with the strongest association with TFS when underexpressed (Cox scores = −3.12, −3.01, −2.77, and −3.2, respectively) (Table 2), and they were also part of the 52-gene outcome-associated signature. Because GSA calculates a partial likelihood Cox score statistic for each gene after fitting a full multivariate model, whereas SAM uses one gene at a time to estimate Cox scores based on univariate models, the Cox scores of CD28, ICOS, LCK, and ITK calculated by SAM differed slightly from those calculated by GSA (Tables 1 and 2) in the discovery cohort; however, the two sets of scores agreed in direction and were similar in magnitude.
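As a rough illustration of how a maxmean score summarizes a gene set, the sketch below applies the usual definition: average the positive parts and the negative parts of the per-gene scores over the whole set, then keep whichever is larger in magnitude, with its sign. The GSA software also restandardizes this statistic, so the raw value computed here is not expected to match the reported −1.91. The fifth score (0.5) is a hypothetical value added only to show a mixed-sign set.

```python
def maxmean(scores):
    """Raw maxmean statistic of a gene set (no restandardization)."""
    n = len(scores)
    positive_part = sum(s for s in scores if s > 0) / n
    negative_part = sum(-s for s in scores if s < 0) / n
    return positive_part if positive_part > negative_part else -negative_part

# Cox scores reported for CD28, ICOS, LCK, and ITK, plus one hypothetical gene.
pathway_scores = [-3.12, -3.01, -2.77, -3.2, 0.5]
print(maxmean(pathway_scores))  # negative: the set is dominated by underexpressed genes
```

A negative maxmean therefore indicates that most genes in the set have negative Cox scores, that is, lower expression goes with shorter TFS.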

Table 2 Genes in “The costimulatory signal during T cell activation” Biocarta pathway associated with TFS in the discovery cohort.

A positive Cox score indicates that higher expression correlates with shorter TFS time, whereas lower expression correlates with longer TFS time. A negative Cox score indicates that higher expression correlates with longer TFS time, whereas lower expression correlates with shorter TFS time.

Microarray analysis of the replication cohort

RNA isolated from PBMCs obtained from IPF patients at the University of Pittsburgh was labeled and hybridized to Agilent whole human genome microarrays. To determine whether the 52-gene TFS predictive signature identified in the discovery cohort predicted outcome in the replication cohort, we used hierarchical clustering. Briefly, the replication cohort expression values of the 52 genes derived from the discovery cohort were used as input to a hierarchical clustering algorithm that groups samples by expression similarity. This clustering algorithm identified two major patient clusters in the replication cohort (Fig. 2A). The patients in the two clusters differed significantly with respect to TFS [hazard ratio, 1.96; 95% confidence interval (CI), 1.01 to 3.8] (Fig. 2B) but did not differ significantly with respect to clinical variables (table S3). The median TFS at the conclusion of the observation period for replication cohort patients in cluster 1 was 3.44 years compared to 1.62 years for patients in cluster 2 (Fig. 2B and table S4).

Fig. 2 Hierarchical clustering discriminates subgroups with outcome differences in the replication cohort.

(A) Hierarchical clustering of IPF patients from the replication cohort (n = 75) based on the 52-gene signature found in the discovery cohort to be associated with TFS (FDR <5%, Cox score ≥2.5 or ≤−2.5). Two major clusters of IPF patients were identified. Every row represents a gene, and every column, a patient. Color scale is shown adjacent to heat map in log2 scale; generally, yellow denotes increase over the geometric mean of samples, and purple, decrease. (B) TFS differs between clusters in the replication cohort; the median survival of each group is depicted by dotted vertical lines; n at risk is the number of IPF patients at risk of death or lung transplant at the beginning of each time point. P value was determined by the log-rank test.

TFS GSA was performed on the Agilent microarray gene expression data, independently obtained from the replication cohort. This analysis yielded “The costimulatory signal during T cell activation” Biocarta pathway as the top-ranked pathway that correlated with TFS with a maxmean score of −1.24 (table S5). Similar to our previous observation in the discovery cohort, CD28, ICOS, LCK, and ITK were also the genes with the lowest Cox score within this pathway in the replication cohort (Table 3).

Table 3 Genes in “The costimulatory signal during T cell activation” Biocarta pathway associated with TFS in the replication cohort.

A positive Cox score indicates that higher expression correlates with shorter TFS time, whereas lower expression correlates with longer TFS time. A negative Cox score indicates that higher expression correlates with longer TFS time, whereas lower expression correlates with shorter TFS time.

Association of CD28, ICOS, LCK, and ITK with poor IPF outcomes

To confirm the microarray findings in the discovery cohort, we designed a custom SmartChip qRT-PCR assay that allowed us to simultaneously measure the expression of CD28, ICOS, LCK, and ITK as well as housekeeping genes in multiple samples. SmartChip expression values (reflected by 1 − ΔCt) from the discovery cohort (n = 43) were significantly correlated with the Affymetrix microarray gene expression values for CD28 (r = 0.71; 95% CI, 0.53 to 0.83), ICOS (r = 0.6; 95% CI, 0.38 to 0.77), LCK (r = 0.5; 95% CI, 0.23 to 0.70), and ITK (r = 0.6; 95% CI, 0.37 to 0.77) (Fig. 3).

Fig. 3 qRT-PCR confirms microarray findings in the discovery cohort.

Correlation between log2-transformed microarray gene expression values and corresponding SmartChip qRT-PCR expression levels for CD28, ICOS, LCK, and ITK in patients (n = 43) from the discovery cohort. P values were determined by Student’s t distribution for Pearson correlation.

In the replication cohort (n = 74), decreased expression of CD28, ICOS, LCK, and ITK (split at 4.858, 6.303, 4.333, and 5.069 cycles, respectively) was significantly associated with decreased TFS (Fig. 4A). At the end of the observation period, TFS of patients with low CD28 expression was 22% compared with 65% among patients with high CD28 expression. In patients with low ICOS expression, TFS was 16% compared to 70% among patients with high ICOS expression. In patients with low ITK expression, TFS was 24% compared to 62% among patients with high ITK expression. In patients with low LCK expression, TFS was 30% compared to 57% among patients with high LCK expression. A decrease in CD28, ICOS, LCK, or ITK expression was individually associated with median TFS that ranged from 0.92 to 1.17 years, and increased expression was associated with longer median TFS, ranging from 2.39 to 3.44 years (table S4). The unadjusted hazard ratios for CD28 (3.2; 95% CI, 1.73 to 5.92), ICOS (4.52; 95% CI, 2.42 to 8.42), LCK (2.1; 95% CI, 1.14 to 3.86), and ITK (2.3; 95% CI, 1.25 to 4.23) were between 2.1 and 4.5, indicating that low levels of expression of these genes (when split by their median value) at evaluation were associated with a two- to fourfold higher risk of dying or having a lung transplant. TFS prediction was also significant after adjusting continuous ΔCt values of each individual gene to age, gender, and percent predicted FVC (FVC%) (tables S6 to S9).

Fig. 4 CD28, ICOS, LCK, and ITK are potential IPF outcome biomarkers.

(A) TFS analysis in the replication cohort (n = 74) with available qRT-PCR data for CD28, ICOS, LCK, and ITK. In the Kaplan-Meier plots for each gene, the red lines are patients with expression levels above the ΔCt median value (representing a decrease in gene expression); the black lines are patients with expression levels below the ΔCt threshold (representing an increase in gene expression); the median survival of each group is depicted in dotted vertical lines. P values were determined by the log-rank test. (B) AUC of time-dependent ROC analysis for TFS based on clinical and/or genomic models in replication cohort subjects with all available variables (n = 72). Genomic model included continuous ΔCt values of CD28, ICOS, LCK, and ITK. Clinical model included age, gender, and FVC%. P values were determined by the Wilcoxon signed rank test.

We compared the area under the receiver operating characteristic (ROC) curve (AUC) of a genomic model (qRT-PCR expression of CD28, ICOS, LCK, and ITK), a clinical model (age, gender, and FVC%), and a combined genomic and clinical model (qRT-PCR expression of CD28, ICOS, LCK, and ITK along with age, gender, and FVC%). The highest AUC of all tested Cox proportional hazard models was observed at 2.4 months (0.2 years). The AUC for the combined genomic and clinical model at this time point was higher (78.5%) than that for the genomic model alone (76.6%) or the clinical model alone (70.9%) (Fig. 4B and table S10). The AUC differences between these models were statistically significant.

Changes in CD4+CD28+ T cells and gene expression findings

To evaluate the potential cellular source of the PBMC gene expression changes, we correlated the qRT-PCR expression level (reflected by 1 − ΔCt) of CD28, ICOS, LCK, and ITK with the percentage of CD4+CD28+ T cells in PBMCs in replication cohort patients with simultaneous assays (n = 72). CD28 (r = 0.58; 95% CI, 0.41 to 0.72), ICOS (r = 0.54; 95% CI, 0.35 to 0.69), LCK (r = 0.39; 95% CI, 0.17 to 0.57), and ITK (r = 0.44; 95% CI, 0.23 to 0.61) were significantly correlated with the percentage of CD4+CD28+ T cells in PBMCs (Fig. 5), suggesting that a decreased number of these cells may explain, at least in part, the decreased expression of these genes. Along these lines, a decreased percentage of CD4+CD28+ T cells in PBMCs (split at the median percentage of 27.8%) was associated with decreased TFS in the replication cohort (fig. S1A). TFS prediction remained significant after adjusting the CD4+CD28+ T cell percentages for age, gender, and FVC% (table S10). Predictive models for death or lung transplant that included the percentage of CD4+CD28+ T cells in PBMCs performed worse than predictive models that used qRT-PCR expression of CD28, ICOS, LCK, and ITK (fig. S1B and table S11).

Fig. 5 Expression levels of CD28, ICOS, LCK, and ITK correlate with the number of circulating CD4+CD28+ T cells.

Correlation between the percentage of CD4+CD28+ T cells in PBMCs and their corresponding 1 − ΔCt SmartChip qRT-PCR expression levels of CD28, ICOS, LCK, and ITK in (n = 72) patients from the replication cohort. P values were determined by Student’s t distribution for Pearson correlation.

There were no statistically significant differences (P = 0.52, Fisher’s exact test) in the use of immunosuppression between the patients with high versus low percentage of CD4+CD28+ T cells in PBMCs (split by the median) in the replication cohort; we also did not find immunosuppression use as an independent predictor of TFS in the discovery and replication cohorts (P = 0.59 and 0.23, respectively, Cox proportional hazard model). CD28, ICOS, LCK, or ITK expression levels did not correlate (P > 0.05 for each gene, Student’s t distribution for Pearson correlation) with the absolute number of peripheral blood lymphocytes in IPF patients (n = 35) from the discovery cohort that had this measure at the time of PBMC extraction.

Given the reported associations of increased number of CD4+CD28null T cells in IPF patients with poor outcomes (11), we measured the protein expression by flow cytometry of the T cell costimulatory protein ICOS, the T cell receptor complex protein CD3ε, and the tyrosine kinases LCK and ITK among paired autologous CD4+CD28+ and CD4+CD28null T cells in patients with IPF from the replication cohort. Although these proteins were significantly decreased in CD4+CD28null T cells (figs. S2 and S3), the percentage of CD4+CD28null T cells in PBMCs was not significantly correlated with the expression of CD28, ICOS, LCK, and ITK genes in IPF patients from the replication cohort (n = 72) with simultaneous assays (P > 0.05 for each gene, Student’s t distribution for Pearson correlation).

DISCUSSION

Here, we identified changes in the expression of genes and pathways in PBMCs that correlated with poor IPF outcomes in two independent cohorts from different academic institutions, using different microarray platforms. We initially identified a signature of 52 genes as significantly associated with shorter TFS in the discovery cohort. Using this signature, we clustered the patients in the replication cohort to look for TFS differences between the patients in the major clusters and identified two clusters of patients with significant differences in TFS. Analysis of gene sets associated with shorter TFS showed decreased expression of most of the genes of “The costimulatory signal during T cell activation” Biocarta pathway in both cohorts. The genes CD28, ICOS, LCK, and ITK were members of the 52-gene signature and had the lowest Cox score when performing GSA, thus having the highest association with shorter TFS in this pathway when underexpressed. qRT-PCR confirmed that IPF patients with decreased expression of CD28, ICOS, LCK, and ITK had shorter TFS. A combined genomic and clinical prediction model including ΔCt expression of CD28, ICOS, LCK, and ITK along with age, gender, and FVC% provided better outcome prediction than using the clinical predictors alone.

Recognition that the course of IPF is variable and unpredictable has generated substantial interest in molecular biomarkers. Increases in the concentrations of peripheral blood proteins such as Krebs von den Lungen-6 (KL-6), surfactant protein A (SP-A), chemokine ligand 18 (CCL18), matrix metalloproteinase 7 (MMP7), intercellular adhesion molecule 1 (ICAM), and interleukin-8 (IL-8) (7, 10, 12, 20) have all been associated with decreased survival in IPF patients. However, these studies rarely contained a replication cohort and were limited in their discovery potential because they only tested a small, predefined set of markers. Recently, a study reporting a comparison of whole-blood transcriptomes of patients with IPF to healthy controls demonstrated the potential wealth of information available in the peripheral blood of patients with IPF (13); however, this study did not contain any information about outcome-associated genes or the potential cellular source of the gene expression changes. Thus, the attributes that distinguish our study from previous work are the focus on an unbiased genome-scale screening for predictors of outcomes, the use of a discovery and a replication cohort, and the attempt to outline the cellular source of the signature. Our unbiased screen led us to discover that decreases in molecules and pathways rarely studied in IPF, such as the T cell costimulatory proteins CD28 and ICOS, the tyrosine kinases LCK and ITK, as well as other members of the Biocarta pathway “The costimulatory signal during T cell activation,” are indicative of more severe outcomes in IPF. Decreases in gene expression of CD28, ICOS, LCK, and ITK may be related to a decrease in the number of CD4+CD28+ T cells in the peripheral blood—a finding that has not been previously reported in IPF and warrants detailed mechanistic follow-up.

The clinical implications of predicting outcome in IPF are substantial. The only effective therapy currently available for IPF patients is lung transplantation. The timing of transplantation is determined by the clinical evaluation, combined with the lung allocation score (21). Pretransplant evaluations are cost-intensive and not accurate enough to establish optimal timing (22). Shortage of organs is also a limitation. Hence, adding information about the expression of CD28, ICOS, LCK, and ITK to clinical parameters could be useful in determining who should be referred for pretransplantation assessments and specifically, given the ability of the model to predict early outcomes, to prioritize organ allocations to those who have been evaluated. The ability to predict TFS is also important for drug studies in IPF. In a relatively uncommon disease, to show an effect of a drug on mortality, investigators need to recruit patients who are likely to progress during the course of the study. It is possible that patients from a certain risk stratum end up randomly and disproportionately assigned to one of the experimental groups, leading to spurious results. The significantly increased AUC of the combined genomic and clinical model in comparison to the clinical model alone suggests that adding CD28, ICOS, LCK, and ITK expression levels to clinical parameters may help recruit patients who are likely to progress.

It is important to consider several limitations of our study. First, despite the inclusion of two independent cohorts, the size and diversity of our cohorts are limited. Larger studies on more ethnically and clinically diverse populations will be required to determine the applicability of our markers to the general IPF population. Second, our study was designed to capture only mortality or transplant as outcomes. It would be beneficial to include in the model other IPF outcomes, such as acute exacerbations and disease progression, as reflected by declines in pulmonary functions. In this context, assessing gene expression changes during disease progression would be a highly useful tool to evaluate shifts in patients' risk profiles. Finally, although our study supports the emerging notion that proteins, gene transcripts, and cells in the blood are informative with regard to pathogenesis and outcomes in IPF (a disease previously considered to be limited to the lung), it does not provide information on whether changes in peripheral blood gene expression have an added or different utility than bloodstream proteins. Future work should assess all likely markers in parallel and determine their relative value as biomarkers.

In summary, in our study, a microarray-derived 52-gene expression profile or qRT-PCR of CD28, ICOS, LCK, and ITK, members of this signature, was sufficient to identify IPF patients destined for poor outcomes. Combining gene expression data with clinical parameters enhanced outcome prediction; thus, our results could have considerable value in clinical evaluations and management of patients with this devastating lung disease. Naturally, despite the reproducibility of our findings across two cohorts, additional and larger studies focused on validating our results will be required before PBMC gene expression can be used clinically for prognosis in IPF.

MATERIALS AND METHODS

Study design: Patients and cohorts

Patients were recruited from the University of Chicago (discovery cohort; n = 45) and the University of Pittsburgh (replication cohort; n = 75). IPF diagnosis was established by a multidisciplinary group at each institution with the American Thoracic Society/European Respiratory Society criteria (23) and was consistent with recent guidelines (24). Patients were excluded from the study if they had evidence of autoimmune syndromes, malignancies, infections, drugs, or occupational exposures known to cause lung fibrosis. The studies were approved by the institutional review boards at the two institutions, and informed consent was obtained from all patients. Demographic and clinical information was collected in all patients at the time of blood draw. Spirometric data and diffusion capacity of the lung for carbon monoxide (DLCO) obtained within 3 months of blood draw were available, with the exception of four IPF patients of the replication cohort who did not have DLCO values available within this time range.

The time-to-event outcome analyzed was TFS. Patients were followed in clinics (at 3- to 4-month intervals) from blood draw until death or completion of the study on 5 February 2011. In this analysis, transplants and deaths were both counted as events. Transplant and vital status could not be confirmed in three patients evaluated at the University of Pittsburgh who were lost to follow-up; these patients were censored at their last visit day.

Microarray experiments and data preprocessing

Microarray expression was determined in two cohorts: a discovery cohort of IPF patients evaluated at the University of Chicago (n = 45) and a replication cohort of IPF patients evaluated at the University of Pittsburgh (n = 75). Microarray experiments were compliant with MIAME (Minimum Information About a Microarray Experiment) guidelines. The complete data sets are available in the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE28221. For the discovery cohort, PBMC samples were obtained by density centrifugation. RNA was extracted with TRIzol (Invitrogen), and labeling reactions were performed with GeneChip WT cDNA Synthesis and Amplification Kit, followed by hybridization with GeneChip Human 1.0 exon ST arrays (Affymetrix) following the manufacturer’s protocol. A microarray experiment was run for every subject’s sample in the discovery cohort, and these experiments were performed at the University of Chicago. Data were processed and normalized with dChip software (25). For the replication cohort, PBMC samples were obtained by density centrifugation. Total RNA was extracted with QIAzol (Qiagen), and labeling reactions were performed with Agilent Quick Amp labeling kit, one-color, followed by hybridization with Whole Human Genome Oligo Microarray, 4 × 44K (G4112F, Agilent Technologies) following the manufacturer’s protocol. A microarray experiment was run for every subject’s sample in the replication cohort, and these experiments were performed at the University of Pittsburgh. To normalize the gProcessedSignal, we performed cyclic loess as previously described (26). Please see Supplementary Methods for more information regarding sample collection, RNA extraction, and microarray experiments.

Given the differences in microarray technologies between the studied cohorts (discovery cohort used Affymetrix, replication cohort used Agilent), we matched the gene probes across platforms in each microarray expression data set. In brief, after each microarray platform normalization, we matched the Affymetrix gene probes (n = 44,280 probes) with the Agilent gene probes (n = 29,807 probes) by their corresponding gene IDs (http://www.ncbi.nlm.nih.gov/gene). Because there are multiple replicated probes for the same gene in each platform studied, we selected only the unique probes with the highest interquartile range variation across the arrays and generated two independent data sets (Affymetrix and Agilent), each with n = 17,417 unique gene symbols. Last, for univariate gene selection and GSA in the discovery cohort, we applied a minimum fold change filter to the previously matched Affymetrix data set (n = 17,417) to exclude noninformative gene probes; for this step, we selected only the Affymetrix gene probes where 10% of the expression values of a given gene probe had at least a fold change of 1.25 from the median expression value of that probe, resulting in a set of n = 11,991 unique gene probes. The corresponding n = 11,991 unique gene probes in the Agilent data set were used for analyses in the replication cohort. SAM was used to test the association between PBMC microarray gene expression and TFS in IPF patients from the discovery cohort, as described in Supplementary Methods.
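The two probe-reduction steps described above (one probe per gene chosen by interquartile range, then a minimum fold change filter) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: expression values are assumed to be on a log2 scale, a fold change of 1.25 is taken in either direction from the probe median, "10% of the expression values" is read as "at least 10%", and the function names, probe IDs, and data layout are hypothetical.

```python
import math
import statistics

def iqr(values):
    """Interquartile range across arrays for one probe."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def pick_probe_per_gene(probes):
    """Keep, for each gene ID, the probe with the largest IQR.

    probes: dict mapping probe_id -> (gene_id, [log2 expression per array])
    """
    best = {}
    for probe_id, (gene_id, values) in probes.items():
        spread = iqr(values)
        if gene_id not in best or spread > best[gene_id][1]:
            best[gene_id] = (probe_id, spread)
    return {gene: probe for gene, (probe, _) in best.items()}

def passes_fold_change_filter(log2_values, min_fold=1.25, min_fraction=0.10):
    """True if enough arrays differ from the probe median by at least min_fold."""
    median = statistics.median(log2_values)
    threshold = math.log2(min_fold)  # fold change expressed on the log2 scale
    n_far = sum(1 for v in log2_values if abs(v - median) >= threshold)
    return n_far / len(log2_values) >= min_fraction

# Hypothetical example: two probes for gene "A", one probe for gene "B".
probes = {
    "p1": ("A", [1.0, 1.0, 1.0, 1.0]),
    "p2": ("A", [1.0, 2.0, 3.0, 4.0]),
    "p3": ("B", [8.0] * 9 + [8.5]),
}
selected = pick_probe_per_gene(probes)
informative = {g: p for g, p in selected.items() if passes_fold_change_filter(probes[p][1])}
print(selected, informative)
```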

Hierarchical clustering of samples by TFS-associated genes identified in the discovery cohort microarrays was performed in the replication cohort’s microarrays with Cluster 3.0 software (27). The samples were hierarchically clustered with median normalization of the genes and centroid linkage, and the similarity metric used was Pearson correlation.
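To make the clustering settings concrete, the sketch below implements a small agglomerative procedure in plain Python. It does not call Cluster 3.0 and is not guaranteed to reproduce its output. It assumes that "median normalization of the genes" means subtracting each gene's median across samples, and that centroid linkage means comparing cluster mean profiles with a distance of 1 minus the Pearson correlation. The function names and the toy matrix are hypothetical.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def two_major_clusters(matrix):
    """matrix[g][s] is the expression of gene g in sample s.

    Merges samples until two clusters remain and returns their sample indices.
    """
    # Median-center each gene (row) across samples.
    centered = []
    for row in matrix:
        med = statistics.median(row)
        centered.append([v - med for v in row])
    n_genes, n_samples = len(centered), len(centered[0])
    profiles = [[centered[g][s] for g in range(n_genes)] for s in range(n_samples)]

    def centroid(members):
        return [sum(profiles[s][g] for s in members) / len(members) for g in range(n_genes)]

    clusters = [[s] for s in range(n_samples)]
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = 1 - pearson(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical 4-gene, 4-sample matrix with two opposite expression patterns.
toy = [
    [5.0, 5.2, 1.0, 1.1],
    [4.8, 5.1, 0.9, 1.2],
    [1.0, 1.3, 5.0, 4.9],
    [1.2, 0.9, 5.1, 5.3],
]
print(two_major_clusters(toy))
```

Stopping at two clusters corresponds to reading the two top-level branches of the dendrogram, which is how the two major patient clusters are described in the Results.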

Statistical analyses

Differences between IPF patients. Differences in age and pulmonary function tests between IPF patients were evaluated with an unpaired, two-tailed t test. Differences in gender, smoking status, diagnostic strategy, and use of immunosuppressive therapy were evaluated with Fisher’s exact test. Significance was defined as P < 0.05.
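For readers unfamiliar with Fisher's exact test, a minimal two-sided version for a 2 × 2 table can be written with the hypergeometric distribution. This is a generic sketch, not the software the authors used. It follows the common convention of summing the probabilities of all tables that are no more likely than the observed one, and the counts in the example are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Hypothetical counts: rows are two patient groups, columns are a yes/no variable.
print(fisher_exact_two_sided(3, 1, 1, 3))
```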

qRT-PCR—Discovery cohort. Affymetrix log2-transformed microarray gene expression values were correlated with their corresponding SmartChip qRT-PCR expression levels (1 − ΔCt) with Pearson correlation. P values were derived with Student’s t distribution. Significance was defined as P < 0.05.
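The correlation statistics reported with the qRT-PCR results can be sketched as follows. The t statistic is the standard one for testing a Pearson correlation against zero with n − 2 degrees of freedom. The confidence interval uses the Fisher z transformation, which is an assumption here because the text does not state how the intervals were derived. Function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def fisher_z_interval(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via the Fisher z transformation (assumed method)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# CD28 in the discovery cohort: r = 0.71, n = 43 (values from the Results).
low, high = fisher_z_interval(0.71, 43)
print(round(t_statistic(0.71, 43), 2), round(low, 2), round(high, 2))
```

With the rounded r of 0.71, this interval comes out at roughly 0.52 to 0.83, close to the reported 0.53 to 0.83; the small difference is consistent with rounding of r.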

TFS analyses—Replication cohort. For the qRT-PCR outcome cohort analyses, we used the survival (28) and risksetROC (29) packages of the R environment (30). When performing Cox proportional hazard models, we applied the stepAIC (31) approach for variable selection and included all variables as continuous covariates with the exception of gender. In brief, qRT-PCR ΔCt values of CD28, ICOS, LCK, and ITK as well as the percentage of CD4+CD28+ T cells in PBMCs were split by their median value into high- and low-risk ranges, and TFS differences were calculated with Kaplan-Meier curves and the log-rank test. The predictive significance of each gene as well as the percentage of CD4+CD28+ T cell for TFS was evaluated with Cox proportional hazard models after adjusting for clinical covariates known to be associated with poor IPF outcomes (age, gender, and FVC%). Finally, to evaluate which Cox proportional hazard model resulted in higher outcome prediction, we fit five different Cox proportional hazard models in subjects with all available variables (n = 72), as follows: genomic and clinical (ΔCt of CD28, ICOS, LCK, ITK, age, gender, and FVC%), genomic (ΔCt of CD28, ICOS, LCK, and ITK), clinical (age, gender, and FVC%), CD4+CD28+ % (percentage of CD4+CD28+ T cells), and CD4+CD28+ % and clinical (percentage of CD4+CD28+ T cells, age, gender, and FVC%). To plot the differences between the analyzed Cox proportional hazard models, we used time-dependent ROC for censored data (32) and AUC. When deriving the AUC estimates, we performed a 10-fold cross-validation procedure to handle any potential bias. In addition, we compared prediction accuracies (bias-controlled AUCs) of any two Cox regression models with a Wilcoxon signed rank test. Significance was defined as P < 0.05.
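The median split, Kaplan-Meier estimate, and log-rank comparison described above were run with the R survival package. The sketch below is only a plain-Python illustration of the same ideas, with hypothetical function names and toy data. Events are deaths or transplants; a ΔCt above the median is coded as group 1, which corresponds to lower expression.

```python
import statistics

def split_at_median(delta_ct):
    """1 = ΔCt above the median (lower expression), 0 = otherwise."""
    cut = statistics.median(delta_ct)
    return [1 if v > cut else 0 for v in delta_ct]

def kaplan_meier(time, event):
    """Return [(t, S(t))] at each distinct event time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(time, event) if e}):
        at_risk = sum(1 for u in time if u >= t)
        events = sum(1 for u, e in zip(time, event) if u == t and e)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve

def median_survival(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

def logrank_chi2(time, event, group):
    """Log-rank chi-square (1 degree of freedom) for two groups coded 0 and 1."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(time, event) if e}):
        n = sum(1 for u in time if u >= t)
        n1 = sum(1 for u, g in zip(time, group) if u >= t and g == 1)
        d = sum(1 for u, e in zip(time, event) if u == t and e)
        d1 = sum(1 for u, e, g in zip(time, event, group) if u == t and e and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance

# Hypothetical toy data: group 1 has all of the early events.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 0, 0, 0]
print(logrank_chi2(times, events, groups))  # compare with 3.84 for P < 0.05 at 1 df
```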

Flow cytometry analyses—Replication cohort. The 1 − ΔCt expression of CD28, ICOS, LCK, and ITK was correlated with the percentage of CD4+CD28+ T cells in PBMCs in the replication cohort using Pearson correlation. P values were derived with Student’s t distribution. The T cell costimulatory protein ICOS, the T cell receptor complex protein CD3ε, and the tyrosine kinases LCK and ITK were compared between CD4+CD28+ and CD4+CD28null cells with the Wilcoxon test for paired samples. Significance was defined as P < 0.05.
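
To make the paired comparison concrete, the sketch below computes the rank sums that underlie the Wilcoxon signed-rank test. It is a minimal illustration under stated assumptions: zero differences are dropped, tied absolute differences share an average rank, and the function name and the six paired values are hypothetical rather than study measurements.

```python
def signed_rank_sums(a, b):
    """Wilcoxon signed-rank sums (W+, W-) for paired samples.

    Zero differences are dropped; tied absolute differences share the
    average of the ranks they span.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical paired intensities for one protein in the same six subjects.
cd28_pos = [120, 135, 150, 110, 142, 128]
cd28_null = [100, 118, 151, 90, 120, 128]

w_plus, w_minus = signed_rank_sums(cd28_pos, cd28_null)
```

The smaller of the two sums is the usual test statistic; its P value comes from the exact signed-rank distribution or a normal approximation, which a statistics library would normally provide.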

Cross-validation for AUC—Replication cohort. For the 10-fold cross-validation, the whole data set was randomly divided into 10 data sets (folds) of similar size. One of the 10 folds was selected as the test set, and the remaining nine folds were used to train the model. In each subsequent iteration, a different fold was held out for validation, whereas the remaining folds were used for training. The procedure was repeated 10 times in total, yielding one AUC per held-out fold (10 AUCs) at each time point. The final AUC value was estimated from the average of the 10 resulting AUCs at each specific time point. The SE was calculated from the variation of the 10 resulting AUCs at each time point.
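
The fold mechanics described above can be sketched in a few lines of Python. This is an illustration only: `fit` and `score` are hypothetical callables standing in for Cox model fitting and time-dependent AUC estimation, the toy data are not study data, and computing the SE as the standard deviation of the fold scores divided by the square root of the number of folds is an assumption, since the text does not give the exact formula.

```python
import math
import random
import statistics

def make_folds(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k folds of similar size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n_samples, fit, score, k=10, seed=0):
    """Hold out each fold once; return mean score, SE, and per-fold scores."""
    scores = []
    for held_out in make_folds(n_samples, k, seed):
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        model = fit(train)                      # train on the other k - 1 folds
        scores.append(score(model, held_out))   # evaluate on the held-out fold
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(k)  # assumed SE definition
    return mean, se, scores

# Toy stand-ins (hypothetical): the "model" is the training mean and the
# "score" rewards agreement with the held-out mean.
values = [i % 5 for i in range(72)]

def fit(train):
    return sum(values[i] for i in train) / len(train)

def score(model, test):
    test_mean = sum(values[i] for i in test) / len(test)
    return 1.0 / (1.0 + abs(model - test_mean))

folds = make_folds(72)
mean_score, se_score, fold_scores = cross_validate(72, fit, score)
```

In the setting of the text, `score` would return the AUC at a given time point, and the procedure would be repeated for each time point of interest.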

SUPPLEMENTARY MATERIALS

www.sciencetranslationalmedicine.org/cgi/content/full/5/205/205ra136/DC1

Methods

Fig. S1. CD4+CD28+ T cells predict TFS.

Fig. S2. CD4+CD28null T cells have decreased protein expression of T cell markers.

Fig. S3. CD4+CD28null cells have decreased protein expression of selected T cell markers.

Table S1. Clinicopathological characteristics of the IPF patients in the two cohorts.

Table S2. Significant gene sets associated with TFS in the discovery cohort.

Table S3. Clinicopathological characteristics of the IPF patients in the two major clusters of the replication cohort.

Table S4. Median TFS times and CIs.

Table S5. Significant gene sets associated with TFS in the replication cohort.

Table S6. Multivariate Cox proportional hazard model including CD28 and clinical variables.

Table S7. Multivariate Cox proportional hazard model including ICOS and clinical variables.

Table S8. Multivariate Cox proportional hazard model including LCK and clinical variables.

Table S9. Multivariate Cox proportional hazard model including ITK and clinical variables.

Table S10. AUCs and SEs for TFS.

Table S11. Multivariate Cox proportional hazard model including the percentage of CD4+CD28+ T cells and clinical variables.

References (33–46)

REFERENCES AND NOTES

Acknowledgments: We thank L. Chensny, M. Klesen, and T. Black for their help in patient recruitment, sample collection and preparation, and database management, and A. Sperling for invaluable scientific input, criticism, and advice. Funding: The Dorothy P. and Richard P. Simmons Endowed Chair for Pulmonary Research, HL0894932, HL108642, and HL095397 (N.K.); HL073241 and HL107172 (S.R.D.); HL101740, HL080513, Pulmonary Fibrosis Foundation, and Coalition for Pulmonary Fibrosis (I.N.); and HL98050, HL101740, and HL105371 (J.G.N.G.). Author contributions: Conception and design: J.D.H.-M., I.N., K.F.G., J.G.N.G., S.D.S., and N.K. Patient recruitment, diagnosis ascertainment, and quality control: I.N., K.F.G., K.O.L., R.V., B.M.J.-G., J.D.H.-M., and N.K. RNA extraction, labeling, and microarray hybridization: B.M.J.-G., S.-F.M., and J.D.H.-M. Analysis of microarray data and intellectual contribution: E.F., S.K., J.D.H.-M., B.M.J.-G., G.C.T., S.-F.M., Y.L., Y.H., J.G.N.G., and N.K. qRT-PCR analysis, statistical modeling, and analysis: E.F., S.K., T.J.R., S.D.S., J.D.H.-M., and N.K. Flow cytometry experiments and analyses: J.X., S.R.D., and J.D.H.-M. Manuscript preparation: J.D.H.-M., E.F., S.R.D., I.N., B.M.J.-G., S.R.D., G.C.T., K.F.G., J.G.N.G., and N.K. Competing interests: J.D.H.-M., I.N., T.J.R., and N.K. have a patent application, in conjunction with the University of Pittsburgh, titled “Marker panels for idiopathic pulmonary fibrosis diagnosis and evaluation.” J.D.H.-M., I.N., Y.H., J.G.N.G., and N.K. have a patent application, in conjunction with the University of Chicago. S.R.D. has a patent application, in conjunction with the University of Pittsburgh, for use of T cell characteristics as biomarkers in IPF and other chronic lung diseases. N.K. was a consultant to Sanofi-Aventis and Stromedix, and currently consults for InterMune, Vertex, Promedior, Takeda, and Actelion. N.K. received research grants from Centocor in the past and currently receives research grants from Gilead and Celgene. Data and materials availability: The complete data sets are available in the Gene Expression Omnibus database (accession number GSE28221). All materials were generated in the laboratories of N.K. and I.N., and they are available upon request in accordance with institutional regulations and policies.

Correction: In Table 1, the gene name for LPAR6 was annotated incorrectly as "La ribonucleoprotein domain family, member 6." The correct annotation is "Lysophosphatidic acid receptor 6." The PDF and Full Text versions of the paper have been amended.
