Research Article: Sepsis

A comprehensive time-course–based multicohort analysis of sepsis and sterile inflammation reveals a robust diagnostic gene set


Science Translational Medicine  13 May 2015:
Vol. 7, Issue 287, pp. 287ra71
DOI: 10.1126/scitranslmed.aaa5993


Although several dozen studies of gene expression in sepsis have been published, distinguishing sepsis from a sterile systemic inflammatory response syndrome (SIRS) is still largely up to clinical suspicion. We hypothesized that a multicohort analysis of the publicly available sepsis gene expression data sets would yield a robust set of genes for distinguishing patients with sepsis from patients with sterile inflammation. A comprehensive search for gene expression data sets in sepsis identified 27 data sets matching our inclusion criteria. Five data sets (n = 663 samples) compared patients with sterile inflammation (SIRS/trauma) to time-matched patients with infections. We applied our multicohort analysis framework that uses both effect sizes and P values in a leave-one-data-set-out fashion to these data sets. We identified 11 genes that were differentially expressed (false discovery rate ≤1%, inter–data set heterogeneity P > 0.01, summary effect size >1.5-fold) across all discovery cohorts with excellent diagnostic power [mean area under the receiver operating characteristic curve (AUC), 0.87; range, 0.7 to 0.98]. We then validated these 11 genes in 15 independent cohorts comparing (i) time-matched infected versus noninfected trauma patients (4 cohorts), (ii) ICU/trauma patients with infections over the clinical time course (3 cohorts), and (iii) healthy subjects versus sepsis patients (8 cohorts). In the discovery Glue Grant cohort, SIRS plus the 11-gene set improved prediction of infection (compared to SIRS alone) with a continuous net reclassification index of 0.90. Overall, multicohort analysis of time-matched cohorts yielded 11 genes that robustly distinguish sterile inflammation from infectious inflammation.


Sepsis, a syndrome of systemic inflammation in response to infection, kills about 750,000 people in the United States every year (1). It is also the single most expensive condition treated in the United States, costing the healthcare system more than $20 billion annually (2). Prompt diagnosis and treatment of sepsis are crucial to reducing mortality, with every hour of delay increasing mortality risk (3). Sepsis is defined by the presence of the systemic inflammatory response syndrome (SIRS), in addition to a known or suspected source of infection (1). However, SIRS is not specific for sepsis, because sterile inflammation can arise as a nonspecific response to trauma, surgery, thrombosis, and other noninfectious insults. Thus, sepsis can be difficult to distinguish clinically from systemic inflammation caused by noninfectious sources, such as tissue trauma (4). There is no “gold standard” blood test that can identify patients with infections at the time of diagnosis, before the results of standard microbiological cultures become available. One of the most common biomarkers of infection, procalcitonin, has a summary area under the receiver operating characteristic (ROC) curve (AUC) of 0.78 (range, 0.66 to 0.90) (5). Several groups have evaluated whether cytokine or gene expression arrays can accurately diagnose sepsis; however, because of the highly variable nature of host response and human genetics, no robust diagnostic signature has been found (6–10).

Both infections and tissue trauma activate many of the same innate immune receptor families, such as the Toll-like receptors (TLRs) and NOD-like receptors (NLRs), and consequently activate largely overlapping transcriptional pathways. Thus, distinguishing conserved downstream effects attributable solely to infections has been exceedingly difficult. Recent work has shown that there are pattern recognition receptors potentially specific to pathogen response, including several glycan receptor families (11). Hence, it may be possible to differentiate an infection-specific immune response from sterile inflammation.

The ongoing search for new therapies for sepsis, and for new prognostic and diagnostic biomarkers, has generated several dozen microarray-based genome-wide expression studies over the past decade, variously focusing on diagnosis, prognosis, pathogen response, and underlying sepsis pathophysiology (10). Despite tremendous gains in the understanding of gene expression in sepsis, few insights have translated to improvements in clinical practice. Many of these studies have been deposited into public repositories such as the National Institutes of Health Gene Expression Omnibus (GEO) and ArrayExpress, and thus, there is now a wealth of publicly available data in sepsis. In particular, there are several studies comparing patients with sepsis to patients with noninfectious inflammation (such as SIRS) that occurs after major surgery, traumatic injury, or in non–sepsis-related intensive care unit (ICU) admission (thrombosis, respiratory failure, etc.).

One data set in particular, the Inflammation and Host Response to Injury Program (Glue Grant) (6, 12, 13), has yielded several important findings about the effects of time on gene expression after trauma and in sepsis. One part of the Glue Grant longitudinally examined gene expression in patients after severe traumatic injuries. Several groups have examined these data with respect to time; notable findings are that (i) more than 80% of expressed genes show differential expression after traumatic injury (6); (ii) different clusters of genes recover over markedly different time periods (12); (iii) differing scenarios of inflammation such as trauma, burns, and endotoxicosis exhibit similar gene expression changes (12); and (iv) the extent to which posttrauma gene expression profiles differ from those of healthy controls and their degree of gene expression recovery over time are correlated with clinical outcomes (13, 14). There is thus growing understanding of the importance of the changes that underlie recovery from trauma and their impact on specific clinical outcomes.

We hypothesized that only time-matched comparisons, such as those that compare SIRS/trauma to sepsis at the same clinical time points, would yield genes robustly diagnostic of sepsis. We carried out a comprehensive, time-course–based multicohort analysis of the publicly available gene expression data in sepsis to identify a conserved 11-gene set that can robustly distinguish noninfectious inflammation (such as SIRS, trauma, and ICU admissions) from inflammation due to acute infections, as in sepsis. This 11-gene set had excellent diagnostic power in the discovery cohorts and was then validated in 15 independent cohorts.


Comprehensive search and labeled principal components analysis visualizations

We identified 27 independent gene expression data sets that satisfied our criteria in GEO and ArrayExpress, from which we included a total of 2903 microarrays (table S1) (7, 8, 15–40). These 27 data sets comprised only 22 independent cohorts, because the six data sets from the Genomics of Pediatric SIRS/Septic Shock Investigators (GPSSSI) were combined into a single cohort containing 219 patients with SIRS or sepsis at ICU day 1 (15–20). Many of the samples used were from the Glue Grant trauma data sets, which have a total of 333 patients sampled at up to eight time points (1301 samples used here) after traumatic injury. These 27 data sets contain cohorts of children and adults, men and women, with a mix of community- and hospital-acquired sepsis, sampled from whole blood, neutrophils, and peripheral blood mononuclear cells (PBMCs).

First, we sought to use the simplest possible methods to see whether noninfected SIRS/trauma patients and sepsis/infection patients could be separated by gene expression. We thus co-normalized all available data sets comparing SIRS/trauma with sepsis/infection in a single matrix. Labeled principal components analysis (PCA) (using 168 genes identified by 10-fold cross-validated Lasso-penalized logistic regression) showed that SIRS/trauma patients can be separated from sepsis patients with modest overlap (Fig. 1A). Next, we labeled each sample as “early” (within 48 hours of admission) or “late” (more than 48 hours after admission). Most of the nonseparable samples were the late samples (Fig. 1B). This finding remained true even when we included healthy controls as a separate class (fig. S1). Previous work has shown that gene expression after trauma, burns, or endotoxemia changes nonlinearly over time (6, 12, 14, 35). This continuous change in expression after initial insult could explain the inability to distinguish noninfected SIRS/trauma from sepsis in the late samples if all time points are treated as equal.
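As a minimal sketch of the visualization step, assuming a samples × genes matrix that has already been co-normalized and restricted to the Lasso-selected genes (the Lasso fit itself is not shown), the code below assigns the 48-hour early/late labels and projects samples onto ordinary principal components via NumPy's SVD. The "labeled PCA" used in the study may differ from plain PCA; here the labels are only used to color or group the projected points, and the function names are illustrative.

```python
import numpy as np

EARLY_CUTOFF_HOURS = 48.0  # "early" = within 48 hours of admission, per the text


def timing_label(hours_since_admission: float) -> str:
    """Label a sample 'early' (within 48 h of admission) or 'late' (> 48 h)."""
    return "early" if hours_since_admission <= EARLY_CUTOFF_HOURS else "late"


def pca_scores(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project samples (rows of X) onto the top principal components.

    X is a samples x genes matrix, assumed to be co-normalized and already
    restricted to the genes kept by a prior feature-selection step.
    """
    Xc = X - X.mean(axis=0)                          # center each gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                  # samples x n_components


# Toy usage with random numbers (not study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 30))
scores = pca_scores(X, n_components=3)
labels = [timing_label(h) for h in [6, 24, 47, 49, 72, 120] * 2]
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by `labels`, would reproduce the kind of view shown in Fig. 1B for real data.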

Fig. 1. Labeled PCA comparing sterile SIRS/trauma versus sepsis patients.

(A) Sterile SIRS/trauma and sepsis patients appear to be largely separable in the transcriptomic space, with only a minimal nonseparable set. (B) The same labeled PCA is shown, with labels updated to reflect patients in recovery from noninfectious SIRS/trauma and patients with hospital-acquired sepsis; the late group (>48 hours after hospital admission) is much harder to separate. n = 1094 combined from 15 studies.

Therefore, we sought to get a qualitative sense of whether gene expression during the hospital course after injury is similar among different cohorts. We included all peripheral blood data sets that examined gene expression longitudinally over time after admission for nonseptic events. We used CUR matrix decomposition to identify the 100 genes that were most orthogonal to each other and used these to perform labeled PCA with classes determined by days after injury. Reassuringly, the gene expression at each time point was closest to that of the adjacent time points (for example, the days [1,2) group was preceded by days [0,1) and followed by days [2,3); Fig. 2 and movie S1). Furthermore, changes in expression over time explained most of the variance in the data sets, as evidenced by the day groups shifting along each of the first three labeled principal components. In summary, our analysis showed that the changes in gene expression after trauma/ICU admissions (i) proceed in a nonlinear fashion over time and (ii) show similar changes over time across data sets.
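One common way to score genes for CUR-style column selection is by statistical leverage: the squared entries of the top-k right singular vectors, averaged over k. The sketch below assumes that leverage scores are an acceptable stand-in for the "orthogonality scores" mentioned in the text; the choice of k, the centering step, and the function names are assumptions, not parameters reported here.

```python
import numpy as np


def leverage_scores(X: np.ndarray, k: int) -> np.ndarray:
    """Gene (column) leverage scores from the top-k right singular vectors.

    X is samples x genes; k must not exceed min(samples, genes).
    The scores are non-negative and sum to 1.
    """
    Xc = X - X.mean(axis=0)                  # center each gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k]                              # k x genes, orthonormal rows
    return (Vk ** 2).sum(axis=0) / k


def select_genes(X: np.ndarray, k: int, n_genes: int) -> np.ndarray:
    """Column indices of the n_genes highest-leverage genes."""
    return np.argsort(leverage_scores(X, k))[::-1][:n_genes]


# Toy usage: 40 samples x 500 genes of random numbers (not study data).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 500))
top = select_genes(X, k=10, n_genes=100)
```

Genes with high leverage dominate the leading singular directions, so a small set of them tends to span most of the structure that PCA would then display.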

Fig. 2. Two views of the first three principal components of labeled PCA of time-course data sets.

Five peripheral whole-blood gene expression data sets were combined and matched for common genes. The genes with the top 100 orthogonality scores were selected via CUR matrix decomposition, and labeled PCA was performed, broken into classes by day. (A and B) The three-dimensional plots of the first three principal components demonstrate that changes by day explain most variance in the data sets, different data sets show similar changes over time, and the changes over time proceed in a nonlinear fashion. Parts (A) and (B) show two different views of the same data; also see movie S1.

Time-matched multicohort analysis

Because changes in gene expression after admission for trauma explain a large amount of variance in the data set, and because these changes proceed nonlinearly, direct comparisons of a patient at admission with that same patient several days later at the time of infection would be confounded by “normal” changes in expression due to recovery from the inciting event, as well as any “abnormal” changes due to the hospital-acquired infection. It would be extremely difficult, if not impossible, to disentangle these changes. Consequently, comparisons that do not take clinical time into account will not yield biomarkers that can robustly discriminate infected from noninfected patients (Fig. 1). Therefore, we focused only on infection data sets that also included a time-matched noninfected cohort (to allow for direct time-matched comparisons). We thus separated the data sets into two groups: (i) data sets comparing patients at hospital admission for trauma, surgery, or critical illness versus patients at admission to the hospital for sepsis [GSE28750 (27), GSE32707 (31), GSE40012 (26), and the GPSSSI unique combined data sets (n = 408 samples) (15–20)] and (ii) the Glue Grant data sets containing patients with hospital-acquired infections and day-matched noninfected patients, from which we used only patients in the buffy coat sample cohort (Table 1). The Glue Grant trauma cohorts were sampled at roughly 0.5, 1, 4, 7, 14, 21, and 28 days after injury; these cohorts were thus divided into their sampling time bins, creating subgroups in which patients diagnosed with an infection in a given time bin can be compared to noninfected patients in the same time bin. For the buffy coat samples, five time bins contained at least 10 patients, and these bins were taken for further study.
Thus, we used a total of nine cohorts comparing time-matched SIRS/trauma to sepsis/infection, comprising 663 samples (326 SIRS/trauma controls and 337 sepsis/infection cases; Table 2 shows the cohorts in the multicohort analysis; table S2 shows the individual microarray design matrix).
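The time-matching step can be sketched as grouping samples into half-open bins of days since injury and keeping only bins with enough patients. Both the bin edges and the rule of requiring at least 10 distinct patients in each class are hypothetical choices for illustration; the bins actually used are those listed in Table 2.

```python
from collections import defaultdict

# Hypothetical bin edges (days since injury); Table 2 lists the bins actually used.
BIN_EDGES = [0, 1, 3, 5, 9, 15, 22, 29]


def time_bin(days):
    """Return the half-open bin (lo, hi) with lo <= days < hi, or None."""
    for lo, hi in zip(BIN_EDGES, BIN_EDGES[1:]):
        if lo <= days < hi:
            return (lo, hi)
    return None


def time_matched_subcohorts(samples, min_patients=10):
    """Group (patient_id, days_since_injury, infected) samples by time bin.

    Keeps a bin only if it holds at least `min_patients` distinct infected
    patients and at least `min_patients` distinct noninfected patients
    (an assumed rule), so that cases and controls are compared within a bin.
    """
    bins = defaultdict(lambda: {True: set(), False: set()})
    for patient, days, infected in samples:
        b = time_bin(days)
        if b is not None:
            bins[b][bool(infected)].add(patient)
    return {b: groups for b, groups in bins.items()
            if min(len(groups[True]), len(groups[False])) >= min_patients}


# Toy usage: 10 infected and 10 noninfected patients in the [1, 3) bin,
# plus one infected patient on day 10 whose bin is too small to keep.
samples = ([(f"inf{i}", 2.0, True) for i in range(10)]
           + [(f"non{i}", 1.5, False) for i in range(10)]
           + [("late1", 10.0, True)])
kept = time_matched_subcohorts(samples)
```

Each kept bin then acts as one discovery cohort in which infected and noninfected patients share the same clinical time.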

Table 1. Publicly available gene expression data sets comparing SIRS/ICU/trauma to sepsis/infections.

CAP, community-acquired pneumonia; ARDS, acute respiratory distress syndrome.

Table 2. All data sets used in the multicohort analysis.

The numbers after the Glue Grant cohort titles indicate days since injury in the given cohort (for instance, [1,3) denotes patients sampled at least 1 but less than 3 days after injury).


We then applied our previously described (41, 42) multicohort gene expression analysis framework to compare SIRS/trauma with sepsis/infection, including all nine cohorts in a leave-one-data-set-out fashion. The output from this analysis underwent a three-step thresholding process [false discovery rate (FDR) <1% for both pooled effect size and Fisher’s method, inter–data set heterogeneity P > 0.01, and absolute summary effect size fold change > 1.5], which yielded 82 genes differentially expressed between SIRS/trauma and sepsis patients across all time points (summary statistics for all 82 genes shown in table S3). To obtain the most parsimonious set of significant genes that best discriminates between classes, we carried out a greedy forward search to identify which combination of the 82 genes produced the best improvements in AUC across all discovery data sets. Here, discrimination is based on an “infection z score” that combines gene expression levels (using the difference of geometric means between positive and negative genes) into a standardized score for each sample in each data set. This yielded a final set of 11 genes (6 overexpressed and 5 underexpressed in sepsis; Table 3 and Fig. 3). Table S4 shows probe-level expression data for these 11 genes in the discovery cohorts. The mean ROC AUC of this 11-gene set in the nine discovery cohorts was 0.87 (range, 0.70 to 0.98; Fig. 4A and fig. S2).
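The infection z score and the greedy forward search can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes strictly positive (non-log) expression values so that geometric means are defined, uses a pairwise ROC AUC, and stops the search when the mean AUC gain falls below a hypothetical `min_gain`. All function names are illustrative.

```python
import numpy as np


def geometric_mean(rows: np.ndarray) -> np.ndarray:
    """Per-sample geometric mean over genes (rows); values must be > 0."""
    return np.exp(np.log(rows).mean(axis=0))


def infection_z_score(expr, genes, pos_genes, neg_genes):
    """Difference of geometric means (positive minus negative genes),
    standardized across the samples of one data set.

    expr: genes x samples array; genes: row names of expr.
    """
    if not pos_genes and not neg_genes:
        raise ValueError("need at least one gene")
    idx = {g: i for i, g in enumerate(genes)}
    pos = geometric_mean(expr[[idx[g] for g in pos_genes]]) if pos_genes else 0.0
    neg = geometric_mean(expr[[idx[g] for g in neg_genes]]) if neg_genes else 0.0
    raw = pos - neg
    sd = raw.std()
    return (raw - raw.mean()) / sd if sd > 0 else raw - raw.mean()


def roc_auc(scores, labels):
    """Probability that a random case outscores a random control (ties = 1/2)."""
    cases = [float(s) for s, y in zip(scores, labels) if y]
    controls = [float(s) for s, y in zip(scores, labels) if not y]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def mean_auc(datasets, pos_genes, neg_genes):
    """Mean AUC over (expr, genes, labels) discovery data sets."""
    return float(np.mean([roc_auc(infection_z_score(e, g, pos_genes, neg_genes), y)
                          for e, g, y in datasets]))


def forward_search(datasets, pos_candidates, neg_candidates, min_gain=1e-3):
    """Greedily add the gene that most improves mean AUC; stop when the
    best improvement is smaller than min_gain."""
    pos, neg, best = [], [], float("-inf")
    candidates = [(g, True) for g in pos_candidates] + [(g, False) for g in neg_candidates]
    while True:
        step_best, choice = best, None
        for gene, is_pos in candidates:
            if gene in pos or gene in neg:
                continue
            trial_pos = pos + [gene] if is_pos else pos
            trial_neg = neg if is_pos else neg + [gene]
            auc = mean_auc(datasets, trial_pos, trial_neg)
            if auc > step_best:
                step_best, choice = auc, (gene, is_pos)
        if choice is None or step_best - best < min_gain:
            return pos, neg, best
        (pos if choice[1] else neg).append(choice[0])
        best = step_best


# Toy usage: one data set, 3 genes x 6 samples (not study data).
genes = ["A", "B", "C"]
expr = np.array([[1, 1, 1, 4, 4, 4],     # A: higher in cases
                 [4, 4, 4, 1, 1, 1],     # B: lower in cases
                 [2, 3, 2, 3, 2, 3]],    # C: uninformative
                dtype=float)
labels = [0, 0, 0, 1, 1, 1]
pos, neg, auc = forward_search([(expr, genes, labels)], ["A", "C"], ["B"])
```

Because standardization is a positive affine map within each data set, it does not change that data set's AUC; it matters only when scores are pooled or compared across data sets.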

Table 3. The 11-gene set that separates SIRS/trauma from sepsis.

Included are meta-analysis effect sizes, errors, and heterogeneity analyses.
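The three-step threshold used to select the 82 genes can be expressed as a filter over per-gene summary statistics. The sketch below assumes those statistics are already computed, uses the Benjamini-Hochberg procedure for FDR (an assumption; the text does not name the FDR method), uses a simple two-sided form of Fisher's method, and treats the fold-change cutoff as symmetric (above 1.5 or below 1/1.5).

```python
import math


def fisher_combined_p(pvals):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under H0."""
    k = len(pvals)
    half_x = -sum(math.log(p) for p in pvals)    # X / 2
    # Closed-form chi-square upper tail for even df = 2k.
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted P values (q values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):                 # walk from the largest P value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


def passes_thresholds(q_effect, q_fisher, p_het, fold_change,
                      fdr=0.01, min_het_p=0.01, min_fold=1.5):
    """FDR < 1% (pooled effect and Fisher), heterogeneity P > 0.01,
    and absolute summary fold change > 1.5."""
    abs_fold = max(fold_change, 1.0 / fold_change)
    return q_effect < fdr and q_fisher < fdr and p_het > min_het_p and abs_fold > min_fold
```

In use, the q values would be computed across all genes tested, and only genes for which `passes_thresholds` is true would enter the forward search.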

Fig. 3. Effect sizes of the 11-gene set.

Forest plots for random effects model estimates of effect size of the positive genes, comparing SIRS/trauma/ICU to infection/sepsis patients in each of the discovery cohorts.
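A random-effects summary of the kind shown in these forest plots can be sketched with Hedges' g per cohort and the DerSimonian-Laird estimator of between-cohort variance, with Cochran's Q giving the heterogeneity P value. This is a standard textbook formulation chosen for illustration; the variance formula for g and the choice of estimator are assumptions and may differ in detail from the framework cited in the text.

```python
import math
import statistics


def hedges_g(cases, controls):
    """Bias-corrected standardized mean difference and its variance."""
    n1, n2 = len(cases), len(controls)
    s_pooled = math.sqrt(((n1 - 1) * statistics.variance(cases)
                          + (n2 - 1) * statistics.variance(controls)) / (n1 + n2 - 2))
    d = (statistics.mean(cases) - statistics.mean(controls)) / s_pooled
    j = 1 - 3 / (4 * (n1 + n2) - 9)              # small-sample correction
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def random_effects(effects, variances):
    """DerSimonian-Laird pooled effect, its SE and two-sided P value, and the
    Cochran's Q heterogeneity P value. Requires at least two cohorts."""
    if len(effects) < 2:
        raise ValueError("need at least two cohorts")
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                # between-cohort variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    p_effect = math.erfc(abs(pooled / se) / math.sqrt(2))
    return pooled, se, p_effect, chi2_sf(q, df)
```

Each forest-plot row corresponds to one `hedges_g` call per cohort, and the summary diamond corresponds to the `pooled` estimate with its standard error.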

Fig. 4. Results of the 11-gene set in the discovery and neutrophils validation data sets.

(A) ROC curves shown for separating sterile SIRS/ICU/trauma patients from those with sepsis in the discovery data sets. (B) ROC curves shown for separating trauma patients with infections from time-matched trauma patients without infection in the Glue Grant neutrophils validation data sets. (C and D) Glue Grant buffy coat discovery (C) and neutrophils validation samples (D) after >1 day since injury, showing average infection z score in noninfected patients versus patients within ±24 hours of diagnosis. In both cases, there is a significant effect due to both time and infection status. (E and F) Box plots of infection z score by time since injury for buffy coat discovery (E) and neutrophils validation samples (F): patients never infected are compared to patients >5 days before infection, 5-to-1 days before infection, ±1 day of diagnosis (cases), and 2-to-5 days after infection diagnosis. The Jonckheere-Terpstra (JT) trend test was significant (P < 0.01) for an increasing trend from never infected to ±1 day of infection for each time point after admission.

Glue Grant sorted-cells cohort validation

The Glue Grant trauma cohorts have two independent subcohorts: one is the buffy coat cohort (samples processed from 2004 to 2006 on Affymetrix array GPL570), and the other is the sorted-cells cohort, which included neutrophils, monocytes, and T cells [samples processed from 2008 to 2011 on custom Glue Grant Human (GGH) arrays; Table 4]. The two cohorts comprise different patients, were enrolled at different times, and were profiled on different array platforms. Although their inclusion criteria and enrolling sites are largely the same, they are otherwise independent. We thus validated our 11-gene signature in the Glue Grant sorted-cells cohorts. Here, we split the sorted-cells cohorts into the same time bins as the discovery buffy coat cohorts and treated each time bin separately.

Table 4. The Glue Grant sorted-cells cohort.

From the sorted-cells subcohort, we expected the neutrophil set to perform most similarly to a whole-blood sample, because neutrophils make up 75 to 85% of the total leukocyte pool after trauma in both infected and noninfected patients (and hence most of the gene expression present in peripheral blood) (fig. S3). Indeed, the 11-gene set performed very well at separating time-matched noninfected trauma patients from septic trauma patients (4 cohorts; 218 samples; mean AUC, 0.83; range, 0.73 to 0.89) (Fig. 4B). Surprisingly, the 11-gene set also showed discriminatory power in the monocytes and T cells from these same patients (monocytes AUC range, 0.71 to 0.97; T cells AUC range, 0.69 to 0.90) (figs. S4 and S5). Because no sorted-cell data sets were included in the multicohort analysis, we had not expected diagnostic capability in these cell types. In the sorted-cells cohort, AUC increased with greater time since initial trauma; this may suggest that inflammation due to infection is easier to discriminate as the “genomic storm” of traumatic injury begins to recover.

Examination of the 11-gene set in the Glue Grant cohorts

As expected, in the Glue Grant buffy coat cohort, patients within ±24 hours of diagnosis of infection have significantly higher infection z scores at all time points as compared to time-matched patients without infection; this was validated in the neutrophils cohort [repeated-measures analysis of variance (ANOVA) P < 0.0001; Fig. 4, C and D, and table S5A]. Comparison of the infection z score by time since injury in the buffy coat cohort shows a significant decline over time (repeated-measures ANOVA change over time P < 0.0001), but there appears to be a lesser (though still significant) effect in the neutrophils validation cohort (repeated-measures ANOVA change over time P < 0.05) (Fig. 4, C and D, and table S5A). The interaction of group with time since injury was not significant in either discovery or validation cohorts, suggesting that the decline in infection z scores over time for both groups is likely due to recovery from traumatic injury resulting in reduced inflammation (table S5A).

Next, we analyzed how infection z scores changed in infected patients before and after diagnosis of infection, using samples that were not included in identifying the 11-gene set. We grouped the samples from patients who were diagnosed with infection at any point during the same hospital stay into four groups according to their time from diagnosis of infection: more than 5 days before diagnosis, 5 to 1 days before diagnosis, within ±1 day of diagnosis, and 2 to 5 days after diagnosis. Of these, only the ±1 day group was included in the multicohort analysis used to discover the 11-gene set. We further divided these groups into bins according to days since injury. Within each time bin, the infection z scores for the diagnostic groups increased significantly as they approached the diagnosis of infection, in both the discovery buffy coat cohort and the validation neutrophils cohort [Jonckheere-Terpstra (JT) trend test P < 0.01; Fig. 4, E and F]. Furthermore, in all cohorts, the infection z score declined in the groups that were 2 to 5 days after infection diagnosis, when patients were beginning to recover from infection, presumably due to antibiotic treatment. This may also explain the increase in diagnostic power with greater time since initial injury. We emphasize that the resulting “peak” in infection z score around the time of infection diagnosis validates the association of the infection z score with clinical infection: the >5 days prior, 5-to-1 days prior, and 2-to-5 days after groups were not included in the multicohort analysis, yet they still show the hypothesized trends in both the discovery buffy coat cohort and the validation neutrophils cohort. Similar results were seen in the monocytes and T cells samples (same patients as the neutrophils validation cohorts; figs. S4B and S5B).
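A Jonckheere-Terpstra test for an increasing trend across ordered groups can be sketched as below, using the large-sample normal approximation without a tie correction (an assumption for brevity; exact and tie-corrected versions exist). The groups would be ordered from never infected to ±1 day of diagnosis; the values in the usage line are made up.

```python
import math


def jonckheere_terpstra(groups):
    """One-sided JT test for an increasing trend across ordered groups.

    groups: list of lists of scores, in the hypothesized increasing order.
    Returns (J statistic, z, one-sided P).
    """
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    j_stat += (x < y) + 0.5 * (x == y)   # later group scores higher
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n ** 2 - sum(s ** 2 for s in sizes)) / 4
    var = (n ** 2 * (2 * n + 3) - sum(s ** 2 * (2 * s + 3) for s in sizes)) / 72
    z = (j_stat - mean) / math.sqrt(var)
    return j_stat, z, 0.5 * math.erfc(z / math.sqrt(2))


# Toy usage: four ordered groups of hypothetical infection z scores.
j_stat, z, p = jonckheere_terpstra([[-1.0, -0.5], [-0.2, 0.1], [0.3, 0.6], [0.9, 1.4]])
```

Unlike a Kruskal-Wallis test, the JT statistic uses the ordering of the groups, so it specifically tests whether scores rise as patients approach the time of diagnosis.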

At the time of admission, infection z scores in buffy coat samples were significantly higher in patients who later acquired an infection during their hospital stay than in those who were never infected (P < 0.01; neutrophils validation group P = 0.05; Fig. 4, E and F). One possibility is that there was a baseline difference in injury severity, and that this might influence the infection z score. Severely injured patients are known to be more susceptible to infection (43). To test this hypothesis, we used linear regression to predict the infection z score (the dependent variable) from eventual hospital-acquired infection status, injury severity score, and their interaction (table S5B). Both eventual hospital-acquired infection status and injury severity score were independently significant in predicting infection z score at admission, indicating that injury severity alone does not explain these effects. The interaction term was significant and negative in both the discovery buffy coat cohort and the validation neutrophils cohort samples, perhaps suggesting that a higher infection z score at admission may indicate greater susceptibility to later infection. Further studies are needed to examine this observation.
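The regression described above, with the infection z score as the dependent variable, can be sketched with ordinary least squares in NumPy. The term names are illustrative, and the standard errors assume homoscedastic residuals; P values would additionally need a t distribution, which is omitted here.

```python
import numpy as np


def fit_interaction_model(z, infected, iss):
    """OLS fit of z ~ 1 + infected + iss + infected:iss.

    Returns coefficients and their standard errors, keyed by term name.
    """
    z = np.asarray(z, dtype=float)
    infected = np.asarray(infected, dtype=float)   # 1 = eventually infected
    iss = np.asarray(iss, dtype=float)             # injury severity score
    X = np.column_stack([np.ones_like(iss), infected, iss, infected * iss])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    dof = len(z) - X.shape[1]
    sigma2 = resid @ resid / dof                   # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    names = ["intercept", "infected", "iss", "infected:iss"]
    return dict(zip(names, coef)), dict(zip(names, se))


# Toy data generated from known coefficients (not study data).
infected = [0, 0, 0, 1, 1, 1]
iss = [10, 20, 30, 10, 20, 30]
z = [0.5 + 1.0 * a + 0.02 * b - 0.01 * a * b for a, b in zip(infected, iss)]
coef, se = fit_interaction_model(z, infected, iss)
```

A negative `infected:iss` coefficient means the gap in z score between eventually infected and never-infected patients narrows as injury severity increases.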

Clinical utility in the Glue Grant

To test whether the infection z score might add to the clinical determinations of infection, we compared logistic regression using SIRS criteria alone to that using SIRS criteria plus our infection z score in discriminating Glue Grant trauma patients (both buffy coat and neutrophils cohorts) with and without infection. The logistic regression model using SIRS criteria alone had an AUC of 0.64, whereas SIRS criteria plus the infection z score had an overall AUC (using a single coefficient for infections at all time points) of 0.81 (fig. S6). The continuous net reclassification index (NRI) summarizes how often the new model moves predicted risk in the correct direction: it is the net proportion of infected patients whose predicted risk increases plus the net proportion of noninfected patients whose predicted risk decreases (range, −2 to 2). Here, the continuous NRI of adding the infection z score to SIRS alone was 0.90 (95% confidence interval, 0.62 to 1.17), where a continuous NRI greater than 0.6 is associated with “strong” improvement in prediction (44).
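The continuous NRI compares, for each patient, the predicted probability of infection from the two models (here, SIRS criteria alone versus SIRS criteria plus the infection z score). A minimal sketch, assuming the predicted probabilities are already available from the two fitted logistic models; patients whose prediction does not change count as neither up nor down.

```python
def continuous_nri(old_risk, new_risk, event):
    """Continuous (category-free) net reclassification index.

    NRI = [P(up | event) - P(down | event)] + [P(down | nonevent) - P(up | nonevent)],
    where 'up' means the new model assigns a higher risk than the old one.
    Ranges from -2 to 2.
    """
    up_e = down_e = n_e = up_n = down_n = n_n = 0
    for old, new, y in zip(old_risk, new_risk, event):
        if y:                        # infected patient: moving up is correct
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:                        # noninfected patient: moving down is correct
            n_n += 1
            up_n += new > old
            down_n += new < old
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n
```

For example, if every infected patient's predicted risk rises and every noninfected patient's falls, the NRI reaches its maximum of 2.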

Independent validation of the infection z score

Next, we validated our score in three independent longitudinal cohorts that included only trauma or ICU patients who eventually acquired infections: GSE6377 (35), GSE12838, and EMEXP3001 (23) (Table 5). All three cohorts followed patients from the day of admission at least through the day of infection diagnosis (mostly ventilator-associated pneumonia, VAP). Because all patients in each of the three cohorts acquired infections, they did not have time-matched noninfected controls. To compare the validation cohort infection cases with noninfected trauma patients, we used Glue Grant buffy coat noninfected controls. We internally normalized each cohort using housekeeping genes and then co-normalized with the Glue Grant buffy coat patients using empiric Bayes batch correction. Then, we compared the validation cohorts to the Glue Grant noninfected patients at matched time points, which served as a time-varying reference. Comparing trauma/ICU patients to a time-matched baseline is necessary because our earlier findings (Fig. 4, C to F) showed a change over time in infection z score in the noninfected patients (table S5A). The three independent longitudinal trauma/ICU cohorts show that patients within ±1 day of infection are generally separable from time-matched noninfected Glue Grant patients, with ROC AUCs ranging from 0.68 to 0.84 (Fig. 5).

Table 5. Publicly available gene expression time-course data sets of trauma patients who develop infections.

VAP, ventilator-associated pneumonia; VAT, ventilator-associated tracheobronchitis; N/A, not available.

Fig. 5. No-controls data sets of trauma/ICU patients who develop VAP.

These data sets did not include noninfected patients, so they were empiric Bayes co-normalized with time-matched Glue Grant patients. Orange line shows Glue Grant loess curve. (A) EMEXP3001. (B) GSE6377. (C) GSE12838, both neutrophils and whole-blood samples. In all cases, only the first 8 days since admission are shown, and patients are censored >1 day after diagnosis of infection. (D) ROC curves compare patients within ±1 day of diagnosis (blue points in A to C) with time-matched noninfected Glue Grant patients. See Table 5 for further data set details.

We further validated the 11-gene set in eight additional independent data sets that compared healthy controls to those with bacterial or viral sepsis at admission using whole-blood samples [n = 446: GSE11755 (38), GSE13015 (7), GSE20346 (37), GSE21802 (22), GSE25504 (39), GSE27131 (32), GSE33341 (30), and GSE40396 (25); Table 6]. The infection z scores for all eight data sets were combined in a single violin plot, showing excellent separation (Wilcoxon P < 1 × 10⁻⁶³; Fig. 6A). The mean ROC AUC for separating healthy and septic patients is 0.98 (range, 0.94 to 1.0; Fig. 6B).
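The Wilcoxon rank-sum (Mann-Whitney) statistic and the ROC AUC are directly related: AUC = U / (n₁n₂). A minimal sketch using midranks and the large-sample normal approximation for a two-sided P value follows; it omits the tie correction to the variance, which is an assumption for brevity, and the function names are illustrative.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                   # extend the run of ties
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(cases, controls):
    """Mann-Whitney U for cases vs controls, the AUC, and a two-sided
    normal-approximation P value (no tie correction)."""
    n1, n2 = len(cases), len(controls)
    ranks = midranks(list(cases) + list(controls))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, u / (n1 * n2), math.erfc(abs(z) / math.sqrt(2))
```

With several hundred well-separated samples, `math.erfc` returns very small but nonzero P values (it underflows only near 1e-308), which is how a bound such as P < 1 × 10⁻⁶³ can arise.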

Table 6. Publicly available gene expression data sets in whole blood or neutrophils comparing healthy controls to septic patients.
Fig. 6. Discrimination of healthy versus sepsis.

Eight independent validation data sets that met inclusion criteria (peripheral whole blood or neutrophils, sampled within 48 hours of sepsis diagnosis) were tested with the infection z score. (A) Infection z scores for all patients (n = 446) were combined in a single violin plot; error bars show middle quartiles. P values calculated with Wilcoxon rank-sum test. (B) Separate ROC curves for each of the eight data sets discriminating sepsis patients from healthy controls. Mean ROC AUC = 0.98. See Table 6 for further data set details.

Our results provide strong evidence that the infection z score declines over time since admission/injury in whole blood, buffy coat, neutrophils, and monocytes. We have also shown that non–time-matched comparison yields inaccurate classification of infection, especially for late-acquired infections in SIRS/trauma patients. Hence, comparing infection z scores of SIRS/trauma patients at admission with those of late-acquired sepsis/infection patients would be an inaccurate measure of diagnostic power. However, because the decrease in infection z score over time is roughly monotonic, comparison of admission SIRS/trauma/surgery patients with late-acquired sepsis/infection patients would provide a lower bound on the ROC AUC of the infection z score. That is, because the infection z score decreases over time, if the noninfected patients tested at admission had been sampled later (at matched times to the sepsis patients), their infection z scores would be lower at that later time (and hence more easily separable from the higher infection z scores in the septic patients). Using this inference, we examined four independent data sets that compared SIRS/trauma/surgery patients either to the same patients later in their hospital course at onset of sepsis or to a mixed cohort of patients with community- and hospital-acquired sepsis. These data sets included whole blood [EMTAB1548 (21)], neutrophils [GSE5772 (29)], and PBMCs [GSE9960 (8); EMEXP3621 (40)] (Table 7). In these four data sets, the infection z score discriminated late-acquired infections from admission SIRS or trauma with ROC AUCs ranging from 0.48 to 0.76 in PBMCs and up to 0.86 in whole blood (fig. S7). We emphasize that these AUCs are expected to be lower due to their time-mismatched comparison and are essentially lower bounds on what properly time-matched infection z scores would achieve in each of these cell compartments.
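The lower-bound argument rests on a simple property of the ROC AUC: lowering every control's score can only keep or raise the AUC. The toy numbers below are invented for illustration (they are not study data) and assume a uniform decline in the noninfected patients' scores between admission and the later time point.

```python
def auc(cases, controls):
    """Pairwise ROC AUC: P(case score > control score), ties counted as 1/2."""
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Made-up infection z scores for illustration only.
late_cases = [1.0, 1.5, 2.0]            # infected, sampled late in the stay
admission_controls = [0.8, 1.2, 1.6]    # noninfected, sampled at admission
decline = 0.5                           # assumed drop in control scores over time
matched_controls = [k - decline for k in admission_controls]

print(round(auc(late_cases, admission_controls), 3))  # 0.667 (time-mismatched)
print(round(auc(late_cases, matched_controls), 3))    # 0.889 (time-matched)
```

If the controls' scores had instead risen over time, the same reasoning would run in reverse, which is why the argument depends on the observed decline in noninfected patients.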

Table 7. Publicly available gene expression data sets comparing sterile SIRS/trauma patients to later or non–time-matched sepsis/infection patients.

Finally, we examined our 11-gene set in one data set comparing healthy controls or those with autoimmune inflammation to acute bacterial infections after diagnosis confirmation (GSE22098, n = 274) (33). Exact sampling times are not available, but typically, confirmation of infection takes 24 to 72 hours, so these infection samples are expected to show lower z scores than at the time of diagnosis. Still, the infection z score was able to discriminate healthy and autoimmune inflammation patients from those with acute infections (ROC AUC, 0.72; fig. S8). Because cohorts with autoimmune inflammation were not included in our discovery set, this result supports the specificity of the infection z score for infectious inflammation.

The effect of infection type on infection z score

To examine whether there were any infection type–specific differences in the infection z score, we compared patients infected with Gram-positive versus Gram-negative bacteria, as well as those with viral infections to those with bacterial infections. The Glue Grant patients were not analyzed, because there were too few time-matched infection patients in each subcohort. Four data sets had information on Gram-positive versus Gram-negative infection, and four had data on bacterial versus viral infections; in neither case was there a clear trend of differences in infection z score based on infection subtype (table S6).

Gene set pathway evaluation and transcription factor analysis

Having validated the 11-gene set, we examined whether any mechanism might explain why these genes were acting in concert. We analyzed the 11-gene set with Ingenuity Pathway Analysis, which showed that several of the genes are downstream of IL-6 and JUN (fig. S9). All 11 genes identified by the multicohort analysis were tested with both EncodeQT and PASTAA (chosen for a mix of experimental results and in silico transcription factor predictions). EncodeQT found only one significant transcription factor among the positive genes (MAX) and none for the negative genes (EncodeQT Q ≤ 0.01, table S7A). PASTAA showed enrichment for well-known proinflammatory transcription factors, such as the nuclear factor κB (NF-κB) member c-REL, STAT5, and interferon regulatory factors (IRFs) 1 and 10 (table S7B).

Because we did not find an obvious network driver, we next studied whether the genes were enriched in certain immune cell types that might explain their relation to sepsis. We searched GEO for human immune cell type–specific gene expression profiles and found 277 samples from 18 data sets matching our criteria (table S8). We aggregated these into broad immune cell type signatures using mean gene expression scores. We then calculated standardized enrichment scores using the same method as the infection z score (difference of geometric means between positive and negative genes). We did this both for the initial set of 82 genes found to be significantly differentially expressed in the multicohort analysis and for the 11-gene set found after forward search (the genes included in the infection z score) (Fig. 7). The set of all 82 significant genes was found to be highly enriched in band cells only (>4 SDs above the mean; P < 1 × 10⁻⁶). The 11-gene set was significantly enriched (>2 SDs above the mean; P = 0.015) in band cells but also showed up-regulation in regulatory T cells (Tregs) and down-regulation in dendritic cells. This suggests that one driving force in differential gene expression between sterile SIRS and sepsis is the presence of band cells; however, the best set of genes for diagnosis contains information that may incorporate multiple cell type shifts at once. Finally, we checked whether there was a difference in band counts (where present) between acutely infected and noninfected patients in the Glue Grant trauma cohort, but there was no significant difference (mean noninfected = 2.13; mean infected = 2.74; P = 0.49).
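The cell type enrichment scores use the same difference of geometric means as the infection z score, but standardize across cell types rather than across samples. A minimal sketch, assuming a genes × cell types matrix of strictly positive mean expression values; the 2-SD cutoff mirrors the text, while the function name and toy matrix are illustrative.

```python
import numpy as np


def cell_type_enrichment(profiles, genes, pos_genes, neg_genes):
    """Standardized enrichment score per cell type.

    profiles: genes x cell types array of positive mean expression values.
    """
    idx = {g: i for i, g in enumerate(genes)}

    def geo_mean(names):
        rows = profiles[[idx[g] for g in names]]
        return np.exp(np.log(rows).mean(axis=0))   # one value per cell type

    raw = geo_mean(pos_genes) - geo_mean(neg_genes)
    return (raw - raw.mean()) / raw.std()          # z scores across cell types


# Toy usage: 2 genes x 6 hypothetical cell types (not study data).
profiles = np.array([[1, 1, 1, 1, 1, 6],     # gene A (positive)
                     [1, 1, 1, 1, 1, 1]],    # gene B (negative)
                    dtype=float)
z = cell_type_enrichment(profiles, ["A", "B"], ["A"], ["B"])
enriched = np.flatnonzero(z > 2)             # more than 2 SDs above the mean
```

With few cell types the largest attainable z score is limited (at most the square root of the number of cell types minus one), so a 2-SD cutoff is only meaningful when many cell types are profiled.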

Fig. 7. Cell type enrichment analyses.

(A and B) Standardized enrichment scores (z scores, dots) for human immune cell types for both (A) the entire set of 82 genes found to be significant in multicohort analysis and (B) the 11-gene set found after forward search (subset of the 82 genes). Part (B) also shows a box plot of distributions of z scores.


The dozens of studies that we examined here have reported valuable insights into changes in gene expression that occur in response to SIRS, trauma, surgery, and sepsis; one key insight is that time since injury is an important determinant of gene expression (6, 12, 14, 35). Across multiple independent cohorts, we showed that changes in gene expression during recovery are nonlinear over time but follow a similar trajectory (Fig. 2). Therefore, comparing gene expression at early and late time points in the same patient will yield a large number of differentially expressed genes solely because of the recovery process. This makes it very difficult to distinguish the relatively small changes in gene expression caused by a late complication such as infection from the large changes caused by recovery. We therefore separated longitudinal studies into subcohorts of patients at matched time points. We used an integrated, time-course–based multicohort analysis (41, 42) to evaluate differential gene expression between sterile SIRS/trauma and sepsis/infection patients. We then used a forward search to select a parsimonious set of differentially expressed genes optimized for discriminatory power for sepsis. An infection z score, derived from the difference between the geometric means of the positive and negative genes in the 11-gene set, had a mean ROC AUC of 0.87 in the discovery cohorts for distinguishing SIRS/trauma from sepsis/infection patients.
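The score-and-select loop described above can be sketched as below. This is a minimal sketch, assuming positive linear-scale expression values, an AUC computed with the Mann-Whitney statistic, and a greedy forward search that starts from an empty set and adds whichever gene most increases AUC, stopping once the gain falls to or below a threshold `min_gain`. The stopping rule, function names, and toy data are illustrative assumptions, not the authors' published code; the published search may have evaluated AUC across several discovery cohorts rather than a single pooled list of samples.

```python
import math


def geo_mean(values):
    """Geometric mean of strictly positive expression values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def infection_score(sample, pos, neg):
    """Geometric mean of positive genes minus geometric mean of negative genes.

    An empty gene list contributes 0. Standardizing these scores within a cohort
    (subtract mean, divide by SD) gives a z score with the same AUC.
    """
    up = geo_mean([sample[g] for g in pos]) if pos else 0.0
    down = geo_mean([sample[g] for g in neg]) if neg else 0.0
    return up - down


def auc(scores, labels):
    """Mann-Whitney ROC AUC: P(infected score > sterile score), ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def forward_search(samples, labels, up_genes, down_genes, min_gain=0.0):
    """Greedily add the gene that most raises AUC; stop when the gain is <= min_gain."""
    chosen_pos, chosen_neg, best_auc = [], [], 0.5
    candidates = [(g, True) for g in up_genes] + [(g, False) for g in down_genes]
    while candidates:
        trials = []
        for gene, is_up in candidates:
            pos = chosen_pos + [gene] if is_up else chosen_pos
            neg = chosen_neg if is_up else chosen_neg + [gene]
            scores = [infection_score(s, pos, neg) for s in samples]
            trials.append((auc(scores, labels), gene, is_up))
        new_auc, gene, is_up = max(trials)
        if new_auc - best_auc <= min_gain:
            break
        best_auc = new_auc
        (chosen_pos if is_up else chosen_neg).append(gene)
        candidates.remove((gene, is_up))
    return chosen_pos, chosen_neg, best_auc


# Toy data: gene names and values are hypothetical.
samples = [
    {"A": 4.0, "B": 1.0, "C": 1.0},  # infected
    {"A": 5.0, "B": 3.0, "C": 2.0},  # infected
    {"A": 1.0, "B": 2.0, "C": 3.0},  # sterile
    {"A": 2.0, "B": 1.0, "C": 1.0},  # sterile
]
labels = [1, 1, 0, 0]
pos, neg, best = forward_search(samples, labels, ["A", "B"], ["C"])
```

Here gene A alone already separates the two groups perfectly, so the search stops after one step. A greedy search of this kind favors small gene sets, which matches the goal of a parsimonious diagnostic panel.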

We validated the 11-gene set in an independent group of patients from the Glue Grant. The mean AUC for distinguishing sepsis from noninfectious inflammation was 0.83 in the neutrophils validation cohort, with a trend toward better diagnostic power with greater time since initial injury, when initial traumatic inflammation wanes and hospital-acquired infections manifest (43). Although we expect the whole-blood transcriptional profiles to be largely driven by neutrophils, the signal in sorted cells will certainly differ from whole blood. Despite this limitation, the infection z scores performed comparably in validation cohorts. We further validated the infection z score in several additional external data sets, which included three longitudinal cohorts of ICU/trauma patients who developed VAP/VAT; eight cohorts of healthy controls compared to patients with bacterial or viral sepsis; four cohorts of admission SIRS/trauma patients compared to patients at mixed or later time points using whole blood, neutrophils, and PBMCs; and one cohort of patients with autoimmune inflammation compared to patients with acute infection. Finally, we showed that the infection z score does not have systematic trends with regard to infection type (Gram-positive versus Gram-negative and bacterial versus viral) across those data sets for which infection type information is available.

Using the extensive clinical phenotype data available for patients in the Glue Grant, we illustrated two important points about the application of the infection z score. First, the infection z score showed a decline over time since injury that was similar in both infected and noninfected patients. Thus, for maximal discriminatory power, if the infection z score were to be tested in a longitudinal study, the diagnostic thresholds would need to be a function of the time since initial injury/event. Second, the infection z scores increased over the days before infection, peaked within 1 day of diagnosis, and decreased afterwards (presumably due to treatment of infection). This observation raises the possibility that earlier diagnosis or stratification of patients at risk of developing sepsis may be possible using the 11-gene set. In particular, we note that the early rise in infection z score that precedes a clinical diagnosis of infection is not a false positive but an “early positive” result.
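Because the score declines with time since injury in both groups, one way to make the diagnostic cutoff a function of time is to fit a separate threshold in each time bin. The sketch below assumes Youden's J as the cutoff criterion and uses hypothetical bin edges and toy scores; the paper does not specify how time-dependent thresholds should be chosen.

```python
from bisect import bisect_right


def youden_threshold(scores, labels):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1 (score >= cutoff is positive)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("each time bin needs both infected and noninfected samples")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


def fit_time_thresholds(days, scores, labels, bin_edges):
    """One cutoff per time bin; bin_edges are sorted left edges (days since injury)."""
    thresholds = []
    for i, left in enumerate(bin_edges):
        right = bin_edges[i + 1] if i + 1 < len(bin_edges) else float("inf")
        idx = [k for k, d in enumerate(days) if left <= d < right]
        thresholds.append(youden_threshold([scores[k] for k in idx], [labels[k] for k in idx]))
    return thresholds


def classify(day, score, bin_edges, thresholds):
    """Call infection if the score reaches the cutoff for the bin containing `day`."""
    i = bisect_right(bin_edges, day) - 1  # assumes day >= bin_edges[0]
    return score >= thresholds[i]


# Toy data: scores are higher early after injury in both groups.
days = [1, 2, 1, 2, 5, 6, 5, 6]
scores = [3.0, 2.5, 1.0, 1.5, 1.2, 1.0, 0.2, 0.4]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
edges = [0, 4]  # hypothetical bins: days 0-3 and day 4 onward
thr = fit_time_thresholds(days, scores, labels, edges)
```

With these toy values, the same score of 1.1 is called positive on day 5 but negative on day 2, which is the behavior a time-dependent threshold is meant to capture.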

In the Glue Grant buffy coat cohort, the binary SIRS criteria alone performed poorly in discriminating patients at the time of infection from noninfected patients. Adding the infection z score with a global cutoff (that is, not broken into separate time bins) to the SIRS criteria increased the discriminatory power, with a continuous NRI of 0.90. However, SIRS is only one of several criteria used to diagnose sepsis. Procalcitonin is a well-studied biomarker for differentiating sepsis from SIRS, with a summary ROC AUC of 0.78 (range, 0.66 to 0.90) (5). The average AUC in our discovery cohorts was 0.87, and the time-matched neutrophils validation cohort had a mean AUC of 0.83; both are thus at least comparable to procalcitonin. None of the publicly available data sets included procalcitonin levels, so no direct comparison is available. We emphasize, however, that these markers need not be used in isolation; any prospective study of the infection z score should also include known biomarkers to test for better diagnostic performance using biomarker combinations and for head-to-head comparisons.
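The continuous (category-free) NRI quoted above asks, patient by patient, whether adding a marker moves predicted risk in the correct direction: up for infected patients and down for noninfected ones. A minimal sketch, assuming predicted probabilities from the two models (for example, SIRS criteria alone and SIRS criteria plus the infection z score) are already available; how those probabilities were fit is not specified in this section, and the values below are toy numbers. The statistic ranges from -2 to 2.

```python
def continuous_nri(p_old, p_new, labels):
    """Category-free NRI comparing predicted risks from an old and a new model.

    NRI = [P(up | event) - P(down | event)] + [P(down | nonevent) - P(up | nonevent)],
    where "up" means p_new > p_old; unchanged predictions count as neither.
    """
    events = [(o, n) for o, n, y in zip(p_old, p_new, labels) if y == 1]
    nonevents = [(o, n) for o, n, y in zip(p_old, p_new, labels) if y == 0]

    def frac_up(pairs):
        return sum(n > o for o, n in pairs) / len(pairs)

    def frac_down(pairs):
        return sum(n < o for o, n in pairs) / len(pairs)

    return (frac_up(events) - frac_down(events)) + (frac_down(nonevents) - frac_up(nonevents))


p_sirs = [0.5, 0.5, 0.5, 0.5]    # old model: SIRS criteria only (toy values)
p_sirs_z = [0.7, 0.6, 0.3, 0.4]  # new model: SIRS + infection z score (toy values)
infected = [1, 1, 0, 0]
nri = continuous_nri(p_sirs, p_sirs_z, infected)  # 2.0, the maximum possible
```

Because only the direction of change counts, the continuous NRI rewards consistent reclassification rather than the size of each shift in predicted risk.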

Both infectious and noninfectious inflammation can lead to SIRS through activation of the same innate immune pathways [Toll-like receptors (TLRs), RIG-I–like receptors (RLRs), NOD-like receptors (NLRs), etc.], so the “typical” proinflammatory genes and cytokines (such as tumor necrosis factor and the interleukins) are generally expressed in both sterile and infectious inflammation (45). For instance, one recent study showed high correlation in gene expression between sterile inflammation (Glue Grant burns cohort) and four independent sepsis data sets, with as many as 93% of the genes changing in the same direction in the two conditions (12). Thus, a standard hypothesis-driven search for biomarkers that are specifically differentially expressed between sterile SIRS and sepsis is unlikely to succeed, because the “standard” suite of cytokines and chemokines known to be expressed in sepsis is mostly also activated in sterile SIRS. However, several protein families have been shown to have specificity for pathogen-associated molecular patterns, raising the possibility of infection-specific innate immune signaling pathways (11). Our data-driven, unbiased approach searched specifically for genes that are significantly and homogeneously differentially expressed between sterile SIRS/trauma patients and sepsis patients across multiple cohorts.

Some of the genes in the sepsis-specific 11-gene set, such as CEACAM1, C3AR1, GNA15, and HLA-DPB1, have been previously associated with sepsis or infections (46, 47). The regulatory control of these genes may be enriched for several proinflammatory factors, but no single common factor explained the network. The gene sets found here may be better explained by cell type enrichment analyses. We show that band cells and the myeloid lineage are highly enriched for the genes found to be significantly differentially expressed between sterile SIRS and sepsis. The enrichment in band cells is particularly intriguing, because bands have previously been shown to help differentiate sterile SIRS from sepsis (48). However, band counts are highly variable whether obtained by automated blood counters or by hand (49), and no good serum marker exists. The 11-gene set may distinguish sepsis from sterile SIRS at least in part because it also carries information on increased Tregs and decreased dendritic cells, both of which have previously been implicated in sepsis (50, 51). In particular, the joint findings that the 11-gene set is overexpressed in bands but underexpressed in adaptive immune cells are remarkably similar to the phenotype of increased immature granulocytes and decreased adaptive immunity caused by myeloid-derived suppressor cells in infection and chronic critical illness (52, 53). The connection between the 11-gene set and different immune cell types may help explain some sepsis biology, but these 11 genes clearly require further study.

Our study has some limitations. First, although we validated the 11-gene set in all available independent data sets, prospective validation is required. Second, the Glue Grant buffy coat and neutrophils cohorts were incorporated in a way that treated different periods of time since injury as different data sets, though the noninfected controls came from the same patient cohort at different sampling times (with some dropouts due to injury, recovery, or missed sampling). These time-based control subsets are thus not independent of one another, which may lead to underestimation of effect size variance; however, this was only the case for the two Glue Grant data sets and not the rest of the data sets.
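To make the variance concern concrete: in a random-effects multicohort analysis, the between-data-set variance τ² is estimated from how widely the per-data-set effect sizes scatter. If several "data sets" share overlapping control samples from the same patients, their effect sizes are positively correlated and scatter less than truly independent cohorts would, so τ² and the pooled standard error can be underestimated. The sketch below assumes Hedges' g as the per-data-set effect size and DerSimonian–Laird pooling; these are common choices for this kind of analysis but are stated here as assumptions, not as a description of the published implementation.

```python
import math
from statistics import mean, stdev


def hedges_g(cases, controls):
    """Bias-corrected standardized mean difference and its approximate variance."""
    n1, n0 = len(cases), len(controls)
    s_pooled = math.sqrt(((n1 - 1) * stdev(cases) ** 2 + (n0 - 1) * stdev(controls) ** 2) / (n1 + n0 - 2))
    d = (mean(cases) - mean(controls)) / s_pooled
    j = 1 - 3 / (4 * (n1 + n0) - 9)  # small-sample correction factor
    var_d = (n1 + n0) / (n1 * n0) + d ** 2 / (2 * (n1 + n0))
    return j * d, j ** 2 * var_d


def dersimonian_laird(effects, variances):
    """Random-effects pooling; returns (pooled effect, SE, tau^2, Cochran's Q)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-data-set variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2, q
```

Cochran's Q would normally be compared against a χ² distribution with k − 1 degrees of freedom to obtain a heterogeneity P value; that step is omitted here to keep the sketch within the standard library.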

This work suggests several future directions. First, both the 11-gene set and the protein products of these genes will need to be tested prospectively in a time-matched manner. Although protein assays are faster than transcript quantitation assays, a number of advances in polymerase chain reaction technology have brought assay times down toward the range of clinical applicability (54). Second, our results showed that the changes in gene expression due to normal recovery from a traumatic event (such as injury or surgery) mean that time must be properly accounted for in any gene expression study of acute illness. Our search found several studies that examine the time course after SIRS/trauma (GSE6377, GSE12838, GSE40012, and E-MEXP-3001) and several that examine the time course since onset of sepsis/infection (GSE20346, GSE2713, GSE40012, and E-MEXP-3850). However, we found only one publicly available microarray study (the Glue Grant) that followed a cohort of patients over time in which some patients developed infection and others did not. Thus, on the basis of our results, we recommend that future studies of sepsis diagnostics be designed with longitudinal cohorts both with and without infection to enable appropriate time-matched comparisons (9, 10).
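A time-matched design of the kind recommended above can be set up by splitting longitudinal samples into windows of time since injury and comparing infected with noninfected patients only within each window, as was done when the longitudinal studies were separated into time-matched subcohorts. The sketch assumes a simple per-sample record and hypothetical bin edges; the bins used in the study are not restated in this section.

```python
from collections import defaultdict


def time_matched_subcohorts(samples, bin_edges):
    """Group samples into time bins and keep bins with both infected and noninfected samples.

    samples: dicts with "patient", "day" (days since injury), and "infected" keys
    (hypothetical schema). bin_edges: sorted edges; bin i covers [edges[i], edges[i+1]).
    Samples outside all bins are dropped.
    """
    bins = defaultdict(lambda: {"infected": [], "noninfected": []})
    for s in samples:
        for left, right in zip(bin_edges, bin_edges[1:]):
            if left <= s["day"] < right:
                group = "infected" if s["infected"] else "noninfected"
                bins[(left, right)][group].append(s)
                break
    # Only bins containing both groups support a time-matched comparison.
    return {k: v for k, v in bins.items() if v["infected"] and v["noninfected"]}


samples = [
    {"patient": "p1", "day": 2, "infected": False},
    {"patient": "p2", "day": 3, "infected": True},
    {"patient": "p1", "day": 8, "infected": False},
    {"patient": "p3", "day": 20, "infected": True},
]
sub = time_matched_subcohorts(samples, [0, 4, 10, 21])  # hypothetical edges
```

Note that one patient (p1 here) can contribute samples to several bins; this is the source of the non-independence between time-based control subsets noted in the limitations.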

Overall, our comprehensive analysis of publicly available gene expression data in SIRS/trauma and sepsis has yielded a parsimonious 11-gene set with excellent discriminatory power in both the discovery cohorts and in 15 independent cohorts. Optimizing a clinical assay for this gene set to get results within a window of clinical relevance should be feasible. Further study will be needed both to confirm our clinical findings in a prospective manner and to investigate the molecular pathways upstream of these genes.


Materials and Methods

Fig. S1. Labeled PCA comparing healthy controls, SIRS/trauma patients, and sepsis patients.

Fig. S2. Violin plots for the data sets included in the discovery multicohort analysis.

Fig. S3. Neutrophil percentages for the Glue Grant patients with both complete blood count and microarray data.

Fig. S4. Performance of the infection z score in the sorted monocytes from the Glue Grant cohort.

Fig. S5. Performance of the infection z score in the sorted T cells from the Glue Grant cohort.

Fig. S6. Linear models of SIRS criteria and the infection z score.

Fig. S7. The infection z score in non–time-matched data sets.

Fig. S8. Comparison of the infection z scores in patients with acute infections to healthy controls and patients with autoimmune diseases.

Fig. S9. Ingenuity Pathway Analysis results for the 11-gene set.

Fig. S10. Schematic of the entire integrated multicohort analysis.

Table S1. Summary spreadsheet of all data sets referenced in the manuscript.

Table S2. Design matrix of individual phenotypes for multicohort analysis.

Table S3. Summary statistics for the 82 genes that passed significance, heterogeneity, and effect size filtering after multicohort analysis.

Table S4. Probe-level data for all 11 genes in the diagnostic set for all patients in the multicohort analysis.

Table S5. Linear models of infection score in the Glue Grant data.

Table S6. Comparison of infection z score across infection types.

Table S7. In silico transcription factor binding analyses for the 11-gene set.

Table S8. Design matrix for cell type enrichment analyses.

Movie S1. Rotation of a time-course–labeled PCA of trauma patients.

References (55–79)


Acknowledgments: We would like to thank T. Roth for help in curating the public immune cell profiles; T. Chakraborty, J. P. Cobb, H. Hossain, M. Lissauer, G. Parnell, and B. Tang for helpful discussions concerning their publicly available data; P. Mason for assistance with accessing the Glue Grant data; and the Glue Grant investigators for sharing their data publicly; they are supported in this by National Institute of General Medical Sciences Glue Grant Legacy Award R24GM102656. Funding: T.E.S. is funded by National Library of Medicine grant 2T15LM007033, a Stanford Child Health Research Institute Young Investigator Award (through the Institute for Immunity, Transplantation and Infection), and the Stanford Department of Surgery. P.K. is funded by National Institute of Allergy and Infectious Diseases grants 1U19AI109662, U19AI057229, U54I117925, and U01AI089859. Author contributions: Study conception and design: T.E.S. and P.K.; contributed materials and methods: A.S. and H.R.W.; performed the analysis: T.E.S. and P.K.; drafted manuscript: T.E.S. and P.K.; and critical revision: T.E.S., A.S., H.R.W., and P.K. Competing interests: The 11-gene set has been disclosed for possible patent protection to the Stanford Office of Technology and Licensing by T.E.S. and P.K. Data and materials availability: Links to the data sets used here are available in table S1; patient design matrix and probe-level data are available in tables S2 and S4. The GPSSSI unique data set has been deposited in GEO at accession number GSE66099. The Glue Grant data are publicly available, pending Institutional Review Board approval per the instructions found at The data and code necessary to recreate the multicohort analysis have been deposited online at; access to the data will be granted after approval by the Glue Grant consortium.