Research ArticleSepsis

Robust classification of bacterial and viral infections via integrated host gene expression diagnostics

See allHide authors and affiliations

Science Translational Medicine  06 Jul 2016:
Vol. 8, Issue 346, pp. 346ra91
DOI: 10.1126/scitranslmed.aaf7165

Making a gene list, checking it twice

Sepsis, a severe inflammation caused by infection, is a common and deadly medical condition. Sepsis therapy combines supportive treatment with interventions directed at the underlying cause of the illness, especially antibiotics for bacterial infections. Unfortunately, it can be difficult to distinguish patients with noninfectious inflammation from those with bacterial and viral infections, and only those with bacterial sepsis derive any benefit from antibiotics. Sweeney et al. have previously analyzed large numbers of patients across many cohorts to derive a blood test identifying which patients with sepsis-like symptoms have an underlying infection. Now, the authors expanded their analysis to create an integrated score that not only identifies infected patients but also classifies their infection as bacterial or viral, suggesting appropriate treatment.


Improved diagnostics for acute infections could decrease morbidity and mortality by increasing early antibiotics for patients with bacterial infections and reducing unnecessary antibiotics for patients without bacterial infections. Several groups have used gene expression microarrays to build classifiers for acute infections, but these have been hampered by the size of the gene sets, use of overfit models, or lack of independent validation. We used multicohort analysis to derive a set of seven genes for robust discrimination of bacterial and viral infections, which we then validated in 30 independent cohorts. We next used our previously published 11-gene Sepsis MetaScore together with the new bacterial/viral classifier to build an integrated antibiotics decision model. In a pooled analysis of 1057 samples from 20 cohorts (excluding infants), the integrated antibiotics decision model had a sensitivity and specificity for bacterial infections of 94.0 and 59.8%, respectively (negative likelihood ratio, 0.10). Prospective clinical validation will be needed before these findings are implemented for patient care.


Early and accurate diagnosis of infection is key to improving patient outcomes and reducing antibiotic resistance. The mortality rate of bacterial sepsis increases by 8% for each hour by which antibiotics are delayed (1); however, indiscriminate prescription of antibiotics to patients without bacterial infections increases rates of morbidity and antimicrobial resistance. The rate of inappropriate antibiotic prescriptions in the hospital setting is estimated at 30 to 50% and would be decreased by improved diagnostics (2, 3). In broader use, up to 95% of outpatients given antibiotics for suspected enteric fever have negative cultures (4). There is currently no gold standard point-of-care diagnostic that can broadly determine the presence and type of infection. Thus, the White House has established the National Action Plan for Combating Antibiotic-Resistant Bacteria, which called for “point-of-need diagnostic tests to distinguish rapidly between bacterial and viral infections” (5).

Although new polymerase chain reaction (PCR)–based molecular diagnostics can profile pathogens directly from a blood culture (6), such methods rely on the presence of adequate numbers of pathogens in the blood. Moreover, they are limited to detecting a discrete range of pathogens. As a result, there is a growing need for molecular diagnostics that profile the host gene response. These include diagnostics that can distinguish the presence of infection as compared to inflammation in the absence of infection, such as our 11-gene “Sepsis MetaScore” (SMS) (7), which has been validated across multiple cohorts (8), among others (9, 10). Other groups have focused on gene sets that can distinguish between types of infections, such as bacterial versus viral infections (1113); however, these gene sets often contain too many genes to translate into a useful clinical tool. Tsalik et al. described a model that distinguishes among all three groups—noninfected patients and those with bacterial or viral illness—but this model required the measurement of 122 probes, presenting an implementation challenge (14). Similarly, we have described a “meta-virus signature” that describes a common response to viral infection but contained too many genes (396) for clinical application (15). Overall, although great promise has been shown in this field, no pragmatic infection diagnostic based on host gene expression has yet made it into clinical practice.

The data from these biomarker studies and dozens of other genome-wide expression studies in sepsis and acute infections have been published and deposited for further study in public databases such as the National Institutes of Health (NIH) Gene Expression Omnibus (GEO) and the European Bioinformatics Institute (EBI) ArrayExpress. These data are a largely untapped resource that can be used for both biomarker discovery and validation. We have previously shown that our integrated multicohort analysis of gene expression produces robust diagnostic tools for organ transplant (16), sepsis (7), specific types of viral infections (15), and active tuberculosis (17). Furthermore, these data are also useful as a benchmarking and validation tool for new host gene expression diagnostics. However, such validation using public data has previously been limited to only those cohorts that contain at least two classes of interest (those in which a direct comparison between classes is possible), because interstudy technical differences preclude direct comparison of diagnostic scores between cohorts.

Here, we sought to improve the diagnostic power of the SMS by adding the ability to discriminate bacterial from viral infections. Thus, to derive an improved biomarker for discriminating infection types, we applied our multicohort analysis framework to clinical microarray cohorts that compared the host response to bacterial and viral infections. We further developed a method to conormalize gene expression data among multiple cohorts, allowing direct comparison of a diagnostic score among multiple cohorts. Finally, we combined the previous SMS and the bacterial/viral diagnostic described here into an integrated antibiotics decision model (IADM) that can determine whether a patient with acute inflammation has an underlying bacterial infection.


Derivation of the seven-gene bacterial/viral metascore

Our previously published 11-gene SMS cannot reliably distinguish between bacterial and viral infections, showing mostly nonsignificant differences in score distribution between patients with bacterial and viral infections (fig. S1). Having previously shown that there is a conserved host gene response to viral infections (15), we hypothesized that a classifier for bacterial versus viral infections would allow for an improved diagnostic model. We thus performed a systematic search for gene expression microarray cohorts that studied patients with viral and/or bacterial infections. We identified eight cohorts (11, 1826) [both whole blood and peripheral blood mononuclear cells (PBMCs)] that included n > 5 patients with both viral and bacterial infections (Table 1A). The eight cohorts were composed of 426 patient samples (142 viral and 284 bacterial infections), including children and adults, medical and surgical patients, and those with multiple sites of infection. We performed multicohort analysis on the eight cohorts as described previously (fig. S2) (7, 1517). We set significance thresholds at an effect size >2-fold and a false discovery rate (FDR) <1% in leave–one–data set–out round-robin analysis. However, to make sure that neither tissue type (whole blood or PBMCs) biased our results, we further selected only those genes that also had an effect size >1.5-fold in separate analyses of both PBMC and whole-blood cohorts. This process resulted in 72 differentially expressed genes significant at the above thresholds (table S1). We used a greedy forward search (7) to find a gene set optimized for diagnosis, resulting in seven genes [higher in viral infections (IFI27, JUP, and LAX1) and higher in bacterial infections (HK3, TNIP1, GPAA1, and CTSB); fig. S3)]. As expected, a “bacterial/viral metascore” based on these seven genes robustly distinguished viral from bacterial infections in all eight of the discovery cohorts [summary receiver operating characteristic (ROC) area under the curve (AUC), 0.97; 95% confidence interval (CI), 0.89 to 0.99; Fig. 1 and fig. S4].

Table 1. Data sets used in the discovery and direct validation of the bacterial/viral metascore.

CAP, community-acquired pneumonia; HHV6, human herpesvirus 6; RSV, respiratory syncytial virus; HSV, herpes simplex virus; CMV, cytomegalovirus; MPV, metapneumovirus; PICU, pediatric intensive care unit; LRTI, lower respiratory tract infection; ARI, acute respiratory infection; COPD, chronic obstructive pulmonary disease.

View this table:
Table 2. Validation data sets that matched inclusion criteria and have a single known pathogen type (viral or bacterial).

DHF, dengue hemorrhagic fever; DSS, dengue shock syndrome.

View this table:
Fig. 1. Summary ROC curves for discovery and direct validation data sets for the bacterial/viral metascore.

Summary ROC curve is shown in black, with 95% CIs in dark gray.

We next tested the seven-gene set in the six remaining independent clinical cohorts (13, 14, 2729) that directly compared bacterial and viral infections (138 bacterial and 203 viral infections, totaling 341 samples) and found a summary ROC AUC of 0.91 (95% CI, 0.82 to 0.96) (Fig. 1, Table 1B, and fig. S5; individual test characteristics in table S2). To measure the generalizability of our signature, we also tested whether cells stimulated in vitro with lipopolysaccharide (LPS) or influenza virus could be separated with the bacterial/viral metascore [GSE53166 (30), n = 75; AUC, 0.99; fig. S6].

Global validation via COCONUT conormalization

There are dozens of microarray cohorts in the public domain that studied either bacterial or viral infections, but not both, thus precluding a direct (within data set) estimate of diagnostic power for separating bacterial and viral illness. To apply and compare a gene score across these cohorts, we needed a method that could remove inter–data set batch effects while remaining unbiased to the diagnosis of the diseased patients. We designed and implemented a modified type of array normalization that uses the ComBat (31) empirical Bayes normalization methods on healthy controls to obtain bias-free corrections of disease samples (a method we call COmbat CO-Normalization Using conTrols or “COCONUT”; fig. S7). Housekeeping genes remained invariant across both diseases and cohorts after COCONUT conormalization, and each gene still retained the same distribution between diseases and controls within each data set (fig. S8). Because our method assumes that all healthy samples are derived from the same distribution, we separately normalized the whole-blood and PBMC samples because different immune cell types have different baseline gene expression distributions. Using COCONUT conormalization, the bacterial/viral metascore has a global AUC of 0.92 (95% CI, 0.89 to 0.96) in the whole-blood discovery cohorts (figs. S9 to S10). We then applied this method to test the bacterial/viral metascore in all public domain microarray cohorts that matched inclusion criteria and used whole blood. These data sets included the four direct validation cohorts that included control patients and an additional 20 cohorts that measured either bacterial or viral infections but not both (n = 143 + 897 = 1040) (3249). These data sets represent a wide variety of clinical conditions, including a range of infection types (Gram-positive, Gram-negative, atypical bacterial, common respiratory viruses, and dengue) and severities (mild infections to septic shock). The bacterial/viral metascore showed an overall ROC AUC of 0.93 (95% CI, 0.91 to 0.94) across these data, which allowed us to set a single global score cutoff (Fig. 2, Table 1B, and fig. S11). Finally, we performed the same procedure on PBMC validation cohorts [six cohorts (5054), n = 259; global AUC, 0.92 (95% CI, 0.87 to 0.97; figs. S12 to S13)]. All three global ROC AUCs using COCONUT conormalization (discovery whole blood, 0.92; validation whole blood, 0.93; validation PBMCs, 0.92) approximately matched the summary AUC of the direct validation cohorts (0.91), giving high confidence in the diagnostic power of this method.

Fig. 2. Bacterial/viral score in COCONUT-conormalized whole-blood validation data sets.

The global AUC across all whole-blood discovery data sets is 0.93. Top: Score distribution by data set (blue, bacterial; red, viral). Middle: Individual gene expression (exp.). Bottom: Housekeeping genes (grayscale). The dotted line at the top shows a possible global threshold for discriminating infection type.

Integrated antibiotics decision model

A key clinical need is diagnosing whether a patient with signs and symptoms of inflammation has an underlying rapid and judicious administration of antibiotics is key to improving patient outcomes. Neither the SMS nor the bacterial/viral metascore alone can robustly distinguish between all three classes of (i) noninfected inflammation, (ii) bacterial illness, and (iii) viral illness. Thus, to increase clinical relevance, we developed an “integrated antibiotics decision model” (IADM), whereby we first apply our previously described SMS (7) to test for the presence of an infection and then apply the bacterial/viral metascore to the samples that test positive for infection (Fig. 3A). As described previously, the only way to establish test characteristics for the IADM simultaneously across cohorts is to use COCONUT conormalization. We found that the SMS in COCONUT-conormalized data is strongly influenced by age, which could be caused by age-dependent differences in healthy subjects, infected patients, or both (fig. S14). We thus excluded cohorts focused on infants (children <1 year old) from the IADM. We also removed the low-severity outpatient viral illness cohorts (GSE17156 and GSE68310) because in outpatient settings, the history and physical exam findings make noninfectious causes of acute inflammation less likely. This resulted in a total of 20 cohorts for testing the IADM (n = 1057). The resulting global AUC for the SMS across the available data was 0.86 (95% CI, 0.84 to 0.89) (fig. S15 and table S3). We set global thresholds for an SMS sensitivity for infection of 95% and a bacterial/viral metascore sensitivity for bacterial infection of 95%. Considering all three classes of noninfectious inflammation, bacterial infection, and viral infection, this yielded an overall sensitivity and specificity of 94.0 and 59.8% for bacterial infections and 53.0 and 90.6% for viral infections, respectively (Fig. 3). These performance characteristics were largely unchanged if healthy patients were included in the noninfected class (fig. S16). The overall positive and negative likelihood ratios for bacterial infection in the IADM were thus 2.34 (LR+) and 0.10 (LR), respectively, compared to a recent meta-analysis of procalcitonin that showed a negative likelihood ratio of 0.29 (95% CI, 0.22 to 0.38) (55). We plotted negative predictive value (NPV) and positive predictive value (PPV) versus prevalence for these test characteristics; the NPV and PPV for bacterial infection at a prevalence of 15% were 98.3 and 29.2%, respectively (fig. S17).

Fig. 3. IADM across COCONUT-conormalized public gene expression data that matched inclusion criteria.

(A) IADM schematic. (B) Distribution of scores and cutoffs for IADM in COCONUT-conormalized data. SIRS, systemic inflammatory response syndrome. (C) Confusion matrix for diagnosis. Bacterial infection sensitivity, 94.0%; bacterial infection specificity, 59.8%; viral infection sensitivity, 53.0%; viral infection specificity, 90.6%.

There was only one data set [GSE63990 (14)] that included noninfectious inflammation patients and patients with both bacterial and viral illness but did not include healthy controls, precluding its addition to the global calculations. We thus tested the IADM on this cohort with locally derived test thresholds. We found an overall bacterial infection sensitivity and specificity of 94.3 and 52.2%, respectively (fig. S18).

Validation in independent samples using NanoString nCounter

Finally, we used targeted quantitative NanoString nCounter (56) gene expression assays to prospectively validate these results in independent whole-blood samples from children with sepsis from the Genomics of Pediatric SIRS and Septic Shock Investigators (GPSSSI) cohort (total n = 96, with 36 SIRS, 49 bacterial sepsis, and 11 viral sepsis patients; Fig. 4 and table S4). The GPSSSI cohort was also used by data set GSE66099, but the children profiled here were never profiled via microarray and are thus not part of the discovery data sets. In the NanoString validation cohort, the SMS AUC was 0.81 (AUC of 0.80 in GSE66099). Similarly, the bacterial/viral metascore AUC was 0.84 (AUC of 0.83 in GSE66099). The microarray AUCs were thus preserved when tested with a targeted, quantitative gene expression assay in new patients. Applying the same IADM, the sensitivity and specificity were 89.7 and 70.0% for bacterial infections (LR, 0.15; LR+, 3.0) and 54.5 and 96.5% for viral infections (LR, 0.47; LR+, 15.6), respectively.

Fig. 4. Targeted NanoString gene expression data for children with SIRS/sepsis from the GPSSSI cohort never tested with microarrays.

Total n = 96, of which SIRS = 36, bacterial sepsis = 49, and viral sepsis = 11. (A) Breakdown of infected patients by organism type. (B and C) ROC curves for the SMS and the bacterial/viral metascore. (D) Distribution of scores and cutoffs for IADM. (E) Confusion matrix for IADM. Bacterial infection sensitivity, 89.7%; bacterial infection specificity, 70.0%; viral infection sensitivity, 54.5%; viral infection specificity, 96.5%.


Better diagnostics for acute infections are needed in both the inpatient and outpatient settings. In low-acuity outpatient settings, a simple diagnostic that can discriminate bacterial from viral infections may be enough to assist in appropriate antibiotic usage. In higher-acuity settings, causes of noninfectious inflammation become more important to rule out, and so a decision model for antibiotic prescriptions must include a noninfected, nonhealthy case. Thus, a reliable diagnostic needs to distinguish all three cases (noninfected inflammation, bacterial infection, and viral infection). Here, using 426 samples from 8 cohorts, we derived a parsimonious set of only seven genes that can accurately discriminate bacterial from viral infections across a very broad range of clinical conditions in independent cohorts (total of 30 cohorts composed of 1299 patients). We further demonstrated that by integrating our published SMS (7) (to distinguish the presence or absence of infection) with the bacterial/viral metascore (to determine infection type) into a single IADM, we can determine with high accuracy which patients do not require antibiotics. Finally, we confirmed the diagnostic power of both the seven-gene set and the IADM in independent samples using a targeted NanoString assay, showing that the signatures retain diagnostic power when not relying on microarrays.

The IADM has a low negative likelihood ratio (0.10) and high estimated NPV, meaning it would be potentially effective as a rule-out test. A meta-analysis of procalcitonin that included 3244 patients from 30 studies resulted in an overall estimated negative likelihood ratio of 0.29 (95% CI, 0.22 to 0.38) (55). Thus, the IADM negative likelihood ratio is much lower than the estimate for procalcitonin (on the basis of nonoverlapping 95% CIs), indicating clinical utility. Moreover, these test characteristics assume no knowledge of the patient and so are only estimates of the real-world clinical utility of such a test, because patient history, physical examination, vital signs, and laboratory values would all assist in a diagnosis as well. Even given these caveats, a recent economic decision model of screening ICU patients for hospital-acquired infections suggested that a test such as the IADM that can accurately diagnose bacterial and viral infections could be cost-effective (57). Ultimately, only interventional trials will be able to establish cost-effectiveness and clinical utility of a diagnostic test.

We validated our diagnostic in pediatric sepsis patients from the GPSSSI cohort using a NanoString assay. NanoString is highly accurate and is a useful tool for measuring the expression of multiple genes at once; however, it is also likely too slow for clinical application (4 to 6 hours per assay). Thus, although the assay confirms that our gene set is robust in targeted measurements, further work will be needed to improve the turnaround time. There are multiple possibilities for developing a commercial assay based on rapid multiplexed quantitative PCR that meets the time-sensitive demands of an infection diagnostic test. However, this technical hurdle is something that all gene expression–based infection diagnostics must overcome to gain clinical relevance.

A simple linear score generally cannot adequately separate the three classes (noninfected inflammation, bacterial infections, and viral infections); other machine-learning techniques, such as multiple regression or tree-based methods, are typically used. However, because the location and scale of different genes vary greatly between microarray types, such techniques cannot usually be truly validated across microarray platforms. Although COCONUT partially circumvents this issue by allowing for discovery of a model across several cohorts with application of the same model in COCONUT-conormalized validation cohorts, such method would be forced to leave out data sets that did not include healthy controls. We thus opted to use our simple, scale-free, difference-of-geometric-means scores. This allowed us to both discover and validate our diagnostic models across multiple cohorts. It may be possible to discover a gene set using multiple classification techniques, instead of our current method of starting with the SMS and then adding the bacterial/viral metascore. However, to discover and then validate in multiple cohorts requires that all cohorts be appropriate for COCONUT normalization, and we have instead erred on the side of maximizing the utility of multiple clinically heterogeneous cohorts.

Several groups have published models for diagnosing infections on the basis of host gene expression; none have yet made it into clinical practice. Most of these classifiers were not tested in multiple independent cohorts, had too many genes to allow rapid profiling necessary for useful diagnosis, or both. For instance, Suarez et al. created a 10-gene k-nearest-neighbor classifier but did not test it outside their published data set (GSE60244) (13). Tsalik et al. created a 122-probe (120-gene) classifier on the basis of multiple regression models, but in testing it in external GEO cohorts, they retrained their regression coefficients in each new data set (14). Such model retraining results in a strong upward bias to these validation numbers (assuming that a final model would not be locally retrained). Other groups have made gene expression classifiers for infection but did not include models for discriminating viral infections (7, 9, 10). Our IADM is robust across a wide range of disease types and severities but has a relatively lower sensitivity for viral infections. Non–gene expression biomarkers have also been used for infection diagnosis. Procalcitonin has been studied extensively in the setting of sepsis diagnosis but cannot distinguish between noninfected individuals and those with viral infections (58). Protein-panel assays have been shown to discriminate bacterial from viral infections but cannot discriminate patients with noninfectious inflammation (59, 60). Thus, all of these classifiers have certain strengths and weaknesses that will become more apparent with further prospective testing and direct comparison.

Although our goal in this study was to identify biomarkers and not necessarily mechanistic biology, it is still important for a biomarker set to have biological plausibility. Of the seven genes in the bacterial/viral metascore, six have previously been linked to infections or leukocyte activation. Both IFI27 and JUP were shown in single-cohort genome-wide expression studies to be induced in response to viral infection (52, 61), whereas TNIP1 and CTSB are important in modulating the nuclear factor kB and necrotic responses to bacterial infection (62, 63). Finally, LAX1 (up-regulated in viral infections) is involved in activation of T and B cells (64), and HK3 is instrumental in the neutrophil differentiation pathway (65). Thus, the role of these transcripts as biomarkers for infection type is not coincidental.

Here, we developed a method, COCONUT, to directly compare our model across a large pool of one-class cohorts that would otherwise be unusable for benchmarking a diagnostic gene set. COCONUT assumes that all controls come from the same distribution; that is, the genes in each group of controls are reset to have the same mean and variance, with batch parameters learned empirically from gene groups. This method corrects for microarray and batch processing differences between cohorts and thus allows for the creation of a global ROC curve with a single threshold. This is a more “real-world” measure of diagnostic power than reporting multiple validation ROC curves, because no single cutoff could attain the same test characteristics in the different cohorts (17). However, the COCONUT-conormalized data showed differences in infants as compared to older children and adults. Therefore, infants were excluded from the validation, meaning that the efficacy of the IADM in infants remains unknown. These data also had few children with viral infections between the toddler and teenage years; this is an age group that will require further study. The most important takeaway from the COCONUT-conormalized data is that both the bacterial/viral metascore and the IADM retain diagnostic power across a very broad range of infection types and severities, with overall AUCs that are similar to the summary AUCs from head-to-head comparisons within cohorts.

Overall, we have leveraged our proven multicohort analysis pipeline to derive a highly robust model for improving infection diagnosis. Using our method, we were able to validate this in dozens of independent microarray cohorts. We have also validated using a targeted NanoString assay in pediatric sepsis patients. Although the IADM still needs to undergo optimization for rapid turnaround as well as a prospective interventional trial, it seems clear that molecular profiling of the host genome will become part of the clinical toolkit in the future.


Study design

The purpose of this study was to use an integrated multicohort analysis framework to analyze multiple gene expression data sets to identify a biomarker that can classify patients with bacterial or viral infections. This framework has been described previously (7, 16, 17).

Systematic search and multicohort analysis

We performed a systematic search in NIH GEO and EBI ArrayExpress for public human microarray genome-wide expression studies using the following search terms: bact[wildcard], vir[wildcard], infection, sepsis, SIRS, ICU, nosocomial, fever, and pneumonia. Abstracts were screened to remove all studies that were (i) nonclinical, (ii) performed using tissues other than whole blood or PBMCs, or (iii) comparing patients who were not matched for clinical time.

In all data sets that included sample-level microbiology data, we retained only those samples for which pathogens of a single type (either bacterial or viral) had been identified. Data sets for which sample-level microbiology data were not available were still retained if the corresponding paper described the cohort as being infected with only bacteria or only viruses. In three cohorts, the diagnosis was not necessarily microbiologically confirmed: GSE11755 (one of six patients described as culture-negative meningitis), GSE42834 (described as a cohort of bacterial pneumonia meeting clinical criteria), and GSE57065 (described as a cohort of bacterial sepsis for which 86% of patients had microbiological confirmation). All other samples used in our analysis had confirmed microbiological diagnoses at the sample or cohort level.

All microarray data were renormalized from raw data (when available) using standardized methods. Affymetrix arrays were renormalized using GC robust multiarray average (gcRMA) (on arrays with mismatch probes) or RMA. Illumina, Agilent, GE, and other commercial arrays were renormalized via normal-exponential background correction followed by quantile normalization. Custom arrays were not renormalized. Data were log2-transformed, and a fixed-effect model was used to summarize probes to genes within each study. Within each study, cohorts assayed with different microarray types were treated as independent.

We performed multicohort meta-analysis as described previously (7, 1517). Briefly, genes were summarized using Hedges’ g, and the DerSimonian-Laird random-effects model was used for meta-analysis, followed by Benjamini-Hochberg multiple hypothesis correction (66). Patients with bacterial infections were compared to patients with viral infections within studies, such that a positive effect size indicates that a gene was more highly expressed in virus-infected patients and a negative effect size indicates that a gene was more highly expressed in bacteria-infected patients. All data sets that matched criteria with n > 5 in both bacterial and viral cohorts that were published by the time of the initial search (1 April 2015) were used for discovery; data sets published after this time were used in validation.

To find a set of genes highly conserved in differential expression between bacterial and viral infections, we selected all cohorts that directly compared patients with bacterial and viral infections. Patients with documented co-infections (both bacterial and viral) were removed. Cohorts were required to have >5 patients in each group to be included in the meta-analysis. Both PBMC and whole-blood cohorts were included. We only included genes present in 7 (k − 1) data sets; thus, there were 14,729 genes included in the analysis. Significant genes were those that had an effect size >2-fold and an FDR <1% in a leave–one–data set–out round-robin analysis. However, to ensure that genes from both whole blood and PBMCs were represented in the final gene set, we also performed separate meta-analyses of the PBMC and whole-blood cohorts and removed all genes that had an effect size <1.5-fold in either whole blood or PBMCs separately. The remaining genes were considered significant.

Derivation of the seven-gene set

To find a set of highly diagnostic genes, the significant genes from the meta-analysis were run through a greedy forward search as described previously (7). Briefly, this algorithm starts with zero gene and adds one gene in each cycle that best improves the AUC for diagnosis in the discovery cohorts, until a new gene cannot improve the discovery AUCs more than some threshold (an increase in the weighted discovery AUC of ≥1). The resulting genes are used to calculate a single bacterial/viral metascore, calculated as the geometric mean of the “viral” response genes minus the geometric mean of the “bacterial” response genes, times the ratio of the number of genes in each set. The resulting continuous score can then be tested for diagnostic power using ROC curves.

Direct validation of the seven-gene set

The resulting gene set was first validated in the remaining public gene expression cohorts that directly compared bacterial to viral infections but were too small to be used for meta-analysis. Two cohorts [GSE60244 (13) and GSE63990 (14)] were made public after our meta-analysis was completed and so were used for validation. To show generalizability, we also examined one large in vitro data set comparing LPS to influenza exposure in monocyte-derived dendritic cells, but this was not included in the summary AUC because it is not expected to come from the same distribution as the clinical studies.

Summary ROC curves

For both discovery and validation cohorts, summary ROC curves were constructed according to the method of Kester and Buntinx (67) and as previously described (17). Briefly, linear-exponential models were made for each ROC curve, and the parameters of these individual curves were summarized using a random-effects model to estimate the overall summary ROC curve parameters. The α-parameter controls AUC (in particular, distance of the line from the line of identity), and the β-parameter controls skewness of the ROC curve. Summary AUC CIs were estimated from the SE of the α and β in meta-analysis.

COCONUT conormalization

There are dozens of public microarray cohorts that profiled patients with either bacterial or viral infections, but not both. It would be advantageous to be able to compare a gene score across these cohorts, but it has not previously been possible because each different microarray has widely different background measurements for each gene, and among studies using the same types of microarrays, there are large batch effects. To make use of these data, we needed to conormalize these cohorts in such a way that (i) no bias is introduced that could influence final classification (the normalization protocol should be blind to diagnosis), (ii) there should be no change to the distribution of a gene within a study, and (iii) a gene should show the same distributions between studies after normalization. A method with these characteristics would allow our gene score to be calculated and compared across multiple studies and thus allow us to broadly test its generalizability.

The ComBat empirical Bayes normalization method (31) is popular for cross-platform normalization but crucially falls short of our desired criteria because it assumes an equal distribution across disease states. We thus developed a modified version of the ComBat method, which conormalizes control samples from different cohorts to allow for direct comparison of diseased samples from those same cohorts. We call this method COmbat CO-Normalization Using conTrols or COCONUT. COCONUT makes one strong assumption that it forces control/healthy patients from different cohorts to represent the same distribution. Briefly, all cohorts are split into healthy and diseased components. The healthy components undergo ComBat conormalization without covariates. The ComBat estimated parameters Embedded Image, Embedded Image, Embedded Image, δ*, and γ* are obtained for each data set for the healthy component and then applied to the diseased component (fig. S7). This forces the diseased components of all cohorts to be from the same background distribution but retains their relative distance from the healthy component (t statistics within data sets are only different post-COCONUT because of floating-point math). It also does not require any a priori knowledge of disease classification (such as bacterial or viral infection), thus meeting our prespecified criteria. This method does have the notable requirement that healthy/control patients are required to be present in a data set for it to be pooled with other available data. Also, because healthy/control patients are set to be in the same distribution, it should only be used where such an assumption is reasonable (such as within the same tissue type, among the same species, etc.).

The ComBat model and the COCONUT method

As described by Johnson et al., the ComBat model corrects for location and scale of each gene by first solving an ordinary least-squares model for gene expression and then shrinking the resulting parameters using an empirical Bayes estimator, solved iteratively (31). Formally, each gene expression level Yijg (for gene g for sample j in batch i) is assumed to be composed of overall gene expression αg, design matrix of sample conditions X with regression coefficients βg, additive and multiplicative batch effects γig and δig, and an error term εijgEmbedded Image

Estimating parameters using ordinary least-squares regression standardizes Yijg to a new term Zijg (where Embedded Image is the SD of εijg)Embedded Image

The standardized data are now distributed according to Embedded Image, where Embedded Image and Embedded Image.

The inverse γ is assumed as a standard uninformative Bayesian prior. The remaining hyperparameters are estimated empirically, with the derivation and solution found in the original reference (31). The estimated batch effects Embedded Image and Embedded Image can then be used to adjust the standardized data to an empirical Bayes batch-adjusted final output Embedded ImageEmbedded Image

In our modified version of this method (COCONUT), all of the above are performed according to the original method without modification. However, it is applied to only the healthy/control patients in each data set (Y is a matrix of only healthy patient samples). The estimated parameters Embedded Image, Embedded Image, Embedded Image, δ*, and γ* are all taken and applied directly to a matrix D that consists only of diseased patient sample (which must be ordered in the same manner as Y)Embedded ImageEmbedded Image

We can thus obtain a batch-corrected version of diseased samples D*, which corrects for the differences between healthy controls but does not change each submatrix Di with respect to each Yi.

Global ROCs

We used COCONUT conormalization to test (i) all discovery cohorts and (ii) all validation cohorts, even those containing only bacterial or only viral illness. We did this separately for the PBMC and whole-blood data, for reasons described previously. After conormalization, the distributions for the individual cohorts were plotted together to allow for direct comparison. For each plot, we show (i) the distribution of scores for each data set, (ii) the normalized gene expression for each gene within the diagnostic test, and (iii) the housekeeping genes that are expected to show no difference between classes based on meta-analysis. The healthy patients have been removed from these plots. However, to show that the distributions of genes between healthy and diseased patients within cohorts do not change after COCONUT conormalization, we have also shown plots with both patient types with both target genes and housekeeping genes (fig. S8). Genes with minimal effect size and minimal variance in meta-analysis were selected as housekeeping genes.

For each comparison, a single global ROC AUC was calculated, and a single threshold was set to allow for an estimate of the real-world diagnostic performance of the tests. Thresholds for the cutoffs for bacterial versus viral infection were set to approximate a sensitivity of 90% for bacterial infection, because a bacterial infection false-negative (the recommendation not to give antibiotics when antibiotics are needed) can be devastating.

Integrated antibiotics decision model

The SMS can discriminate between patients with severe acute infections and those with inflammation from other sources, but it cannot distinguish between types of infection (fig. S1). We thus tested an IADM, in which the 11-gene SMS is applied, followed by the 7-gene bacterial/viral metascore. This model thus identifies (i) whether a patient has an infection and, if so, (ii) what type of infection is present (bacterial or viral). We were unable to identify enough validation cohorts with patients with noninfected inflammation that also included healthy controls; thus, in constructing the global ROCs, we used both discovery and validation cohorts. Using COCONUT conormalization, we set global thresholds across all included cohorts, and these were applied to each individual data set to test the ability of the IADM to correctly distinguish patients with noninfectious inflammation, bacterial infection, and viral infection. Healthy patients were not included as a diagnostic class because they were used in the conormalization procedure. The IADM was also applied separately to all cohorts that had no healthy controls but that included (i) noninfected SIRS patients and (ii) patients with both bacterial and viral infections.

Because the PPV and NPV are dependent on prevalence and the prevalence of infections in the data used here does not match the prevalence of infections in a hospital setting, we calculated PPV and NPV curves on the basis of the sensitivity and specificity for bacterial infections attained with the IADM. Formally, NPV = specificity × (1 − prevalence)/((1 − sensitivity) × prevalence + specificity × (1 − prevalence)); PPV = sensitivity × prevalence/(sensitivity × prevalence + (1 − specificity) × (1 − prevalence)).

NanoString validation

We tested 96 samples from independent patients (those never profiled via microarray) from the GPSSSI trials (1822) using a targeted NanoString (56) digital multiplex gene quantitation assay. The 18 genes were renormalized to housekeeping genes (FRAS1 and LRRC17). The SMS and bacterial/viral metascore genes were both assayed, and the diagnostic performance of the IADM was calculated.

Data and source code availability

All analyses were conducted in the R statistical computing language (version 3.1.1). Code to recreate the multicohort meta-analysis, the COCONUT R package source code, and the COCONUT-normalized data used here have been deposited and are available at


Fig. S1. The SMS and pathogen type.

Fig. S2. Study schematic.

Fig. S3. Forest plots of the seven-gene set.

Fig. S4. Summary ROC forest plots for discovery data.

Fig. S5. Summary ROC forest plots for direct validation data.

Fig. S6. Bacterial/viral metascore ROC in GSE53166.

Fig. S7. Schematic of COCONUT conormalization.

Fig. S8. COCONUT conormalization of whole-blood discovery data sets.

Fig. S9. Bacterial/viral score in global ROC of non-conormalized whole-blood discovery data sets.

Fig. S10. Bacterial/viral score in global ROC of COCONUT-conormalized whole-blood discovery data sets.

Fig. S11. Bacterial/viral score in global ROC of non-conormalized whole-blood validation data sets.

Fig. S12. Bacterial/viral score in global ROC of non-conormalized PBMC validation data sets.

Fig. S13. Bacterial/viral score in global ROC of COCONUT-conormalized PBMC validation data sets.

Fig. S14. The effects of age on SMS in COCONUT-conormalized data.

Fig. S15. SMS across all COCONUT-conormalized whole-blood data (both discovery and validation).

Fig. S16. IADM across COCONUT-conormalized public gene expression data including healthy controls.

Fig. S17. NPV and PPV for the IADM.

Fig. S18. GSE63990, adults with ARIs.

Table S1. Significant gene list.

Table S2. Test characteristics of the bacterial/viral metascore in direct validation data sets.

Table S3. Data sets with noninfected inflammatory conditions used to test the IADM.

Table S4. NanoString data.



  1. Acknowledgments: We thank the patients who contributed clinical samples to the studies herein and the researchers who gathered, analyzed, and shared their data (see Supplementary Acknowledgments). We are sure that the push for open data will substantially improve translational medicine. Funding: T.E.S. was funded by a Stanford Child Health Research Institute Young Investigator award (through the Institute for Immunity, Transplantation and Infection) and the Society of University Surgeons. P.K. was funded by the Bill & Melinda Gates Foundation and National Institute of Allergy and Infectious Diseases grants 1U19AI109662, U19AI057229, U54I117925, and U01AI089859. Author contributions: T.E.S. and P.K. were in charge of the study conception and design; H.R.W. contributed to the materials; T.E.S. performed the experiments; T.E.S. and P.K. drafted the manuscript; and T.E.S., H.R.W., and P.K. critically revised the manuscript. Competing interests: The seven-gene set and the IADM have been disclosed for possible patent protection to the Stanford Office of Technology and Licensing by T.E.S. and P.K. T.E.S. also serves as a scientific advisor to Multerra Bio, which had no role in this manuscript. H.R.W. declares that he has no competing interests. Data and materials availability: The post–COCONUT-normalized data for this study have been deposited at Original MIAME (minimum information about a microarray experiment)–compliant microarray data are all available under their respective accession numbers in National Center for Biotechnology Information GEO or EBI ArrayExpress. COCONUT is available as an R package on CRAN (Comprehensive R Archive Network).
View Abstract

Navigate This Article