Research Article | Tuberculosis

# A rapid triage test for active pulmonary tuberculosis in adult patients with persistent cough

Science Translational Medicine, 23 Oct 2019:
Vol. 11, Issue 515, eaaw8287
DOI: 10.1126/scitranslmed.aaw8287

## A triage test for tuberculosis

Tuberculosis remains a global health burden. Ahmad et al. used machine learning to develop an algorithm that distinguished active tuberculosis from other diseases with similar symptoms by measuring expression of four proteins in blood samples. The authors validated their triage test’s discriminatory power using blood samples from subjects with persistent cough across several continents, showing that performance was improved when detection of antibodies against a mycobacterial antigen was added to the panel. These promising results support further development and field testing using a point-of-care format.

## Abstract

Improved tuberculosis (TB) prevention and control depend critically on the development of a simple, readily accessible rapid triage test to stratify TB risk. We hypothesized that a blood protein-based host response signature for active TB (ATB) could distinguish it from other TB-like disease (OTD) in adult patients with persistent cough, thereby providing a foundation for a point-of-care (POC) triage test for ATB. Three adult cohorts consisting of ATB suspects were recruited. A bead-based immunoassay and machine learning algorithms identified a panel of four host blood proteins, interleukin-6 (IL-6), IL-8, IL-18, and vascular endothelial growth factor (VEGF), that distinguished ATB from OTD. An ultrasensitive POC-amenable single-molecule array (Simoa) panel was configured, and the ATB diagnostic algorithm underwent blind validation in an independent, multinational cohort in which ATB was distinguished from OTD with receiver operating characteristic–area under the curve (ROC-AUC) of 0.80 [95% confidence interval (CI), 0.75 to 0.85], 80% sensitivity (95% CI, 73 to 85%), and 65% specificity (95% CI, 57 to 71%). When host antibodies against TB antigen Ag85B were added to the panel, performance improved to 86% sensitivity and 69% specificity. A blood-based host response panel consisting of four proteins and antibodies to one TB antigen can help to differentiate ATB from other causes of persistent cough in patients with and without HIV infection from Africa, Asia, and South America. Performance characteristics approach World Health Organization (WHO) target product profile accuracy requirements and may provide the foundation for an urgently needed blood-based POC TB triage test.

## INTRODUCTION

There is an urgent need for improved diagnostics to detect, treat, and thereby reduce the tremendous global health burden of active tuberculosis (ATB). The Global Burden of Disease collaborators estimate that in 2015, total new and relapsed TB incident cases reached 10.2 million [95% confidence interval (CI), 9.2 to 11.5 million], with 10.1 million prevalent cases (95% CI, 9.2 to 11.1 million) and 1.3 million deaths (95% CI, 1.1 to 1.6 million) (1). Most recently, the World Health Organization (WHO) reported similar TB mortality numbers for 2017, estimating 1.3 million deaths (range, 1.2 to 1.4 million) from TB among HIV-negative people and an additional 300,000 deaths from TB (range, 266,000 to 335,000) among HIV-positive people (2). Recent population-based prevalence surveys in Africa and Asia have shown that as many as one in three patients with ATB is undiagnosed and thus untreated, with important individual and public health implications. Rapid and accurate diagnosis of ATB with current sputum-based diagnostic tools remains challenging in high-burden, resource-limited settings. The historical diagnostic standard, the century-old technique of microscopic evaluation of sputum produced by an ill patient and stained with Ziehl-Neelsen acid-fast stain, is insensitive and operator dependent (3). Solid or liquid culture systems are slow and, like state-of-the-art automated nucleic acid amplification (GeneXpert, Cepheid Inc.), have laboratory infrastructure requirements that constrain use in resource-limited settings where TB burden is highest. Furthermore, two of the most vulnerable and highly affected groups, young children and adults with HIV infection, are unlikely to be diagnosed using sputum because of difficulty obtaining sputum and low bacillary loads in the sample.

Missed and delayed diagnosis contributes to patient suffering and death and sustains transmission (4), thwarting TB control and prevention efforts. Although no diagnostic technology can address the problem of ill individuals failing to seek medical care, community-based triage tests attempt to lower the barrier to initial evaluation, guide providers in clinical decision-making, and increase motivation for those likeliest to have TB to obtain definitive testing and treatment. Several commercial blood tests have sought to fill this gap, but the WHO has strongly recommended against their use based on evidence that these tests are too inaccurate (5). The ultimate objective—a highly sensitive, specific ATB diagnostic that could be performed inexpensively with minimal technical skill at the point of care in all TB-endemic settings—remains elusive.

The challenges with rapid, cost-effective sputum-based diagnosis and the startling prevalence of undiagnosed cases have renewed interest in TB blood-based triage tests. Countries with high TB burden typically triage patients as suspect for ATB and in need of further testing based on a history of two or more weeks of cough, but the prevalence of ATB among such individuals remains relatively low. A triage test with high negative predictive value (NPV) and moderately high positive predictive value (PPV) could stratify the ATB suspect population into those for whom monitoring or conventional therapy is appropriate versus those needing immediate access to expensive advanced diagnostics (GeneXpert or liquid culture). Modeling studies using reasonable simplifying assumptions have supported the considerable clinical, fiscal, and market impacts of such a test (6–8). Recognizing these potential benefits, a WHO consensus panel in 2014 defined the performance requirements and sample-handling specifications for a “community-based triage or referral test” to be used with individuals in whom ATB was suspected to discriminate those unlikely to have TB from those in need of referral for further confirmatory testing (9). Such a TB rule-out test would optimally be deployed at a peripheral level of care, such as a community health facility, use a readily accessible sample such as blood from a fingerprick, and be easier to use and less expensive than the confirmatory test. Minimal performance specifications for use in a high prevalence area were defined as a sensitivity of 90% at specificity of 70%, with optimal specifications defined as sensitivity of 95% at specificity of 80% (9).

We hypothesized that a blood protein-based host response signature for ATB could distinguish it from other TB-like disease (OTD) in adult patients with persistent cough and provide the foundation for a community-based triage test for ATB. Our goal was to identify and independently validate a host response signature of informative biomarkers associated with ATB. A prototype immunoassay meeting the requisite criteria could then be migrated to a point-of-care (POC) format for deployment in resource-limited settings, where the performance as a triage test can be field-tested. The results described here are the first phase of a comprehensive discovery-to-delivery approach.

## RESULTS

The goal of this work was to generate an independently validated, blood-based ATB triage diagnostic built on an immunoassay platform that could be migrated to a POC format. Thus, we restricted our focus to circulating host markers that could be measured simultaneously within a narrow dynamic range, targeting a final diagnostic algorithm with performance that aligns with the WHO specifications for a TB rule-out triage test. Results are presented according to the assay development steps (Fig. 1A).

Table 1 provides the characteristics of the discovery cohort. The median age of the participants in the discovery cohort was 34 years, consistent with recently reported host-response TB biomarker studies (10, 11). Median age and interquartile range were similar across gender and disease categorization (ATB and OTD), although there was some variation across the two Tanzanian and one Philippine collection sites. The clinical and microbiologic diagnostic algorithm (Fig. 2) classified 199 (51%) of the discovery cohort subjects as ATB and 188 (49%) as OTD. The specific etiologies of non-TB diagnoses were not established; however, given the nature of patient accrual, alternative diagnoses should represent an epidemiologically appropriate cross section of nontuberculous etiologies of sustained cough within the respective populations.

Table 1 Clinical and demographic characteristics of adults in the discovery cohort.

### Discovery of an ATB diagnostic algorithm based on host protein immunoassay data

Patient plasma samples (387) comprising the discovery cohort [199 ATB+/HIV− and 188 OTD (ATB−/HIV−)] were analyzed for 47 host proteins by Luminex bead-based immunoassay (Myriad RBM inflammation panel; data file S1). The inflammation panel was selected because it was highly enriched for proteins central to the biology of the host response to infection and likely to show a differential signal between TB and other infectious diseases. Luminex and Simoa protein concentration measurements for all samples are provided in data files S2 and S3. The overall study design is shown in Fig. 3.

After filtering analytes to remove those with >50% missing values and stratifying marker candidates by their concentrations in blood plasma, we used a TreeNet machine learning algorithm with 10-fold cross-validation to develop a model based on 12 proteins in the picogram per milliliter concentration range (Table 2A). The variable selection feature of the TreeNet algorithm was then used to select a minimal subset of four proteins [interleukin-6 (IL-6), IL-8, IL-18, and vascular endothelial growth factor (VEGF)], resulting in an ATB diagnostic algorithm with an accuracy [receiver operating characteristic–area under the curve (ROC-AUC)] of 0.87 (95% CI, 0.83 to 0.90), sensitivity of 86% (95% CI, 81 to 91%), and specificity of 65% (95% CI, 57 to 71%) (Table 2B). This performance approaches the 90% sensitivity and 70% specificity specified by the WHO target product profile (TPP) as minimal requirements for a rule-out TB test in a triage setting. Assuming a TB prevalence of 5%, the PPV and NPV of the model were 11.5% (95% CI, 9.0 to 14.2%) and 98.9% (95% CI, 98.3 to 99.3%), respectively. We found no appreciable difference in model performance between countries or between the two sites within Tanzania. An ATB diagnostic algorithm with three proteins (IL-6, IL-8, and IL-18) performed nearly as well as the four-protein algorithm.
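
The PPV and NPV quoted above follow from sensitivity, specificity, and the assumed 5% prevalence by Bayes' theorem. The short sketch below shows that arithmetic. The function name `predictive_values` is our own illustration, not the study's code. It works from the rounded point estimates only and does not reproduce the reported confidence intervals.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """Return (PPV, NPV) via Bayes' theorem for a given pretest prevalence."""
    true_pos = sensitivity * prevalence               # P(test positive and ATB)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(test positive and no ATB)
    true_neg = specificity * (1 - prevalence)         # P(test negative and no ATB)
    false_neg = (1 - sensitivity) * prevalence        # P(test negative and ATB)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Discovery-cohort point estimates (86% sensitivity, 65% specificity) at 5% prevalence
ppv, npv = predictive_values(0.86, 0.65, 0.05)
print(f"Discovery model: PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 11.5%, NPV = 98.9%

# WHO minimal TPP (90% sensitivity, 70% specificity) at the same prevalence
ppv, npv = predictive_values(0.90, 0.70, 0.05)
print(f"WHO minimal TPP: PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 13.6%, NPV = 99.3%
```

At 5% prevalence, even a test meeting the WHO minimal profile has a PPV of only about 14%. For that reason the value of a triage test lies in its high NPV, which is what allows it to rule out disease.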

Table 2 Assay performance characteristics of the discovery and validation cohorts.

Eight of the subjects were diagnosed as having ATB and a positive HIV test [antiretroviral therapy (ART) naïve]. Although model building was based exclusively on subjects who were HIV negative, the final algorithm categorized these eight samples with 100% accuracy as having an ATB signature. Strong conclusions were precluded by the small sample size; however, this result was consistent with the generalizability of the model to patients who are HIV positive.

Nineteen recruited patients (5%) provided plasma samples after 6 months of TB treatment (directly observed therapy, short course) and were denoted “cured controls.” We hypothesized that the ATB signature would no longer be present in patients successfully treated for TB. These 19 samples were excluded from model building, leaving 387 samples for that phase of discovery, but when analyzed by the diagnostic algorithm, 16 of 19 (84%) were scored as not having ATB (84% accuracy, assuming treatment was fully effective; the drug susceptibility profiles of these samples were not known). Whether those subjects whose signatures did not normalize went on to relapse could unfortunately not be determined.
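
An exact binomial confidence interval shows concretely why 8 of 8 and 16 of 19 support only weak conclusions. The paper does not report such intervals. The sketch below is our own illustration and computes Clopper-Pearson limits by bisection on the binomial tail probabilities, using only the Python standard library (`math.comb`, Python 3.8+).

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact (Clopper-Pearson) two-sided confidence interval for x successes out of n."""

    def bisect(too_small) -> float:
        # Find the boundary p in [0, 1]; too_small(p) is True below the boundary.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if too_small(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x | p) = alpha/2; this tail probability rises with p.
    lower = 0.0 if x == 0 else bisect(lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2)
    # Upper limit: P(X <= x | p) = alpha/2; this tail probability falls with p.
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper


for label, x, n in [("ATB/HIV+ scored as ATB", 8, 8), ("cured controls scored non-ATB", 16, 19)]:
    lo, hi = clopper_pearson(x, n)
    print(f"{label}: {x}/{n}, 95% CI {lo:.0%} to {hi:.0%}")
# For 8/8 the lower limit is 0.025 ** (1/8), about 63%
```

Even a perfect 8/8 result is therefore compatible with a true accuracy in the low 60s, consistent with the caution expressed above.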

### Improvement of the ATB triage algorithm using ultrasensitive measurement

Despite encouraging performance of the algorithm in the discovery cohort, the limited analytical sensitivity of the Luminex assay platform meant that many plasma samples had missing values (below the limit of detection of the Luminex assay) for a range of proteins, including the signature markers IL-6, IL-8, IL-18, and VEGF. As a result, the full dynamic ranges of values for the best-performing analytes were not being used to generate the triage algorithm. To address this limitation, we developed an ultrasensitive immunoassay for these four biomarkers on the Simoa platform. Details on Simoa assay optimization can be found in the Supplementary Materials.
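
A lower limit of detection (LOD) interacts with the missing-value filter used during discovery, where analytes with >50% missing values were removed. The sketch below illustrates that filter on made-up readings, with `None` standing for a below-LOD result. The dictionary layout, the `filter_analytes` name, and the placeholder analyte `MARKER_X` are hypothetical. Keeping an analyte at exactly 50% missing is our reading of the ">50%" wording.

```python
from typing import Dict, List, Optional

# None marks a reading below the assay's limit of detection (LOD)
Readings = Dict[str, List[Optional[float]]]


def filter_analytes(readings: Readings, max_missing: float = 0.5) -> Readings:
    """Drop analytes whose fraction of below-LOD (missing) values exceeds max_missing."""
    kept: Readings = {}
    for analyte, values in readings.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction <= max_missing:  # ">50% missing" is removed, so exactly 50% is kept
            kept[analyte] = values
    return kept


# Hypothetical readings in pg/ml; "MARKER_X" is a placeholder, not a panel analyte
panel: Readings = {
    "IL-6": [4.1, None, 25.0, 8.3],          # 25% missing -> kept
    "IL-18": [610.0, 1500.0, 980.0, 720.0],  # 0% missing -> kept
    "MARKER_X": [None, None, None, 2.0],     # 75% missing -> dropped
}
print(sorted(filter_analytes(panel)))  # ['IL-18', 'IL-6']
```

With a more sensitive assay, fewer readings fall below the LOD. More analytes therefore survive the filter, and the retained ones contribute their full dynamic range to model building.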

After optimization, the multiplexed ultrasensitive immunoassay was first applied to the 199 ATB (191 HIV− and 8 HIV+) and 188 OTD plasma samples from the discovery cohort, as well as to the 19 cured controls. The mean and SD of the protein concentration measurements for each of the three groups are shown in Fig. 4A. The corresponding box plots show distinct separation between the three groups. A Mann-Whitney U test showed that protein concentrations in ATB were significantly greater than in both OTD and cured controls. Mean concentration of IL-6 in blood was about four times higher in ATB than in OTD [23.5 ± 1.9 pg/ml (mean ± SE) for ATB compared to 5.7 ± 0.8 pg/ml for OTD]. As expected, mean concentration of IL-6 for the 19 cured controls was much lower than for the ATB cases. Concentration of IL-18 in ATB cases was 1450 ± 238 pg/ml compared to 618 ± 42 pg/ml for OTD, a 2.4-fold difference. Similar patterns were observed for both IL-8 and VEGF. As anticipated, the Simoa assay had several orders of magnitude greater analytical sensitivity for each of the constituent analytes than the Luminex assay, largely eliminating the original missing value problem and potentially enabling testing from blood volumes obtainable by fingerprick.
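
In practice, `scipy.stats.mannwhitneyu` is the usual library route for the group comparison described above. The dependency-free sketch below shows what that test computes. U counts the (ATB, OTD) pairs in which the ATB value is higher, with ties counting one half, and the p-value comes from a normal approximation with tie-corrected variance. The data are invented. The authors' test options (exact or asymptotic p-values, continuity correction) are not reported, so p-values from this sketch need not match theirs.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U, p), where U counts the (a, b) pairs in which the value from `a`
    is larger (ties count 1/2). The variance is corrected for ties; no
    continuity correction is applied.
    """
    values = list(a) + list(b)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank for the tie group
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    n1, n2 = len(a), len(b)
    n = n1 + n2
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values tied: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


# Hypothetical IL-6 concentrations (pg/ml), not study data
atb = [31.0, 18.2, 27.5, 40.1, 22.8, 15.9]
otd = [4.2, 6.8, 3.1, 9.5, 5.0, 7.7]
u, p = mann_whitney_u(atb, otd)
print(f"U = {u:.0f}, two-sided p = {p:.4f}")  # U = 36: every ATB value exceeds every OTD value
```

Because the test uses only ranks, it is insensitive to the strongly skewed, log-normal-like concentration distributions typical of cytokines.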

Because the revised assay configuration would alter the threshold values, we then developed a TreeNet machine learning algorithm with 10-fold cross-validation with the Simoa-derived dataset. This yielded an improvement in the performance of the ATB diagnostic algorithm, with cross-validation accuracy of 0.85 (ROC-AUC) (95% CI, 0.81 to 0.89), 87% sensitivity (95% CI, 81 to 91%), and 66% specificity (95% CI, 59 to 72%); see Discovery in Table 2C and ROC curve in Fig. 5. The PPV and NPV of the model were 11.9% (95% CI, 9.4 to 14.6%) and 98.9% (95% CI, 98.3 to 99.3%), respectively. As with the initial model, all of the eight ATB+/HIV+ plasma samples (excluded from model building) were classified as ATB by the Simoa-based TreeNet model, whereas the same 84% of cured controls were classified as non-ATB.
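
TreeNet is a commercial gradient-boosting implementation, and its internals are not reproduced here. The sketch below swaps in a deliberately simple stand-in scorer to show only the mechanics of 10-fold cross-validation. Samples are assigned to folds stratified by class, each fold is scored by a model trained on the other nine, and ROC-AUC is computed from the pooled out-of-fold scores. The stand-in model, all function names, and the synthetic data are hypothetical. Whether the authors pooled out-of-fold predictions or averaged per-fold AUCs is not stated. In real analyses, scikit-learn's `StratifiedKFold` and `roc_auc_score` would typically be used.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Sequence[float]  # one concentration per marker, in pg/ml


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[int]:
    """Assign each sample a fold index in 0..k-1 so each class is spread evenly across folds."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            fold_of[i] = pos % k
    return fold_of


def fit_stand_in(X: List[Sample], y: List[int]) -> List[Tuple[float, float]]:
    """Stand-in model (NOT TreeNet): per-marker log-scale midpoint between class means, plus direction."""
    params = []
    for m in range(len(X[0])):
        pos = [math.log(x[m]) for x, t in zip(X, y) if t == 1]
        neg = [math.log(x[m]) for x, t in zip(X, y) if t == 0]
        mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        params.append(((mu_pos + mu_neg) / 2, 1.0 if mu_pos >= mu_neg else -1.0))
    return params


def score(params: List[Tuple[float, float]], x: Sample) -> float:
    """Higher score = more ATB-like."""
    return sum(sign * (math.log(v) - mid) for (mid, sign), v in zip(params, x))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as P(random case outscores random control), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X: List[Sample], y: List[int], k: int = 10, seed: int = 0) -> float:
    """Pooled out-of-fold ROC-AUC: each sample is scored by a model that never saw it."""
    folds = stratified_folds(y, k, seed)
    oof = [0.0] * len(y)
    for f in range(k):
        train = [i for i in range(len(y)) if folds[i] != f]
        params = fit_stand_in([X[i] for i in train], [y[i] for i in train])
        for i in range(len(y)):
            if folds[i] == f:
                oof[i] = score(params, X[i])
    return roc_auc(oof, y)


# Synthetic four-marker data: log-normal, with cases shifted upward (not study data)
rng = random.Random(42)
X = [[math.exp(rng.gauss(1.0, 1.0)) for _ in range(4)] for _ in range(100)]
X += [[math.exp(rng.gauss(0.0, 1.0)) for _ in range(4)] for _ in range(100)]
y = [1] * 100 + [0] * 100
print(f"10-fold cross-validated ROC-AUC: {cross_validated_auc(X, y):.2f}")
```

Retraining on each fold is why the thresholds must be re-derived when the assay platform changes. The model is refit to whatever concentration scale the new data present.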

### Diagnostic algorithm with two or three proteins using Simoa data

In an attempt to further reduce the number of measured markers while retaining assay performance, we tested models comprising subsets of the four marker proteins. A TreeNet model that excluded VEGF, the marker ranked as least important, performed similarly to the four-protein model, albeit with a modest performance decrement (Table 2D). Because performance deteriorated further with a two-protein model using only IL-6 and IL-18, we decided to use all four proteins in our panel.

### Validation of the ATB triage algorithm

To validate the model and estimate performance, we evaluated the ultrasensitive four-biomarker assay with a blinded, fully independent cohort of samples from the Foundation for Innovative New Diagnostics (FIND) TB specimen repository. The validation cohort represented three countries [Vietnam (67%), South Africa (24%), and Peru (9%)] spanning three continents, without geographic overlap with the discovery cohort. As a further challenge, although the model was derived from plasma, most of the 318 available validation samples were serum. One sample failed to produce measurements in the Simoa assay; clinical, demographic, and other phenotypic characteristics of the remaining 317 samples are provided in Table 3. One hundred sixty (51%) of the validation cohort subjects were classified as ATB, 138 (43%) were classified as OTD, and 19 (6%) were healthy individuals with latent TB. About half (77) of the ATB case subjects were coinfected with HIV, although their disease stage and treatment status were unspecified. The validation set, therefore, stringently tested the generalizability of the model. The mean and SD of the protein concentration measurements and corresponding box plots for each of the three groups are shown in Fig. 4B. The box plots show clear separation between the three groups for three of the four proteins (IL-6, IL-18, and VEGF). The IL-8 concentration distribution is anomalous, probably because of the preponderance of serum over plasma in the validation cohort (see Discussion). As in the discovery cohort, IL-6, IL-18, and VEGF concentrations differ significantly between ATB and both OTD and healthy individuals with latent TB. Mean IL-6 concentration for ATB is 34.8 ± 2.8 pg/ml (mean ± SE) compared to 13.7 ± 1.2 pg/ml for OTD, a 2.5-fold difference. Mean IL-6 concentration for healthy participants with latent TB is also much lower than for the ATB cases. IL-18 concentration for ATB is 1065 ± 84 pg/ml compared to 606 ± 52 pg/ml for OTD (1.8-fold difference), with still lower values for healthy individuals with latent TB.

Table 3 Clinical and demographic characteristics of adults in the blinded validation cohort.

The four-protein TreeNet model using the ultrasensitive immunoassay applied to the blinded validation set provided an accuracy (ROC-AUC) of 0.80 (95% CI, 0.75 to 0.85), with 80% sensitivity (95% CI, 73 to 85%) and 65% specificity (95% CI, 57 to 71%) (Table 2C). The ROC curve is depicted in Fig. 5C. PPV and NPV were 10.7% (95% CI, 8.2 to 13.4%) and 98.4% (95% CI, 97.6 to 98.9%), respectively. The algorithm based on only three proteins (IL-6, IL-8, and IL-18) yielded an accuracy (ROC-AUC) of 0.80 (95% CI, 0.76 to 0.85), with 79% sensitivity (95% CI, 72 to 84%) and 68% specificity (95% CI, 60 to 74%) (Table 2D). PPV and NPV were 11.5% (95% CI, 8.7 to 14.5%) and 98.4% (95% CI, 97.6 to 98.9%), respectively. The nearly identical ROC-AUC (0.80) of the three- and four-protein models suggests that future development could focus on the three-protein assay for simplicity and lower cost. In both models, triage algorithm performance appears independent of HIV status, with classification accuracy comparable in uninfected subjects and those coinfected with HIV (Fig. 4, A and B). The Simoa protein concentration measurements for the validation cohort are provided in data file S4.
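
The paper does not state how its 95% CIs for sensitivity and specificity were computed. One common choice is the Wilson score interval, sketched below. We assume that 80% sensitivity corresponds to 128 of the 160 validation ATB cases; that count is inferred, not reported. Under that assumption the interval matches the reported 73 to 85%. That agreement is consistent with the Wilson method but does not establish that the authors used it. The function name is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumption: 80% sensitivity among the 160 validation ATB cases = 128 true positives
lo, hi = wilson_interval(128, 160)
print(f"sensitivity 80%, 95% CI {lo:.0%} to {hi:.0%}")  # 73% to 85%
```

Unlike the simple normal (Wald) interval, the Wilson interval stays within [0, 1] and behaves sensibly for proportions near 0 or 1.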

### Enhancement of the Simoa-based ATB diagnostic algorithm with addition of Ag85B antibody measurement

Measurement of TB antigens in blood is a promising diagnostic strategy (12) that should be complementary to host-response measurement. On the basis of analysis of preliminary data produced by FIND, we configured a Simoa assay to measure antibody against the TB antigen Ag85B. Details of assay development can be found in the Supplementary Materials. Results from a subset of 583 plasma and serum samples analyzed using this assay are provided in Table 4. The Simoa data measuring protein concentrations of the four cytokines and antibody response against Ag85B are provided in data file S5.

Table 4 Simoa assay performance characteristics with Ag85B antibody measurement.

The performance of the final triage algorithm was enhanced by adding Ag85B antibody measurement to the four-biomarker panel, improving sensitivity to 86% at 69% specificity. Additional work, including independent blinded validation, is needed to demonstrate that this result is generalizable.

## DISCUSSION

We have identified a host blood protein diagnostic signature consisting of IL-6, IL-8, IL-18, and VEGF that discriminates between adults with ATB, with or without HIV coinfection, and those who do not have TB but present with TB-like symptoms (OTD). Although the test is intended to be applied to individuals with TB-like symptoms, we have also demonstrated that it is not confounded by healthy individuals with latent TB infection. The mean values of each of the four individual biomarkers were significantly different between the ATB and control participants, although there was some overlap in the distributions of values for the different cohorts. This overlap is consistent with a recent paper showing that baseline measurements for some cytokines varied widely between individuals, such that baseline values for a particular cytokine could be elevated compared to mean values even in healthy subjects (13). Because our study was designed with an end goal of developing a simple immunoassay for use as a triage test at the primary health level in TB high-burden countries, biomarkers were chosen specifically to fall within a relatively narrow dynamic range to support adaptation to a rapid immunoassay format. Our initial model based on measurements from the Luminex bead-based immunoassay platform performed well; however, the limited analytical sensitivity of the bead-based assays led to a number of samples with values below the analytic limit of detection, a problem that may worsen with the use of smaller sample volumes in a fingerprick format. Migration to the ultrasensitive POC-amenable Simoa immunoassay technology eliminated these missing values and provided an assessment of the incremental improvement in performance that derives from access to the full dynamic range down to 0.009 pg/ml for IL-6, 0.009 pg/ml for IL-8, 0.05 pg/ml for IL-18, and 0.12 pg/ml for VEGF.

Circulating concentrations of each of IL-6, IL-8, IL-18, and VEGF have been found to be elevated in a variety of infectious and inflammatory pneumonias, and both prognostic and mechanistic implications of each have been proposed (14–23). Although some of these four proteins have also previously been individually described as associated with TB (24–29), only when combined in a predictive algorithm do they gain sufficient power to distinguish adults with ATB from those with TB-like symptoms due to other etiologies. The intricacies of the immune response to respiratory infection remain a subject of intensive study; this specific marker panel would not have been confidently predicted from first principles but could only have emerged through a large-scale comparison of host-response profiles of patients with TB-like symptoms of both tuberculous and nontuberculous etiologies. Even with a statistically principled discovery and verification paradigm, such an approach requires stringent independent validation. In a singularly challenging blinded validation set of 317 subjects from TB-endemic areas around the world, the four-marker algorithm distinguished ATB from OTD with a sensitivity of 80% (95% CI, 73 to 85%) and specificity of 65% (95% CI, 57 to 71%). The broad generalizability of the signature is supported by its effectiveness in diverse populations exposed to different TB strains across widely disparate geographic locations, including Peru, the Philippines, South Africa, Tanzania, and Vietnam. Because the signature is based on a blood sample, it is reasonable to expect that it would be effective in nonsputum-producing individuals, although this has not yet been formally tested.

We have also shown that the diagnostic performance of the four-protein panel can be enhanced by measuring antibodies against the TB antigen Ag85B. Although Ag85B is highly conserved, sharing a high degree of amino acid sequence homology across mycobacterial species (30), it is an immunodominant antigen and has been identified as one of the top choices in several serological biosignature panels for TB diagnosis (31, 32). Measuring antibodies to Ag85B in addition to host inflammatory protein markers provided direct information on the human immune response to Mycobacterium tuberculosis (MTB) infection.

There are several limitations to this study. Although great care was exercised in the case definitions, our study shares with others in infectious disease and global health the challenge of achieving a “gold standard” sample population, particularly with respect to assuring “true-negative” controls. Follow-up data collection across the discovery cohort was not logistically tractable but would have been helpful in at least two ways: first, in demonstrating appropriate clinical resolution of patients with OTD, and second, in determining whether any patients predicted to have (but not diagnosed with) TB went on to develop active disease. Establishing a definitive alternative diagnosis for all ill controls would have been more informative; we considered this approach, but it was not feasible in this program for reasons of complexity and cost. However, given the nature of our patient accrual, it is reasonable to argue that the alternative diagnoses should represent an epidemiologically appropriate cross section of nontuberculous etiologies of sustained cough within the multiple respective populations, a contention supported by validation of the model in a sample population that was ethnically and geographically distinct from the discovery cohort. Further expansion of the discovery cohort would have strengthened statistical power but was cost and time prohibitive. A broader sampling of circulating proteins, whether array or mass spectrometry based, would have afforded a larger population of candidates from which to derive predictive models; however, the Simoa assays used here are considerably more sensitive than these other methodologies. Our approach was both strategic and opportunistic: We were looking for host response, and the selected panel included precisely the types of markers we anticipated might show a differential signal between TB and other infectious diseases. The exact panel composition, however, was based on the vendor’s design rather than our own. A blinded test set of the same biological matrix as the discovery set (plasma) would have more accurately reflected the performance characteristics of our predictive model; unfortunately, the scarce samples in the FIND repository are primarily serum. Last, an ideal “rule-out” triage test would have perfect sensitivity so that a negative result would ensure absence of disease. Although our model approximates the WHO TPP for a TB triage test, its performance characteristics could be further strengthened. Strategies for achieving this are discussed below.
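
A rule-out test puts sensitivity first. One simple way to explore the trade-off is to fix a target sensitivity, such as the WHO minimal 90%, and choose the cutoff on a model's continuous score that reaches it with the best specificity. The sketch below assumes that higher scores indicate ATB. It is a generic illustration, not the authors' procedure, and the function name and scores are hypothetical. A cutoff chosen this way should be fixed on training data and then evaluated on independent samples, because selecting it on the test set overstates performance.

```python
from typing import Sequence, Tuple


def rule_out_cutoff(
    scores: Sequence[float], labels: Sequence[int], target_sensitivity: float
) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) for the highest cutoff meeting the target.

    A sample is called test-positive when score >= cutoff. Scanning cutoffs from
    high to low, sensitivity can only rise and specificity can only fall, so the
    first cutoff that reaches the target sensitivity has the best specificity.
    """
    n_pos = sum(1 for t in labels if t == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both cases and controls")
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, labels) if t == 1 and s >= cutoff)
        tn = sum(1 for s, t in zip(scores, labels) if t == 0 and s < cutoff)
        if tp / n_pos >= target_sensitivity:
            return cutoff, tp / n_pos, tn / n_neg
    raise ValueError("target sensitivity unreachable")  # only if target > 1


# Hypothetical model scores; 1 = ATB, 0 = OTD
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(rule_out_cutoff(scores, labels, 0.90))  # (0.4, 1.0, 0.6)
```

In this toy example, raising the target from 80% to 90% sensitivity moves the cutoff from 0.6 to 0.4 and lowers specificity from 80% to 60%. This is the same trade-off that governs how closely a triage model can approach the WHO TPP.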

This study began with the hypothesis that a blood-based host response signature for ATB could distinguish it from nontuberculous disease in adult patients presenting with persistent cough, thereby providing the foundation for a TB triage test. The notion that a molecular host response could distinguish TB from other illness etiologies in patients with or without HIV has been strongly supported by several studies using RNA expression profiling of white blood cells from TB suspect patients (11, 33–35). Although earlier RNA-based models gained predictive accuracy from inclusion of many tens of transcripts, recent efforts have derived tests that accurately predict progression to TB from latent infection (36) or after household exposure (37) with as few as three to four genes. The small scale and strong performance of these predictors begin to address the concerns expressed in a recent commentary by Broger et al. (32), who noted that RNA-based signatures have not yet met the minimum WHO TPP requirements for sensitivity and specificity in relevant TB patient populations and pointed out that platforms for running a transcript-based assay in resource-limited settings do not yet exist. Gliddon and colleagues (38) struck a hopeful note in their recent review, arguing that although many hurdles remain, the next decade could see advances in gene expression–based disease diagnosis.

Differential host blood signatures for ATB have also been recently observed at the protein level. Chegou and colleagues (10) found a seven-marker serum protein signature, nonoverlapping with ours, for ATB in African primary healthcare clinic attendees with signs and symptoms of TB. That signature was derived from samples from five African countries and validated on the same population in a train/test paradigm, making its generalizability uncertain. In another recent study, De Groote and colleagues (39) used an unbiased aptamer array-based proteomics approach and a geographically diverse sample set to discover and validate a six-marker serum protein signature for the diagnosis of ATB. Although the signature met the requisite minimum performance criteria, it is not clear whether the proprietary technology used provides a path toward a cost-effective POC test for ATB. Likewise, the serum marker panels proposed by Achkar and colleagues (40) (which differed somewhat between HIV-positive and HIV-negative individuals) were found using mass spectrometry–based methods and provided no direct route to field deployment. Although the marker lists tend to reflect inflammatory biology, the specific marker sets are almost completely nonoverlapping. Because no publicly available dataset includes all of the necessary markers, head-to-head comparison and integrative analysis of these candidate proteomic signatures cannot yet be performed. Ongoing technology developments are simplifying RNA analysis, but protein measurements by immunoassay continue to offer advantages for rapid, inexpensive blood tests and have a better prospect for near-term deployment as POC diagnostics in resource-limited settings where TB is endemic.

Although the patients who provided the 317 validation samples were similar in profile to the discovery population, there were some notable differences. First, they came from entirely nonoverlapping geographic regions, including countries (and continents) not represented in the training set. Second, because of sample availability, only 15% of the 160 ATB and 157 OTD samples in the validation cohort were plasma; the rest were serum. The discovery cohort samples, by contrast, were exclusively plasma. As in the discovery cohort, a Mann-Whitney U test showed that protein concentrations were significantly higher in ATB than in OTD, except that in the validation cohort IL-8 did not differ significantly between the two groups because its concentrations were much higher than expected in both. Although other factors may contribute, this is likely a consequence of the predominance of serum samples in the test set. Friebe et al. (41) examined the stability of cytokines after blood collection in patients with systemic immune activation and showed that IL-8 concentrations increased markedly in serum but not in plasma. Moreover, in the analysis by Wu and colleagues (13), IL-8 had the highest intersubject variability in serum of the 15 cytokines measured. Paired serum and plasma samples from our populations were not available for direct comparison of IL-8 concentrations. However, in the subpopulation of FIND plasma samples, the IL-8 differential appears to be preserved in the test set, although the numbers are too small for statistical assessment. The loss of this marker, more than geographic variation or nonspecific matrix differences, probably contributed to the modest performance decrement of the model in the blinded validation phase. That the decrement was small speaks to the robustness of the model and suggests that performance would improve if the sample type were kept consistent between discovery and blinded testing, as would typically be the case.
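The ATB versus OTD group comparison above relies on the Mann-Whitney U test. As a minimal sketch (the authors' analysis code is not described), the function below computes U for two groups of marker concentrations and a two-sided p-value from the normal approximation. The function name and the example concentrations are hypothetical, and the tie correction to the variance is omitted for brevity.

```python
import math

# Illustrative sketch; the function name is hypothetical, not from the study.


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Tied values receive mid-ranks. The tie correction to the variance is
    omitted, which makes the p-value slightly conservative when ties exist.
    Returns (U for group x, z statistic, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    values = sorted(list(x) + list(y))
    # Map each distinct value to its mid-rank (ranks are 1-based).
    mid_rank = {}
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        mid_rank[values[i]] = (i + j) / 2 + 1
        i = j + 1
    u1 = sum(mid_rank[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


# Hypothetical IL-6 concentrations (pg/ml); not study data.
atb = [12.1, 15.3, 18.0, 22.4, 30.2]
otd = [5.0, 6.2, 7.7, 9.1, 11.0]
u, z, p = mann_whitney_u(atb, otd)  # U = 25.0: every ATB value exceeds every OTD value
```

Library implementations such as SciPy's `scipy.stats.mannwhitneyu` additionally offer tie and continuity corrections and exact p-values for small samples.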

The analytes that make up the TB diagnostic assay have now been tested in more than 700 plasma and serum samples, and the performance of the diagnostic algorithm based on them aligns with minimal WHO specifications for a TB triage test for individuals presenting with persistent cough. Among people with HIV living in highly TB-endemic areas, even asymptomatic individuals may warrant assessment for TB; for instance, 8.5% of 274 asymptomatic ART-naïve HIV-infected persons in Cape Town were found to have subclinical TB disease (42). Test performance in such an asymptomatic population has not been evaluated. Future independent validation sets should also include larger numbers of treated and untreated subjects at all stages of HIV infection and confirm that model performance is independent of other comorbidities such as metabolic disease. Evolving platform capabilities will facilitate these additional studies. The Simoa HD-1 platform can deliver the first sample-to-answer result in less than an hour, with each subsequent result following every minute. The current platform is designed for full automation, high throughput, and reproducible sample handling. The per-test cost is approximately $10 for a four-plex assay, which is above the POC target range. However, Simoa is amenable to a POC format, in which an integrated consumable could be made from less costly materials, with a target of $2 per test after optimization. A POC device could also shorten the time to result. For example, the digital enzyme-linked immunosorbent assay (ELISA) concept on which Simoa is based has recently been translated into a droplet-based POC device (43). This example demonstrates the feasibility of a low-cost mobile digital ELISA platform that maintains multiplex capability and ultrasensitivity, and it offers one possible path toward our objective of bringing a POC diagnostic device to high-TB-burden countries as a TB rule-out triage test.
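The digital ELISA principle mentioned above can be made concrete. In the digital regime, Simoa counts the fraction of beads carrying at least one enzyme label, f_on; if labels are distributed across beads according to a Poisson distribution, the average number of enzymes per bead (AEB, the quantity plotted against cytokine concentration in fig. S2) is AEB = −ln(1 − f_on). The sketch below shows only this conversion under that Poisson assumption. The instrument software also handles a high-occupancy (analog) regime and calibration, which are not reproduced here, and the function name is hypothetical.

```python
import math

# Illustrative sketch; the function name is hypothetical, not Quanterix software.


def average_enzymes_per_bead(n_on, n_total):
    """Digital-regime AEB from bead counts, assuming Poisson-distributed labels.

    P(bead carries 0 enzymes) = exp(-AEB), so f_on = 1 - exp(-AEB) and
    AEB = -ln(1 - f_on).
    """
    if n_total <= 0 or not 0 <= n_on < n_total:
        raise ValueError("require n_total > 0 and 0 <= n_on < n_total")
    f_on = n_on / n_total
    return -math.log1p(-f_on)  # log1p(-f) == ln(1 - f), accurate for small f


# At low occupancy AEB is close to f_on; at higher occupancy it exceeds f_on,
# because some "on" beads carry more than one enzyme.
low = average_enzymes_per_bead(100, 10_000)     # f_on = 0.01
high = average_enzymes_per_bead(5_000, 10_000)  # f_on = 0.5, so AEB = ln 2
```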

Expanding the diagnostic with additional informative markers should further improve performance. Two types of markers are of particular interest for incorporation into the existing TB diagnostic: antigens capable of capturing anti-MTB antibodies from the blood of patients with active infection and, conversely, antibodies to MTB antigens. The first example of this immune response–focused biomarker type, Ag85B, emerged as the top hit in a screen of 71 MTB antigens and was generously provided to us by FIND. Although immunoglobulin G (IgG) detection alone is unlikely to reach the required TPP performance, it may still be useful in combination with other biomarkers (32), as we found by adding detection of antibodies against Ag85B to the four-protein panel. The approach to configuring an antibody-detection assay, described in Supplementary Materials and Methods, differs from that for a conventional Simoa immunoassay but has proven successful for sensitive detection of dengue infections (44). Because antibodies to Ag85B had previously been measured in 300 of the 317 FIND plasma and serum samples used for our blinded validation, we were able to model the impact of adding this antigen to the four markers in our panel, yielding an anticipated ROC-AUC of 0.90. Setting specificity to 70.0% (95% CI, 62.7 to 77.3%) yielded a sensitivity of 92.0% (95% CI, 87.2 to 94.8%), meeting the minimal WHO TPP accuracy requirements for a TB triage test. This encouraging result motivated configuration of a Simoa assay to measure Ag85B antibodies in blood, which yielded an apparently enhanced diagnostic algorithm sensitivity of 86.2% (95% CI, 82.2 to 90.4%) at a specificity of 69.2% (95% CI, 64.2 to 74.2%). Ultimately, the Ag85B results should be regarded as a discovery effort that must be independently validated in a blinded cohort. Nonetheless, they illustrate a general strategy for test refinement: Further improvements are likely with additional markers, especially once additional Simoa assays that directly measure MTB antigens, currently under development, are incorporated. In its current configuration, multiplexed Simoa enables the simultaneous detection of up to 10 proteins at subfemtomolar concentrations; fully implemented assays to date measure up to six analytes simultaneously (45). A near-term goal is thus to improve the marker panel incrementally toward an anticipated 6- to 10-plex assay. Given the urgency of the problem, however, the prospect of incremental improvements will not delay platform refinement and field testing. Our existing results approach the WHO TPP for accuracy and pave the way for the development and eventual deployment of a much-needed blood-based POC triage test for ATB.
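The operating point reported above (sensitivity read off at a fixed specificity of about 70%) can be illustrated with a short sketch. The function below scans candidate thresholds on a classifier score and keeps the lowest threshold whose specificity reaches the target, which maximizes sensitivity subject to that constraint. A Wilson score interval is included for the resulting proportion. The article does not state how its CIs were computed, so the Wilson interval is a swapped-in choice, and the function names and example scores are hypothetical.

```python
import math

# Illustrative sketch; function names and data are hypothetical.


def sensitivity_at_specificity(scores, labels, target_spec):
    """Return (threshold, sensitivity, specificity) at the lowest threshold
    whose specificity is >= target_spec. A sample is called positive (ATB)
    when score >= threshold; labels are 1 for ATB and 0 for OTD."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Raising the threshold can only increase specificity and decrease
    # sensitivity, so the first qualifying threshold maximizes sensitivity.
    for t in sorted(set(scores)) + [math.inf]:
        spec = sum(s < t for s in neg) / len(neg)
        if spec >= target_spec:
            sens = sum(s >= t for s in pos) / len(pos)
            return t, sens, spec


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical scores for 5 ATB (label 1) and 5 OTD (label 0) samples.
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
t, sens, spec = sensitivity_at_specificity(scores, labels, target_spec=0.70)
# t = 0.55, sensitivity 0.8, specificity 0.8
n_atb = labels.count(1)
sens_ci = wilson_ci(round(sens * n_atb), n_atb)
```

With only a handful of samples the interval is very wide; with the 160 ATB samples of the validation cohort it would be far narrower.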

## MATERIALS AND METHODS

### Study design and human subject oversight

The study was conducted in two phases: biomarker discovery and algorithm validation. For the biomarker discovery phase, we prospectively recruited 406 individuals ≥16 years of age with at least 2 weeks of persistent cough and symptoms suggestive of pulmonary TB who presented to primary health clinics in Dar es Salaam and Pemba (Tanzania) and Bohol (the Philippines) between February 2008 and March 2010. A single site was used in each country except Tanzania, which had two sites (the island of Pemba and the city of Dar es Salaam). All clinical data were anonymized, and unique barcodes were used to identify patient blood samples. Patients were assigned a diagnosis of either ATB (N = 199) or a diagnosis other than TB (OTD) (N = 188) on the basis of a clinical and microbiologic assessment developed by experienced TB clinicians (Fig. 2). Subjects were recruited as part of a broader study, the Population Health Metrics Research Consortium, funded by the Bill & Melinda Gates Foundation’s Grand Challenges in Global Health initiative. Trained local health workers provided verbal and written information to all potential study participants and obtained written informed consent from all recruits in their local language. In addition, 19 subjects provided plasma samples after 6 months of daily observed TB treatment; these samples served as cured controls. All samples were collected under the study titled “Population Health Metrics Research Consortium Project,” and the study protocol was approved by the Committee on the Use of Humans as Experimental Subjects (protocol no. 0502001089) at Massachusetts Institute of Technology pursuant to Federal Regulations [45 CFR, part 46.101(b)(4)].

The independent algorithm validation phase was conducted using 317 blinded serum and plasma samples obtained from the FIND (Geneva, Switzerland) TB sample repository. Under Institutional Review Board (IRB)–approved studies conducted in Peru, South Africa, and Vietnam, FIND collected these samples for use in diagnostic evaluations from ATB (N = 160) and OTD (N = 157) patients with suspected TB, defined similarly to the discovery cohort but by a slightly different, rigorous clinical and microbiologic assessment, as well as from 19 patients with confirmed latent TB infection. All clinical data associated with the FIND validation plasma and serum samples were anonymized, and barcodes were used to access pertinent information.

In both discovery and validation cohorts, patients who had undergone treatment for TB within 12 months or any antibiotic therapy within 1 month of presentation were excluded; HIV-infected subjects were excluded from the discovery cohort. The overall study design for sample collection is shown in Fig. 3.

### Luminex assay

Multiplexed, quantitative immunoassay measurements of 47 host proteins were performed on the 387 discovery cohort plasma samples using the commercial Human InflammationMAP Luminex assay at Myriad RBM Inc., a laboratory certified under the Clinical Laboratory Improvement Amendments (CLIA). The full list of 47 analytes can be found in data file S1. All plasma samples were barcoded, anonymized, and submitted without clinical annotation so that the Luminex measurements were performed in a blinded manner.

### Simoa assay

The panel of four host biomarkers (IL-6, IL-8, IL-18, and VEGF) that best distinguished ATB from OTD in the discovery cohort was configured on the Simoa ultrasensitive immunoassay platform using a commercial HD-1 Analyzer (Quanterix) to improve its discriminatory power. After optimization on the discovery cohort samples, the performance of the diagnostic algorithm based on ultrasensitive measurement of the four analytes was assessed in the validation cohort. Validation samples were also blinded until after the triage algorithm classification was complete.

On the basis of preliminary data from FIND, we also configured the Simoa system to detect the presence or absence of antibody against the TB antigen Ag85B (accession no. Rv1886c). The additional detection of the Ag85B antibody was applied to a subset (N = 585) of the full set of 723 plasma and serum samples from both cohorts.

### Diagnostic algorithm development and statistical analysis

In the discovery phase, the ATB diagnostic model was developed with TreeNet, a machine learning algorithm that implements stochastic gradient boosting in the Salford Predictive Modeler (Salford Systems). Of the 47 starting analytes measured by Luminex, those with >50% missing values were filtered out, leaving 31 analytes. To facilitate migration of the biomarkers to a POC immunoassay platform, proteins were sorted by their concentration range in plasma. A subset of 12 markers within a narrow picogram-per-milliliter dynamic range proved best at discriminating ATB from OTD and was selected as the input for final model construction using TreeNet and 10-fold cross-validation. The final algorithm, based on four host biomarkers, was then used to classify the blinded plasma and serum samples in the independent validation cohort from FIND.
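Two preprocessing steps described above, removing analytes with more than 50% missing values and partitioning samples for 10-fold cross-validation, are simple enough to sketch. TreeNet performs its own cross-validation internally and the article does not specify how folds were assigned, so the stratified round-robin assignment below (which keeps the ATB:OTD ratio similar across folds) is an assumption. The function names and the example analyte table are hypothetical.

```python
import random

# Illustrative sketch; function names are hypothetical, not SPM/TreeNet options.


def drop_sparse_analytes(table, max_missing=0.5):
    """Keep analytes whose fraction of missing (None) values is <= max_missing.

    table maps analyte name -> list of per-sample concentrations.
    """
    return {
        name: values
        for name, values in table.items()
        if sum(v is None for v in values) / len(values) <= max_missing
    }


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index 0..k-1, preserving class proportions."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across folds.
        for position, i in enumerate(members):
            folds[i] = position % k
    return folds


# Hypothetical measurements: the second analyte is 75% missing and is dropped.
table = {"IL-6": [1.2, None, 3.4, 2.2], "analyte_X": [None, None, None, 0.3]}
kept = drop_sparse_analytes(table)  # only "IL-6" remains

# Class sizes from the discovery cohort: 199 ATB (1) and 188 OTD (0).
labels = [1] * 199 + [0] * 188
folds = stratified_folds(labels, k=10)
# Each (train, test) split holds out one fold at a time.
splits = [
    ([i for i, f in enumerate(folds) if f != fold],
     [i for i, f in enumerate(folds) if f == fold])
    for fold in range(10)
]
```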

To explore model enhancements, we used TreeNet with 10-fold cross-validation (N = 583) to determine the performance increment with the addition of Ag85B antibody-based measurement to the four host marker panel. The diagnostic accuracy of the model at each step was evaluated by ROC curve analysis, including AUC measurements (ROC-AUC). Additional details of the TreeNet machine learning algorithm and 10-fold cross-validation can be found in the Supplementary Materials.
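The ROC analysis referred to above can be sketched generically: sweep the decision threshold from the highest score to the lowest, record the false- and true-positive rates at each distinct score, and integrate with the trapezoidal rule. Grouping tied scores makes the trapezoidal AUC equal to the probability that a randomly chosen ATB sample scores higher than a randomly chosen OTD sample (ties counting one half). This is not the Salford software's implementation, and the function names and scores are hypothetical.

```python
# Illustrative sketch; function names and data are hypothetical.


def roc_curve(scores, labels):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low.

    labels: 1 = ATB (positive), 0 = OTD (negative). Tied scores are
    consumed together, giving one ROC point per distinct score.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Hypothetical classifier scores for 5 ATB (1) and 5 OTD (0) samples.
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
fpr, tpr = roc_curve(scores, labels)
auc = auc_trapezoid(fpr, tpr)  # 0.84: 21 of 25 ATB-OTD pairs ranked correctly
```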

### Code availability

We used the commercially available SPM 8.0 Ultra software package for machine learning and for building the ATB diagnostic model. A fully functional version of the software is available for evaluation at no charge for a 30-day period at www.salford-systems.com. The freely available R package Generalized Boosted Models can give similar but not identical results. The proprietary TreeNet algorithm available in SPM 8.0 Ultra and used in this study is the original gradient boosting machine authored by the inventor of the methodology (J. Friedman).
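Because TreeNet is proprietary and the R package does not reproduce it exactly, a toy version of the underlying technique may help readers see what is being fit. The sketch below implements Friedman-style stochastic gradient boosting for two-class log-loss with depth-1 trees (stumps): each round fits a stump to the negative gradient (y − p) on a random subsample of rows, sets each leaf to a Newton step, and adds a shrunken copy to the model. It is not TreeNet or gbm, uses no settings from the study, and all names and the example data are hypothetical.

```python
import math
import random

# Illustrative toy; names are hypothetical and this is not TreeNet or gbm.


def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))


def fit_stump(X, r, hess, rows):
    """Least-squares split of residuals r over `rows`; leaf values are
    Newton steps sum(r) / sum(p * (1 - p)) for binomial log-loss."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in rows}):
            left = [i for i in rows if X[i][j] < t]
            right = [i for i in rows if X[i][j] >= t]
            if not left:
                continue
            sse = 0.0
            for side in (left, right):
                mean = sum(r[i] for i in side) / len(side)
                sse += sum((r[i] - mean) ** 2 for i in side)
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return None  # every feature is constant on these rows

    def leaf(side):
        return sum(r[i] for i in side) / max(sum(hess[i] for i in side), 1e-12)

    _, j, t, left, right = best
    return j, t, leaf(left), leaf(right)


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.1, subsample=0.5, seed=0):
    """Return (f0, learning_rate, stumps) for binary labels y in {0, 1}."""
    rng = random.Random(seed)
    n = len(y)
    prior = sum(y) / n
    f0 = math.log(prior / (1 - prior))  # initial log-odds
    F = [f0] * n
    stumps = []
    for _ in range(n_rounds):
        p = [sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log-loss
        hess = [pi * (1 - pi) for pi in p]
        rows = rng.sample(range(n), max(2, int(subsample * n)))
        stump = fit_stump(X, r, hess, rows)
        if stump is None:
            continue
        j, t, v_left, v_right = stump
        stumps.append(stump)
        for i in range(n):  # shrunken update applied to every sample
            F[i] += learning_rate * (v_left if X[i][j] < t else v_right)
    return f0, learning_rate, stumps


def predict_proba(model, x):
    f0, learning_rate, stumps = model
    f = f0 + sum(learning_rate * (vl if x[j] < t else vr) for j, t, vl, vr in stumps)
    return sigmoid(f)


# Hypothetical two-marker data; class 1 (ATB) has higher values of marker 0.
X = [[1.0, 5.0], [1.5, 3.0], [2.0, 4.0], [2.5, 6.0],
     [6.0, 5.0], [6.5, 2.0], [7.0, 4.5], [8.0, 3.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosted_stumps(X, y, subsample=0.75)
```

The number of rounds, learning rate, and subsampling fraction here are arbitrary illustration values; production implementations also grow deeper trees and handle missing values.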

## SUPPLEMENTARY MATERIALS

stm.sciencemag.org/cgi/content/full/11/515/eaaw8287/DC1

Materials and Methods

Fig. S1. Simoa calibration curve for four cytokines using the four-plex assay with standards prepared in calibrator diluent, plotted on a log-log scale.

Fig. S2. Plots of AEB against concentrations of cytokines for four different unique bead types in a four-plex Simoa assay.

Fig. S3. Precision profile of the four-plex cytokine assay.

Table S1. Coupling confirmation for Ag85B antigen.

Table S2. Inter-run precision of AEB values for four cytokines over three individual runs on different days with triplicate measurements per run.

Table S3. Interassay variation of four-plex cytokine assay.

Table S4. Interassay variation of Ag85B serological assay.

Table S5. Results for spike and recovery tests in real human serum samples using a fourfold dilution with sample diluent.

Data file S1. Inflammation map.

Data file S2. Luminex data.

Data file S3. Four cytokine Simoa data of Broad plasma samples.

Data file S4. Four cytokine Simoa data of FIND plasma and serum samples.

Data file S5. Four cytokine and Ag85B IgG Simoa data of tested samples (combined Broad and FIND).

References (46–56)

## REFERENCES AND NOTES

Acknowledgments: We especially thank E. Rubin at the Harvard School of Public Health for expert advice and guidance. We thank C. Denkinger at FIND (Geneva, Switzerland) for generous donation of the validation samples. We also thank A. Lopez at the University of Melbourne for support. Funding: This work was supported in part by a Grand Challenge in Global Health grant from the Bill and Melinda Gates Foundation for the study titled “Population Health Metrics Research Consortium Project” (contract no. 041222241) (to S.A.C.). Author contributions: The study was designed by R.A., S.A.C., D.R.W., and M.A.G. The manuscript was written by R.A., D.R.W., and M.A.G. R.A. designed and performed most of the data analysis. R.A. and D.S. developed the machine learning algorithms. D.R.W. and L.X. designed all Simoa experiments. L.X. performed all Simoa experiments. M.P., M.J.S., M.M., and F.S.B. processed the plasma and serum samples for the Luminex and Simoa experiments. M.F.S. and B.R. provided support for the initial Luminex-based discovery experiments. R. Bencher and D.P.E. provided support for all Luminex-based experiments. T.B. provided technical help and validation samples from FIND. M.G.L., D.M.S., V.L.T., and I.D.R. were responsible for sample collection at the Bohol, Philippines site. S.M.Ame, A.D., U.D., S.D., S.M.Ali, S.S., and R. Black were responsible for sample collection at the Pemba, Tanzania site. S.M., W.W.F., and Z.P. were responsible for sample collection at the Dar es Salaam, Tanzania site. C.J.L.M. was the primary principal investigator of the Gates Foundation–sponsored study and provided overall support for the project. All authors contributed to editing the manuscript. Competing interests: D.R.W. has a financial interest in Quanterix Corporation, a company that develops an ultrasensitive digital immunoassay platform. He is an inventor of the Simoa technology and a founder of the company and also serves on its Board of Directors. D.R.W.’s interests were reviewed and are managed by Brigham and Women’s Hospital (BWH) and Partners HealthCare in accordance with their conflict of interest policies. T.B. is employed by FIND. FIND is a not-for-profit foundation that supports the evaluation of publicly prioritized TB assays and the implementation of WHO-approved assays using donor grants. FIND has product evaluation agreements with several private sector companies that design diagnostics for TB and other diseases. These agreements strictly define FIND’s independence and neutrality with regard to these private sector companies. S.M. is an unpaid board member for and has an equity stake in a diagnostic start-up focused on measurement of nutritional biomarkers at the POC using the results from his research. R.A., S.A.C., and M.A.G. are inventors on U.S. patent no. 9,702,886 held by the Broad Institute of MIT and Harvard and Massachusetts General Hospital that covers the use of the biomarkers and algorithms described in this manuscript as a device for TB diagnosis. Data and materials availability: All data needed to interpret and reproduce the results can be found in the manuscript and the Supplementary Materials. Recombinant Ag85B (Rv1886c) antigen was received from the Natural and Medical Sciences Institute at the University of Tuebingen (NMI) via the FIND network under a material transfer agreement. We used the commercially available SPM 8.0 Ultra software package for machine learning and for building the ATB diagnostic model. A fully functional version of the software is available for evaluation at no charge for a 30-day period at www.salford-systems.com. The freely available R package Generalized Boosted Models can give similar but not identical results. The proprietary TreeNet algorithm available in SPM 8.0 Ultra and used in this study is the original gradient boosting machine authored by the inventor of the methodology, J. Friedman.