Research Article: Infectious diseases

Metabolic differentiation of early Lyme disease from southern tick–associated rash illness (STARI)


Science Translational Medicine  16 Aug 2017:
Vol. 9, Issue 403, eaal2717
DOI: 10.1126/scitranslmed.aal2717

Avoiding rash decisions

The iconic bull's-eye rash is associated with Lyme disease, but similar symptoms are observed in other illnesses not caused by Borrelia burgdorferi infection, such as southern tick–associated rash illness, which has not yet been tied to a specific pathogen. Molins et al. set out to define a metabolic signature that would be able to distinguish early Lyme disease from southern tick–associated rash illness. Their findings confirm that these diseases are quite distinct. Moreover, use of the signature could help guide diagnosis and patient treatment.


Lyme disease, the most commonly reported vector-borne disease in the United States, results from infection with Borrelia burgdorferi. Early clinical diagnosis of this disease is largely based on the presence of an erythematous skin lesion for individuals in high-risk regions. This, however, can be confused with other illnesses including southern tick−associated rash illness (STARI), an illness that lacks a defined etiological agent or laboratory diagnostic test, and is coprevalent with Lyme disease in portions of the eastern United States. By applying an unbiased metabolomics approach with sera retrospectively obtained from well-characterized patients, we defined biochemical and diagnostic differences between early Lyme disease and STARI. Specifically, a metabolic biosignature consisting of 261 molecular features (MFs) revealed that altered N-acyl ethanolamine and primary fatty acid amide metabolism discriminated early Lyme disease from STARI. Development of classification models with the 261-MF biosignature and testing against validation samples differentiated early Lyme disease from STARI with an accuracy of 85 to 98%. These findings revealed metabolic dissimilarity between early Lyme disease and STARI, and provide a powerful and new approach to inform patient management by objectively distinguishing early Lyme disease from an illness with nearly identical symptoms.


Lyme disease is a multisystem bacterial infection that, in the United States, is primarily caused by infection with Borrelia burgdorferi sensu stricto. More than 300,000 cases of Lyme disease are estimated to occur annually in the United States, with more than 3.4 million laboratory diagnostic tests performed each year (1, 2). Symptoms associated with this infection include fever, chills, headache, fatigue, muscle and joint aches, and swollen lymph nodes; however, the most prominent clinical manifestation in the early stage is the presence of one or more erythema migrans (EM) skin lesions (3). This annular, expanding erythematous skin lesion occurs at the site of the tick bite in 70 to 80% of infected individuals and is typically 5 cm or more in diameter (4, 5). Although an EM lesion is a hallmark for Lyme disease, other types of skin lesions can be confused with EM (3, 5, 6). These include rashes caused by tick bite hypersensitivity reactions, certain cutaneous fungal infections, bacterial cellulitis, and the rash of southern tick−associated rash illness (STARI) (7, 8).

STARI is associated with a bite from the lone star tick (Amblyomma americanum), and in addition to the development of an EM-like skin lesion, individuals with STARI can present with mild systemic symptoms (including muscle and joint pains, fatigue, fever, chills, and headache) that are similar to those occurring in patients with Lyme disease (7, 9, 10). These characteristics of STARI have led some to postulate that the etiology of this illness is a Borrelia species, including B. burgdorferi (10, 11) or B. lonestari (12–15); however, multiple studies have refuted that STARI is caused by B. burgdorferi (7, 16–19), and additional cases associating B. lonestari with STARI have not emerged (20, 21). Additionally, STARI patients have been screened serologically for reactivity to rickettsial agents, but no evidence was obtained to demonstrate that rickettsia causes this illness (10, 22). At present, no infectious etiology is known for STARI, and no laboratory test is available to support a clinical diagnosis based on symptoms, geographic location, and history of a tick bite.

STARI cases occur over the geographic region where the lone star tick is present. This includes a region that currently expands from central Texas and Oklahoma upward into the Midwestern states and eastward, including the southern states and along the Atlantic coast into Maine (23). Unlike STARI, Lyme disease is transmitted to humans through the bite of the blacklegged tick (Ixodes scapularis), which is present in the northeastern, mid-Atlantic, and north-central United States, and the western blacklegged tick (Ixodes pacificus), which is present on the Pacific Coast (24). The geographic distribution of human Lyme disease and the vectors for this disease is expanding (24–26), and there is a similar expansion of areas inhabited by the lone star tick (23). A strict geographic segregation of Lyme disease and STARI does not exist, because there are regions where STARI and Lyme disease are coprevalent (25). Thus, there is a growing need for diagnostic methods that differentiate between Lyme disease and STARI and that facilitate proper treatment, patient management, and disease surveillance.

Clinically, the skin lesions of STARI and early Lyme disease are indistinguishable, and the only biomarkers evaluated for differential diagnosis of early Lyme disease and STARI have been serum antibodies to B. burgdorferi (10, 16). However, these tests have poor sensitivity for early stages of Lyme disease, and thus, a lack of B. burgdorferi antibodies cannot be used as a reliable differential marker for STARI. Metabolic profiling allows definition of the current physiological state of an individual, including the underlying biochemistry of disease pathology. Thus, the ability to evaluate large numbers of small-molecule metabolites in a single analysis has become an attractive approach for defining biomarkers of health and diagnosis of disease (27, 28). We previously demonstrated that metabolic profiling of sera provided a high level of accuracy in differentiating early Lyme disease patients from healthy individuals and those with diseases or conditions that are confounders for existing serological-based laboratory tests of Lyme disease (29). Thus, we now postulate that a metabolomics-driven approach could identify biomarkers that discriminate early Lyme disease from STARI and provide evidence that these two diseases are biochemically distinct. A retrospective cohort of well-characterized sera from patients with early Lyme disease and STARI was evaluated to identify a differentiating metabolic biosignature. Using statistical modeling, this metabolic biosignature accurately classified test samples that included healthy controls, and revealed differences in metabolic pathways.


Clinical samples

A total of 220 retrospective serum samples from three different repositories and collected on the basis of criteria described in table S1 were used to develop and test a metabolic biosignature that accurately classifies early Lyme disease and STARI. All samples from Lyme disease patients were culture-confirmed and/or polymerase chain reaction (PCR)–positive for B. burgdorferi. The median age for early Lyme disease patients (n = 70) was 45 years, and 74% were males. STARI patients (n = 55) had an overall median age of 45 years, and 55% were males. All samples used in biosignature discovery and classification model development including healthy controls from Colorado and New York (n = 58) were divided into Discovery/Training-Sets and Test-Sets, as described in the study design.

To establish a Lyme disease diagnostic baseline, we performed the recommended two-tiered serology testing for Lyme disease on all samples. First-tier testing was performed using the C6 enzyme immunoassay (EIA) and was positive for 66% of Lyme disease samples; two STARI samples (2%) and five healthy controls (9%) also tested positive or equivocal. Two-tiered testing using immunoglobulin M (IgM) and IgG immunoblots as the second-tier test after a positive or equivocal first-tier assay resulted in a sensitivity of 44% for early Lyme disease samples (duration of illness was not considered for IgM immunoblot testing). The sensitivity of two-tiered testing for early Lyme disease samples included in the Discovery/Training-Sets and the Test-Sets was 40 and 50%, respectively. All STARI and healthy control samples were negative by two-tiered testing (table S1).
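The two-tiered logic described above can be illustrated with a minimal sketch, in which the second-tier immunoblot only counts when the first-tier EIA is positive or equivocal. The class name, field names, and the five-sample toy cohort are hypothetical and are not the study data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SerologyResult:
    first_tier: str            # "positive", "equivocal", or "negative" (first-tier EIA)
    immunoblot_positive: bool  # second-tier IgM or IgG immunoblot result


def two_tier_positive(result: SerologyResult) -> bool:
    """The immunoblot is only considered after a positive or equivocal first tier."""
    if result.first_tier not in ("positive", "equivocal"):
        return False
    return result.immunoblot_positive


def sensitivity(results: List[SerologyResult]) -> float:
    """Fraction of confirmed-disease samples that the algorithm calls positive."""
    positives = sum(two_tier_positive(r) for r in results)
    return positives / len(results)


# Hypothetical toy cohort of five confirmed cases (illustration only).
toy_cohort = [
    SerologyResult("positive", True),
    SerologyResult("positive", False),
    SerologyResult("equivocal", True),
    SerologyResult("negative", True),   # immunoblot ignored: first tier was negative
    SerologyResult("negative", False),
]
print(f"two-tier sensitivity: {sensitivity(toy_cohort):.0%}")
```

Because a negative first tier ends the algorithm, two-tier sensitivity can never exceed first-tier sensitivity, which is consistent with the drop from 66 to 44% reported above.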

Development of a metabolic biosignature for early Lyme disease and STARI differentiation

Metabolic profiling by liquid chromatography–mass spectrometry (LC-MS) of a retrospective cohort of well-characterized sera from patients with early Lyme disease (n = 40) and STARI (n = 36) (table S1 and Fig. 1A) comprising the Discovery-Set resulted in a biosignature of 792 molecular features (MFs) that differed significantly (adjusted P < 0.05), with a ≥2-fold change in relative abundance between early Lyme disease and STARI. Down-selection of MFs based on their robustness in replicate analyses of the same sera produced a refined biosignature of 261 MFs (Fig. 1A and table S2). Of these 261 MFs, 60 and 201 displayed an increased and decreased abundance, respectively, in early Lyme disease as compared to STARI. The large number of MFs that differed significantly (P < 0.05) between early Lyme disease and STARI patients indicated that these two patient groups had distinguishing biochemical profiles. These variances could be applied to define alterations of specific metabolic pathways (Fig. 1A) and used to develop diagnostic classification models (Fig. 1B).
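The selection rule stated above (adjusted P < 0.05 together with a ≥2-fold change) can be sketched as follows. The text does not say which multiple-testing adjustment was used, so the Benjamini-Hochberg procedure here is an assumption, as are the function names and the toy values; abundances are assumed to be positive and on a linear scale.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_features(features, alpha=0.05, min_fold=2.0):
    """features: list of (name, raw_p, mean_abundance_group_a, mean_abundance_group_b)."""
    adjusted = benjamini_hochberg([f[1] for f in features])
    selected = []
    for (name, _, mean_a, mean_b), adj_p in zip(features, adjusted):
        fold = max(mean_a, mean_b) / min(mean_a, mean_b)  # direction-agnostic fold change
        if adj_p < alpha and fold >= min_fold:
            selected.append(name)
    return selected


# Hypothetical molecular features (illustration only).
toy_features = [
    ("MF_a", 0.001, 400.0, 100.0),  # significant and 4-fold
    ("MF_b", 0.004, 150.0, 100.0),  # significant but only 1.5-fold
    ("MF_c", 0.300, 500.0, 100.0),  # 5-fold but not significant
    ("MF_d", 0.010, 100.0, 250.0),  # significant and 2.5-fold lower in group A
]
print(select_features(toy_features))
```

Requiring both criteria keeps features that are statistically supported and large enough in magnitude to be analytically robust; either criterion alone would retain MF_b or MF_c in this toy example.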

Fig. 1. Metabolic profiling for the identification and application of differentiating MFs.

(A) LC-MS data from an initial Discovery-Set of early Lyme disease (EL) and STARI samples were used to identify a list of MFs that were targeted in a second LC-MS run. The data from both LC-MS runs were combined to form the Targeted-Discovery-Set. The MFs were then screened for consistency and robustness, and this resulted in a final early Lyme disease–STARI biosignature of 261 MFs. This biosignature was used for downstream pathway analysis and for classification modeling. MPP, Mass Profiler Pro. (B) Two training-sets along with the 261-MF biosignature list were used to train multiple classification models, random forest (RF), and least absolute shrinkage and selection operator (LASSO). Data from samples of two Test-Sets (not included for the Discovery/Training-Set data) were blindly tested against the two-way (early Lyme disease versus STARI) and three-way [early Lyme disease versus STARI versus healthy controls (HC)] classification models. The regression coefficients used for each MF in the LASSO two-way and three-way classification models are provided in tables S4 and S6, respectively.

In silico analysis of metabolic pathways

Presumptive chemical identification was applied to the 261 MFs. This yielded predicted chemical formulas for 149 MFs, and 122 MFs were assigned a putative chemical structure based on interrogation of each MF’s monoisotopic mass [±15 parts per million (ppm)] against the Metlin database and the Human Metabolome Database (HMDB) (table S2). An in silico interrogation of potentially altered metabolic pathways was performed using the presumptive identifications for the 122 MFs and MetaboAnalyst (30). Four differentiating pathways were predicted to have the greatest impact, with glycerophospholipid and sphingolipid metabolism having the greatest number of assigned metabolites (Fig. 2 and table S3). Specifically, the MetaboAnalyst analysis indicated that differences in phosphatidic acid, phosphatidylethanolamine, phosphatidylcholine, and lysophosphatidylcholine were the major contributors to altered glycerophospholipid metabolism between STARI and early Lyme disease (table S3). Altered sphingolipid metabolism between these two groups was attributable to changes in the relative abundances of sphingosine, dehydrosphinganine, and sulfatide (table S3). Manual interrogation of the predicted structural identifications revealed that 26 and 7 of the 122 MFs assigned a putative structural identification were associated with glycerophospholipid and sphingolipid metabolism, respectively.
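The ±15 ppm mass match used for presumptive identification can be sketched as below. The example assumes the MF was detected as a singly protonated ion, [M+H]+, which is not stated in this section. It uses the elemental formula of palmitoyl ethanolamide (C18H37NO2) and the m/z of 300.2892 reported in the section on N-acylethanolamine metabolism. The function names are illustrative.

```python
# Monoisotopic masses (u) of the elements needed here, plus the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647


def mz_protonated(composition):
    """m/z of the singly protonated ion [M+H]+ for a dict of element counts."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in composition.items())
    return neutral + PROTON


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=15.0):
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Palmitoyl ethanolamide, C18H37NO2; the observed m/z is taken from the text.
palmitoyl_ethanolamide = {"C": 18, "H": 37, "N": 1, "O": 2}
theoretical = mz_protonated(palmitoyl_ethanolamide)
print(f"theoretical m/z {theoretical:.4f}, "
      f"error {ppm_error(300.2892, theoretical):.2f} ppm, "
      f"match: {within_tolerance(300.2892, theoretical)}")
```

A mass match of this kind cannot separate isomers: sphingosine shares the formula C18H37NO2, so both candidates pass the same ppm test, which is why retention time and MS/MS comparison against authentic standards are needed for confirmation.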

Fig. 2. Pathways differentially regulated in patients with early Lyme disease and STARI.

The 122 presumptively identified MFs were analyzed using MetaboAnalyst to identify perturbed pathways between early Lyme disease and STARI. The color and size of each circle are based on P values and pathway impact values. Pathways with a >0.1 impact were considered to be perturbed and differentially regulated between patients with early Lyme disease and STARI. A total of four pathways were affected: (i) glycerophospholipid metabolism; (ii) sphingolipid metabolism; (iii) valine, leucine, and isoleucine biosynthesis; and (iv) phenylalanine metabolism.

Elucidation of altered N-acylethanolamine metabolism

The prediction of altered metabolic pathways was based on the presumptive structural identification of the early Lyme disease versus STARI differentiating MFs. Thus, to further define the metabolic differences between these two patient groups, LC-MS/MS was applied for structural confirmation of selected MFs. Two MFs that displayed relatively large abundance differences [mass/charge ratio (m/z), 300.2892 and retention time (RT), 19.66; m/z, 328.3204 and RT, 20.72] were putatively identified as sphingosine-C18 or 3-ketosphinganine, and sphingosine-C20 or N,N-dimethyl sphingosine, respectively. However, both of these MFs had alternative predicted structures of palmitoyl ethanolamide and stearoyl ethanolamide, respectively. The interrogation of authentic standards against these two serum MFs revealed RTs and MS/MS spectra that identified the m/z 300.2892 and m/z 328.3204 products as palmitoyl ethanolamide (Fig. 3, A and B) and stearoyl ethanolamide (fig. S1), respectively. These two products, as well as other N-acylethanolamines (NAEs), are derived from phosphatidylethanolamine and phosphatidylcholine, and represent a class of structures termed endocannabinoids and endocannabinoid-like (Fig. 3C) (31). Further analysis of the 122 MFs identified five additional MFs with predicted structures that mapped to the NAE pathway. Specifically, MF with an m/z of 286.2737 and an RT of 19.08 was putatively identified as a sphingosine-C17 or pentadecanoyl ethanolamide and was confirmed to be the latter (fig. S2). MF with an m/z of 356.3517 and an RT of 21.67 was putatively identified and confirmed to be eicosanoyl ethanolamide (fig. S3), and MF with an m/z of 454.2923 and an RT of 18.08 was confirmed to be glycerophospho-N-palmitoyl ethanolamine (fig. S4), an intermediate in the formation of palmitoyl ethanolamide. 
A second group of lipids, the primary fatty acid amides (PFAMs) that act as signaling molecules and that are potentially associated with the metabolism of NAEs, was also identified as having significant (P < 0.05) relative abundance differences between the early Lyme disease and STARI patient samples. Specifically, MFs with an m/z of 256.2632 and an RT of 20.08, an m/z of 284.2943 and an RT of 21.15, and an m/z of 338.3430 and an RT of 22.14 were confirmed to be palmitamide (Fig. 3, D and E), stearamide (fig. S5), and erucamide (fig. S6), respectively.

Fig. 3. Metabolite identification and association with NAE and PFAM metabolism.

Structural identification of palmitoyl ethanolamide (A and B) and palmitamide (D and E) in the 261-MF biosignature is part of NAE and PFAM metabolism (C). Structural identification was achieved by RT alignment (A and D) of authentic standard (top panel), authentic standard spiked in pooled patient sera (middle panel), and the targeted metabolite in pooled patient sera (bottom panel), and by comparison of MS/MS spectra (B and E) of the authentic standards (top) and the targeted metabolites in patient sera (bottom). RT alignments for palmitoyl ethanolamide (A) and palmitamide (D) were generated with extracted ion chromatograms for m/z of 300.2892 and 256.2632, respectively. The relationship of PFAM formation to NAE metabolism is highlighted in light green in (C). PLA, phospholipase A; PLC, phospholipase C; PLD, phospholipase D; ADH, alcohol dehydrogenase; PAM, peptidylglycine α-amidating monooxygenase; AEA, arachidonoyl ethanolamide.

The large number of differentiating MFs associated with NAE metabolism suggested that this is a major biological difference between STARI and early Lyme disease (Fig. 3C and table S2). Four additional MFs of the 261-MF biosignature and that fit known host biochemical pathways were also structurally confirmed. These included l-phenylalanine (fig. S7), nonanedioic acid (fig. S8), glycocholic acid (fig. S9), and 3-carboxy-4-methyl-5-propyl-2-furanpropanoic acid (CMPF) (fig. S10). Additionally, two MFs that provided strong matches to MS/MS spectra in the Metlin databases were putatively identified as arachidonoyl lysophosphatidic acid [Lyso PA (20:4)] (fig. S11) and 3-ketosphingosine (fig. S12).

Metabolic dissimilarity between Lyme disease and STARI, as related to healthy controls

We hypothesized that early Lyme disease and STARI represent distinct metabolic states that would be reflected in a comparison of MFs’ relative abundances in these two disease states to those of healthy controls. This provided evidence for metabolic separation of early Lyme disease and STARI patient samples (Fig. 4A). For three MFs (3-ketosphingosine, CMPF, and Lyso PA 20:4), their abundances in early Lyme disease were increased as compared to the healthy controls, whereas their abundances in STARI were decreased. Additionally, all of the NAEs and PFAMs had relative abundances in early Lyme disease patients that were closer to those of healthy controls, whereas the relative abundances in STARI were greatly increased.

Fig. 4. Comparison of MF abundances from the Lyme disease–STARI biosignature against healthy controls.

(A) Fourteen of the metabolites with level 1 or level 2 structural identifications were evaluated for abundance differences between early Lyme disease (green squares) and STARI (blue triangles) normalized to the metabolite abundance in healthy controls. Metabolites identified for NAE and PFAM metabolism are included. GP-NPEA, glycerophospho-N-palmitoyl ethanolamine. The relative mean abundance and 95% confidence intervals are shown for each metabolite. (B) Abundance fold change ranges (x axis) plotted against the percent of MFs from the 261-MF early Lyme disease–STARI biosignature that have increased (dark blue) or decreased (light blue) abundances in STARI relative to healthy controls and increased (dark green) or decreased (light green) abundances in early Lyme disease relative to healthy controls. (C) Percentage of identical MFs in STARI and early Lyme disease that had the same directional and similar abundance fold change difference relative to healthy controls (y axis). MFs were grouped on the basis of abundance fold change ranges: 1.0 to 1.4, 1.5 to 1.9, 2.0 to 2.4, 2.5 to 2.9, 3.0 to 3.4, and ≥3.5 (x axis). MFs with increased fold changes relative to healthy controls are indicated in dark purple, and those with decreased fold changes are indicated in light purple.

This analysis was expanded to all 261 MFs of the early Lyme disease–STARI biosignature. The total number of MFs with increased and decreased abundances relative to healthy controls was similar for both early Lyme disease and STARI within the different abundance fold change ranges (Fig. 4B). However, when individual MFs with increased or decreased abundances relative to healthy controls were compared between early Lyme disease and STARI, a strong difference emerged, with only 0 to 30% of MFs shared between early Lyme disease and STARI yielding the same directional and similar abundance fold change (Fig. 4C). This indicated that the metabolic changes in early Lyme disease and STARI as compared to healthy controls differed.

Diagnostic classification of early Lyme disease versus STARI

Classification models were used to determine whether the 261-MF biosignature could be applied to discriminate early Lyme disease from STARI (Table 1). Specifically, two classification models, LASSO and RF, were trained with the 261-MF biosignature using abundance data from the Training-Set samples only. Test-Set samples were not used for MF selection or to train the classification models. The LASSO model selected 38 MFs, and RF by default does not perform feature selection and thus used all 261 MFs for classification of the STARI and early Lyme disease patient populations. Table S4 provides the regression coefficients for the 38 MFs selected by LASSO. When Test-Set samples were tested in duplicate, early Lyme disease samples (n = 30) were classified by RF and LASSO with an accuracy of 97 and 98%, respectively. The STARI samples (n = 19) had a classification accuracy of 89% with both models (Table 1 and table S5). A depiction of the LASSO scores for the Test-Set data showed segregation of the early Lyme disease and STARI patient samples and demonstrated the discriminating power of the 38 MFs selected by the LASSO model (Fig. 5A). A receiver operating characteristic (ROC) curve was plotted to demonstrate the performance of the LASSO model for differentiating early Lyme disease from STARI patients. The area under the curve (AUC) was calculated to be 0.986 (Fig. 5B). The 38 MFs of the LASSO model encompassed 4 of the 14 structurally confirmed metabolites: CMPF, l-phenylalanine, palmitoyl ethanolamide, and Lyso PA.
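How a fitted two-way LASSO model scores a new sample can be sketched as follows. The score is the linear predictor: an intercept plus the sum of each selected MF's coefficient times its transformed abundance. The logistic link, the intercept, the MF names, the coefficient values, and the choice of early Lyme disease as the positive class are all assumptions for illustration; the actual coefficients are in table S4.

```python
import math


def lasso_score(abundances, coefficients, intercept=0.0):
    """Linear predictor: intercept + sum(coefficient * transformed abundance).

    MFs that are absent from `coefficients` were shrunk to zero by LASSO
    and therefore do not contribute to the score.
    """
    return intercept + sum(beta * abundances[mf] for mf, beta in coefficients.items())


def probability(score):
    """Logistic link (assumed) mapping the score to a class probability."""
    return 1.0 / (1.0 + math.exp(-score))


def classify(score, positive="early Lyme disease", negative="STARI"):
    return positive if probability(score) > 0.5 else negative


# Hypothetical coefficients and one hypothetical sample (illustration only).
coefficients = {"MF_1": 1.2, "MF_7": -0.8, "MF_9": 0.5}
sample = {"MF_1": 2.0, "MF_2": 9.9, "MF_7": 1.0, "MF_9": -1.0}  # MF_2 was not selected

score = lasso_score(sample, coefficients, intercept=-0.3)
print(f"score {score:.2f}, probability {probability(score):.3f}, call: {classify(score)}")
```

This also shows why LASSO yields a smaller panel than RF: only MFs with nonzero coefficients need to be measured to compute the score.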

Table 1. Classification modeling using the 261-MF biosignature list.
Fig. 5. Evaluation of classification models’ performance.

(A) LASSO scores (Xβ, that is, the linear portion of the regression model) were calculated for Test-Set data of early Lyme disease (green dots) and STARI (blue triangles) serum samples by multiplying the transformed abundances of the 38 MFs identified in the two-way LASSO model by the LASSO coefficients of the model and summing for each sample. Scores are plotted along the y axis; serum samples are plotted randomly along the x axis for easier viewing. (B) An ROC curve demonstrates the level of discrimination that is achieved between early Lyme disease and STARI using the 38 MFs of the two-way LASSO classification model by depicting a true-positive rate (sensitivity; early Lyme disease) versus a false-positive rate (1 − specificity; STARI) for the Test-Set samples (table S6). The AUC was calculated to be 0.986. The diagonal line represents an AUC value of 0.5. The performance of two-tiered testing (red dot) on the same sample set (Test-Set 1) was included as a reference for the sensitivity and specificity of the current clinical laboratory test for Lyme disease. (C) LASSO scores (Xβi) were calculated for the Test-Set data of early Lyme disease (green spheres), STARI (blue spheres), and healthy control (black spheres) serum samples by multiplying the transformed abundances of the 82 MFs identified in the three-way LASSO model by each of three LASSO coefficients used in the model. Each axis represents the sample score in the direction of one of the three sample groups. Scores are used in calculation of probabilities of class membership, with highest probability determining the predicted class.
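The ROC quantities in (B) can be illustrated with a short sketch. The AUC equals the probability that a randomly chosen positive (here, early Lyme disease) sample receives a higher score than a randomly chosen negative (STARI) sample, and each threshold on the score gives one point on the curve. The scores below are hypothetical, and this pairwise formulation is one standard way to compute the AUC, not necessarily the one used in the study.

```python
def roc_auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def roc_point(positive_scores, negative_scores, threshold):
    """Return (false-positive rate, true-positive rate) at one score threshold."""
    tpr = sum(s > threshold for s in positive_scores) / len(positive_scores)  # sensitivity
    fpr = sum(s > threshold for s in negative_scores) / len(negative_scores)  # 1 - specificity
    return fpr, tpr


# Hypothetical LASSO scores (illustration only).
early_lyme_scores = [2.1, 1.4, 0.3, -0.2]
stari_scores = [-1.5, -0.9, 0.1, -2.0]

print("AUC:", roc_auc(early_lyme_scores, stari_scores))
print("(FPR, TPR) at threshold 0:", roc_point(early_lyme_scores, stari_scores, 0.0))
```

An AUC of 0.5 corresponds to the diagonal line (no discrimination), and a single test with a fixed cutoff, such as two-tiered serology, appears as one point rather than a curve.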

Diagnostic classification of early Lyme disease versus STARI versus healthy controls

Separate three-way classification models using LASSO and RF were developed by including LC-MS data collected for healthy controls (n = 38) in the Training-Set samples. For model training, LASSO selected 82 MFs (table S2). The regression coefficients for the 82 MFs selected by LASSO are provided in table S6. Evaluation of the RF and LASSO three-way classification models with Test-Set samples revealed classification accuracies of 85 and 92% for early Lyme disease and STARI, respectively. Healthy controls (n = 20), a group not included for biosignature development, were classified with accuracies of 95 and 93% with the RF and LASSO models, respectively (Table 1 and table S7). Plotting of LASSO scores calculated for Test-Set data revealed three groupings that corresponded with early Lyme disease, STARI, and healthy controls (Fig. 5C). Of the early Lyme disease sample data files that were misclassified with the RF model (n = 9), all were predicted to be healthy controls; of those misclassified by the LASSO model (n = 9), three were classified as STARI and six as healthy controls. Of the STARI samples that were misclassified by the RF and LASSO models (n = 3 for both models), all samples were misclassified as early Lyme disease. When healthy controls were misclassified using the RF model (n = 2) and LASSO model (n = 3), all were misclassified as early Lyme disease.
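The reported three-way accuracies can be checked against the misclassification counts given above. This sketch assumes that each Test-Set sample contributed two data files (duplicate analysis, as stated for the two-way models) and that the early Lyme disease and STARI Test-Sets are the same n = 30 and n = 19 samples used for the two-way models. Both assumptions are inferred from the text rather than stated for the three-way models.

```python
def accuracy(n_samples, replicates, misclassified_files):
    """Fraction of data files classified correctly."""
    total_files = n_samples * replicates
    return (total_files - misclassified_files) / total_files


REPLICATES = 2  # assumption: duplicate data files per Test-Set sample

# (group, model) -> (Test-Set samples, misclassified data files), as read from the text
three_way_counts = {
    ("early Lyme disease", "RF"): (30, 9),
    ("early Lyme disease", "LASSO"): (30, 9),
    ("STARI", "RF"): (19, 3),
    ("STARI", "LASSO"): (19, 3),
    ("healthy controls", "RF"): (20, 2),
    ("healthy controls", "LASSO"): (20, 3),
}

three_way_accuracy = {
    key: accuracy(n, REPLICATES, wrong) for key, (n, wrong) in three_way_counts.items()
}
for (group, model), value in three_way_accuracy.items():
    print(f"{group:20s} {model:6s} {value:.1%}")
```

Under these assumptions the counts reproduce the reported figures: 51 of 60 files (85%) for early Lyme disease, 35 of 38 (92%) for STARI, and 38 of 40 (95%) and 37 of 40 (92.5%, reported as 93%) for healthy controls with RF and LASSO, respectively.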

Of the 38 MFs selected by LASSO for the two-way classification model, 33 were included in the 82 MFs of the LASSO three-way classification model (table S2). The 82 MFs of the LASSO three-way classification included 7 of the 14 structurally confirmed metabolites: 3-ketosphingosine, glycocholic acid, and pentadecanoyl ethanolamide, as well as the 4 included in the LASSO two-way classification model (table S2).

Assessment of geographic variability

Because retrospective samples collected by multiple laboratories were used in these studies, we assessed whether a geographic bias was introduced. Pairwise comparisons, by sample source, of mean LASSO scores and mean RF probabilities of STARI and healthy control Test-Set data were performed with analysis of variance (ANOVA), followed by computation of simultaneous confidence intervals using Tukey’s method (tables S8 and S9). These results demonstrated that all STARI samples grouped together regardless of source and differ from healthy controls. Likewise, the healthy controls from Colorado and New York were considered a single group that differed from STARI based on the LASSO healthy control or STARI scores, and the RF classification probabilities for healthy controls. The only outlier noted in these analyses was the New York healthy controls, but this was only when the ANOVA was performed with RF classification probabilities for STARI.
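The first step of this comparison, a one-way ANOVA across sample sources, can be sketched as below. The group values are hypothetical scores, not study data. The sketch returns only the F statistic and its degrees of freedom; Tukey's simultaneous confidence intervals additionally require quantiles of the studentized range distribution, which are not computed here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical mean scores for three sample sources (illustration only).
source_a = [1.0, 2.0, 3.0]
source_b = [2.0, 3.0, 4.0]
source_c = [5.0, 6.0, 7.0]

f_stat, df_b, df_w = one_way_anova_f([source_a, source_b, source_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that at least one source differs from the others; the pairwise Tukey intervals then identify which pairs differ, which is how a single outlying group such as the New York healthy controls would be flagged.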

The grouping of the samples based on disease state and not geographic distribution was also evaluated by linear discriminant analysis using the 82 MFs of the LASSO three-way classification model (Fig. 6). For this analysis, healthy controls from Florida, a region with low prevalence for Lyme disease and reported to have STARI cases, were included to evaluate whether samples collected in the southern United States would differ from those collected in New York or Colorado. For STARI, three patient sample groups collected in Missouri, North Carolina, and other states (including Virginia, Georgia, Kentucky, Tennessee, Alabama, Iowa, and Nebraska) were compared. The linear discriminant analysis demonstrated that although slight variation exists between the three healthy control groups (New York, Colorado, and Florida), there is greater variability between all healthy controls and all STARI samples than within healthy controls or STARI samples based on geographic location of collection (Fig. 6).

Fig. 6. Evaluation of intra- and intergroup variability.

Linear discriminant analysis was performed using the 82 MFs picked by LASSO in the three-way classification model to assess the intragroup variability based on the geographical region or laboratory from which healthy control (CO, blue, solid; FL, green, dotted; and NY, red, dashed) and STARI (MO, dark blue, solid; NC, light blue, dotted; and Other, green, dashed) sera were obtained.


The inability to detect B. burgdorferi by PCR or culture, no serological response to B. burgdorferi antigens in STARI patients, and transmission by different tick species are accepted as evidence that the etiologies of STARI and Lyme disease differ (7, 8, 16, 25). Nevertheless, an overlap in clinical symptoms, including the development of an EM-like skin lesion, creates confusion and controversy for the clinical differentiation of STARI and Lyme disease (32). The data reported here demonstrated marked differences between the metabolic profiles of early Lyme disease and STARI patients, and thus provide compelling positive data to support the concept that these two illnesses are distinct entities. Metabolic pathway analyses and the structural identification of several MFs with significant (P < 0.05) abundance differences between early Lyme disease and STARI identified multiple NAEs. These endogenous lipid mediators are derived from phosphatidylcholine and phosphatidylethanolamine via the endocannabinoid system (31). AEA, a widely studied endocannabinoid, is an endogenous agonist of the cannabinoid receptors; however, it is a minor component of animal tissues. In contrast, congeners of AEA, such as the NAEs identified in the early Lyme disease–STARI biosignature, are significant products of animal tissues, including the skin (31, 33). The serum levels of NAEs having long-chain saturated fatty acids were significantly (P < 0.05) increased in the serum of STARI patients. These are produced in response to inflammation and act in an anti-inflammatory manner as agonists of PPAR-α or by enhancing AEA activity (34, 35). The NAEs can be converted to N-acylglycine structures via an alcohol dehydrogenase and further degraded to PFAMs (36). Our data demonstrated a STARI-associated increase in PFAMs and the corresponding NAEs with saturated fatty acids. 
Our studies did not address the underlying mechanism for the increased NAE and PFAM levels in STARI patients; however, a decrease in fatty acid amide hydrolase activity that liberates fatty acids from both NAEs and PFAMs (37) is one potential cause for increase in abundance of these metabolites. The anti-inflammatory activity of the NAEs also raises the possibility that these metabolites are partially responsible for the milder symptoms associated with STARI (9). Because the enzymes involved with the genesis and degradation of NAEs and PFAMs are known (31, 38), studies can be constructed to elucidate the mechanism(s) by which NAEs and PFAMs accumulate in the sera of STARI patients.

We previously demonstrated proof of concept for a discriminating metabolic biosignature of early Lyme disease (29). This current work expands on the utility of this approach by demonstrating the ability to distinguish early Lyme disease from an illness with nearly identical symptoms or what would be considered a Lyme disease–like illness (39). The existing diagnostic algorithm for Lyme disease uses an EIA or immunofluorescence assay as a first-tier test followed by IgM and IgG immunoblotting as a second-tier test (40). For early Lyme disease, the sensitivity of this diagnostic algorithm is 29 to 40% and the specificity is 95 to 100% (41); the algorithm also does not distinguish between active and previous infections, an important limitation. Here, all of the STARI samples were negative by two-tiered testing, and only 2% were positive or equivocal by the first-tier EIA. Of the early Lyme disease samples, 44% were positive by two-tiered testing. In contrast, classification modeling with the 261 MFs of the early Lyme disease–STARI biosignature markedly increased diagnostic accuracy for early Lyme disease (85 to 98% accuracy depending on the model). Classification by RF or LASSO was overall highly accurate for early Lyme disease and STARI, in particular when using the two-way classification models. When healthy controls were introduced and used to develop a three-way classification model, there was a slight increase in the accuracy for STARI and decrease in the accuracy for early Lyme disease, but healthy controls were classified with a 93 to 95% accuracy. This also supports the conclusion that STARI and early Lyme disease are metabolically distinct from healthy controls, but in different ways.

To date, the development of a diagnostic tool for STARI or differentiation of early Lyme disease and STARI has received little attention. As the geographic distribution of Lyme disease continues to expand (25, 26), so will the geographic range where there is overlap of Lyme disease and STARI. Thus, a diagnostic tool that accurately differentiates these two diseases could have a major impact on patient management. Lyme disease is treated with antibiotics, and although there is no defined infectious etiology for STARI, this illness is also commonly treated in a similar manner (7, 20, 42). Establishment of a robust diagnostic tool would not only facilitate antibiotic stewardship but also allow for proper studies to assess the true impact of therapies for STARI. Lyme disease is also a reportable disease, and to maintain accurate disease surveillance in low-incidence areas, it is essential that diseases such as STARI be excluded (32). Additionally, vaccines are currently being developed for Lyme disease (43–45), and as these are tested, it will be important to distinguish STARI patients to properly assess vaccine efficacy.

To apply the discoveries of this work toward the development of an assay that can be used for the clinical differentiation of early Lyme disease and STARI, it must first be determined whether an emphasis should be placed on the diagnosis of Lyme disease or STARI. Because there is no defined etiology of STARI, and Lyme disease is not necessarily self-limiting without antibiotics and can have subsequent complications if untreated, we envision that the final assay would focus on being highly sensitive for early Lyme disease and be primarily applied in regions where Lyme disease and STARI overlap. Although existing laboratory tests for Lyme disease emphasize specificity, this strategy needs to be reconsidered for a differential diagnostic test of STARI and early Lyme disease, because any illness presenting with an EM in a region with a known incidence of Lyme disease would likely be treated with antibiotics (7, 20, 42). As with all diagnostic tests, use of a metabolic biosignature for differentiation of early Lyme disease and STARI would need to be performed in conjunction with clinical evaluation of the patient, and consideration of their medical history and epidemiologic risk for these two diseases.

The approach outlined in this study applies biochemical signatures derived by semiquantitative MS for the classification of patients. A limitation of the current study is that the most accurate quantification of metabolites by MS is achieved by multiple reaction monitoring (MRM) assays (46). MS assays currently used in clinical laboratories for the analyses of small-molecule metabolites typically apply MRM assays. Most of these tests are under Clinical Laboratory Improvement Amendments guidelines, but a U.S. Food and Drug Administration–cleared MS-based test for inborn metabolic errors is in use (47). MRM assays are developed with the knowledge of an MF’s chemical structure. To this end, we have identified the chemical structure of 14 MFs and are continuing to identify those that have the greatest fold change difference and reproducibility. It is noted that the NAEs and PFAMs revealed via our pathway analyses are amenable to MRM assays (48). These metabolites are now being investigated for their ability to accurately classify STARI and early Lyme disease. Any MS-based diagnostic assay for Lyme disease and STARI would likely occur in a specialized clinical diagnostic laboratory. At this point, it is difficult to accurately assess the cost benefits of an MS-based assay; however, it should be noted that the second-tier immunoblot assays for the serological diagnosis of Lyme disease are already performed in specialized laboratories (1, 49, 50). Further, an LC-MS assay for a single sample performed as described in this work takes about 1.5 hours from sample processing to data collection.

The data reported here were generated from the analysis of retrospectively collected serum samples from various repositories that have been archived for different lengths of time. To reduce the impact of the potential variability associated with these samples, we applied stringent criteria to the data analysis. In addition to the requirement of a significant (P < 0.05) fold change, those MFs selected for the final early Lyme disease–STARI biosignature were required to be present in at least 80% of samples within a sample group and maintain the median fold change difference in at least 50% of samples within a group. Whereas the STARI and healthy control sera were collected by multiple laboratories and from multiple geographic locations, the early Lyme disease sera were obtained from a single laboratory. This is a potential limitation of the study. However, linear discriminant analysis was applied to assess the variability within the healthy control and STARI samples collected by different laboratories. This analysis demonstrated little to no variability among the STARI or healthy control samples, indicating that the criteria used for MF selection effectively reduced nonbiological variability. As we have noted, data were collected by nonabsolute semiquantitative MS. Nevertheless, this is a common practice applied in the development of differentiating biosignatures for infectious diseases (27, 51–54), and our workflow ensured that the most robust MFs were selected and used for classification modeling. This included the use of full technical replicates that controlled for potential variability introduced during sample processing and LC-MS analyses. Although the use of replicate samples could have biased the discriminating power of the classification models for the Training-Set data and the number of samples used was relatively small, the use of an independent hold-out Test-Set prevented this from being a major limitation for model testing.

Without knowledge of a known etiologic agent, we recognize that STARI simply encompasses a clinical syndrome. The STARI samples used in this current work included those collected in studies used to define this illness (9), as well as samples collected outside those original studies. The diagnostic accuracy achieved with these samples provides justification for prospective studies to test an expanded number of early Lyme disease and STARI samples and to assess the applicability of the current metabolic biosignature for patients with other non-Lyme EM-like lesions, including tick bite hypersensitivity reactions, certain cutaneous fungal infections, and bacterial cellulitis. Additionally, other factors such as Lyme disease coinfections with other vector-borne pathogens should be addressed. In the southeastern United States, there is evidence for enzootic transmission of B. burgdorferi; however, it is debatable whether Lyme disease occurs in this region (11, 32, 55, 56). The current study was not designed to provide evidence for or against the presence of Lyme disease in the southern United States. Nevertheless, metabolic profiling offers a novel approach that is orthogonal to methods currently used to address this issue.


Study design

STARI is a confounding factor in diagnosing early Lyme disease in areas where both illnesses overlap and contributes to the debate surrounding the presence of Lyme disease in the southern United States. No diagnostic tool exists for STARI or for differentiating early Lyme disease from STARI. On the basis of documented differences between early Lyme disease and STARI (9, 16, 57), we hypothesized that metabolic profiling of serum would permit development of a biochemical biosignature that, when applied, could accurately classify early Lyme disease and STARI patients. For this reason, we designed an unbiased metabolomics study to directly compare the metabolic host responses between these two illnesses and to subsequently evaluate how this metabolic biosignature distinguishes these two illnesses. The use of unbiased metabolomics for biosignature discovery does not lend itself to power calculations to determine sample size. Thus, sample sizes were selected on the basis of our previous studies (29, 51, 52). To obtain a sufficient number of well-characterized STARI sera, retrospectively collected samples from two separate studies were used. Specifically, the first set of STARI serum samples (n = 33) was obtained from the Centers for Disease Control and Prevention (CDC) repository. These samples were collected through a prospective study performed between 2007 and 2009 (58). The states where patients were recruited included the following: North Carolina, 18; Virginia, 4; Tennessee, 3; Kentucky, 2; Georgia, 2; Iowa, 2; Alabama, 1; and Nebraska, 1. All samples were collected before treatment with the exception of one patient who was treated with doxycycline 1 to 2 days before the serum sample was obtained. The second set of STARI samples (n = 22) was obtained from the New York Medical College serum repository (20). These samples were collected between 2001 and 2004 from patients living in Missouri.

Sufficient numbers of well-characterized early Lyme disease serum samples were acquired from New York, an area of high incidence for Lyme disease and low incidence of STARI (9). Specifically, all early Lyme disease samples (n = 70) were culture- and/or PCR-positive for B. burgdorferi and were collected before treatment. To ensure appropriate representation of both nondisseminated and disseminated forms of early EM Lyme disease, samples from patients with a single EM that were skin culture– and/or PCR-positive for B. burgdorferi and blood culture–negative (n = 35) and patients with multiple EMs or a single EM that were blood culture–positive (n = 35) were used. Early Lyme disease samples were collected between 1992 and 2007, and 1 to 33 days after onset of symptoms. To understand the relationship of our findings to a healthy control population, serum samples from healthy donors were also included in the study. These were procured from repositories at New York Medical College, the CDC, and the University of Central Florida (UCF). A detailed description of inclusion and exclusion criteria for each patient and donor population is provided in table S1. All participating institutions obtained institutional review board (IRB) approval for this study. IRB review and approval for this study ensured that the retrospective samples used had been collected under informed consent.

All samples were analyzed in duplicate and were randomized before processing for LC-MS analyses. Healthy control sera were used as quality control samples for each LC-MS experiment. The serum samples and respective LC-MS data files of each patient group and healthy controls were randomly separated into a Discovery-Set/Training-Sets 1 and 2, and Test-Sets 1 and 2. Specifically, 40 of the 70 early Lyme disease and 36 of the 55 STARI samples were randomly selected as the Discovery-Set samples. This sample set was used for MF selection. To train the classification models, two training sets were used. The first, Training-Set 1, was identical to the Discovery-Set, and the second, Training-Set 2, consisted of the same samples as Training-Set 1 with the addition of 38 of the 58 healthy control samples from Colorado and New York. Last, Test-Sets 1 and 2 were created. Test-Set 1 was composed of 30 early Lyme disease and 19 STARI samples that were not included in the Discovery/Training sample sets. Test-Set 2 consisted of the same samples as those used in Test-Set 1 with the addition of 20 healthy control samples that were not included in the Training-Set 2 samples. Test-Sets 1 and 2 were exclusively used for blinded testing of the classification models.
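The allocation of samples described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample identifiers, the seeds, and the helper name `split_group` are hypothetical, and the sketch uses a plain random shuffle, so it does not reproduce the balancing by repository or by dissemination status described in the study design.

```python
import random

def split_group(sample_ids, n_train, seed):
    """Randomly split one group's sample IDs into training and hold-out lists."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical sample IDs; only the group sizes come from the text.
lyme = [f"EL{i:02d}" for i in range(70)]
stari = [f"ST{i:02d}" for i in range(55)]
healthy = [f"HC{i:02d}" for i in range(58)]

lyme_train, lyme_test = split_group(lyme, 40, seed=1)
stari_train, stari_test = split_group(stari, 36, seed=2)
healthy_train, healthy_test = split_group(healthy, 38, seed=3)

# Training-Set 1 is the same sample set as the Discovery-Set.
training_set_1 = lyme_train + stari_train
training_set_2 = training_set_1 + healthy_train
test_set_1 = lyme_test + stari_test
test_set_2 = test_set_1 + healthy_test
```

The resulting hold-out sets contain 30 early Lyme disease, 19 STARI, and 20 healthy control samples, matching the counts given for Test-Sets 1 and 2.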

Randomization into Discovery/Training-Sets or Test-Sets was done in a manner that ensured that bias was not introduced based on the repository from which STARI samples were obtained or on whether the early Lyme disease samples were from a nondisseminated or disseminated case. Biosignature development was performed by screening MFs based on stringent criteria outlined in Fig. 1A and detailed in the “Biosignature development” section.

Lyme disease serologic testing of all serum samples

Standard two-tiered testing was performed on all samples (40). The C6 B. burgdorferi (Lyme) ELISA (Immunetics) was used as a first-tier test, and any positive or equivocal samples were subjected to Marblot IgM and IgG immunoblots (MarDx Diagnostics) as the second-tier test. Serologic assays were performed according to the manufacturer’s instructions, and the data were interpreted according to established CDC guidelines (40). Duration of illness was not considered for test interpretation.

Liquid chromatography–mass spectrometry

Serum samples were randomized before extraction of small-molecule metabolites and LC-MS analyses. Small-molecule metabolites were extracted from sera as previously reported (29). An aliquot (10 μl) of the serum metabolite extract was applied to a Poroshell 120, EC-C8, 2.1 × 100 mm, 2.7 μm LC column (Agilent Technologies). The metabolites were eluted with a 2 to 98% nonlinear gradient of acetonitrile in 0.1% formic acid at a flow rate of 250 μl/min with an Agilent 1200 Series LC system. The eluent was introduced directly into an Agilent 6520 quadrupole–time-of-flight (Q-TOF) mass spectrometer, and MS was performed as previously described (29, 51). LC-MS and LC-MS/MS data were collected under the following parameters: gas temperature, 310°C; drying gas at 10 liters/min; nebulizer at 45 lb/in²; capillary voltage, 4000 V; fragmentation energy, 120 V; skimmer, 65 V; and octopole RF setting, 750 V. The positive-ion MS data for a mass range of 75 to 1700 Da were acquired at a rate of 2 scans/s. Data were collected in both centroid and profile modes in 4-GHz high-resolution mode. Positive-ion reference masses of 121.050873 and 922.009798 m/z were introduced to ensure mass accuracy. To monitor instrument performance, quality control samples consisting of a metabolite extract of healthy control serum (BioreclamationIVT) were analyzed in duplicate at the beginning of each analysis day and every 20 samples during the analysis day.

Biosignature development

LC-MS data from an initial Discovery-Set of samples, composed of randomly selected early Lyme disease (n = 40) and STARI (n = 36) patients and used exclusively for MF selection and classification model training, were processed with the Molecular Feature Extractor algorithm tool of the Agilent MassHunter Qualitative Analysis software version B.05.00 (Agilent Technologies). The MFs were aligned between data files with a 0.25-min RT window and 15-ppm mass tolerance. Comparative analyses of differentiating MFs between patient groups were performed using the workflow presented in Fig. 1A. Specifically, the Discovery-Set data were analyzed using MPP software version B.12.05 (Agilent Technologies). Using MPP, a univariate, unpaired t test was performed on each metabolite to test for a difference in mean (standardized) abundance between early Lyme disease and STARI groups. Multiple testing was accounted for by computing false discovery rate–adjusted P values (59). To prevent selection of MFs biased by uncontrolled variables (diet, other undisclosed illnesses, etc.), only MFs present in 50% or more of samples in at least one group and that differed between the groups with a significance of adjusted P < 0.05 were selected. Quantitative Analysis software version B.05.01 (Agilent Technologies) was used to extract area abundance values for all differentially selected MFs from the MS data files. Duplicate MFs were removed by assessing adduct ions, as well as mass, RT, and abundance similarities; this resulted in the Discovery MF List. A duplicate LC-MS analysis of the Discovery-Set samples was performed, and the area abundance for MFs of the Discovery MF List was extracted using the Quantitative Analysis software. These data with those from the first LC-MS analysis formed the Targeted-Discovery-Set.
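The selection rule in this paragraph (adjusted P < 0.05 and presence in at least 50% of samples in at least one group) can be illustrated as follows. This sketch assumes the Benjamini-Hochberg procedure for the false discovery rate adjustment, which the text does not name; it is not the MPP implementation, and the function names and inputs are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_features(p_values, presence_a, presence_b, alpha=0.05, min_presence=0.5):
    """Indices of MFs with adjusted P < alpha that are present in at least
    min_presence of samples in at least one of the two groups."""
    adjusted = benjamini_hochberg(p_values)
    return [
        i for i in range(len(p_values))
        if adjusted[i] < alpha and max(presence_a[i], presence_b[i]) >= min_presence
    ]
```

Here `presence_a` and `presence_b` are the fractions of samples in each group in which an MF was detected, so an MF detected in 90% of one group and 10% of the other still passes the presence filter.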

Abundance data from the Targeted-Discovery-Set data files were normalized using a two-step method. First, abundances (area under the peak for the monoisotopic mass) of each Discovery MF were normalized by the median intensity of the stable MFs detected in each individual sample (60). Stable MFs were those identified in the original extraction of LC-MS data files with the Agilent MassHunter Qualitative Analysis software and present in at least 50% of all sample data files. Second, median fold changes of stable MFs between the initial quality control sample (applied at the beginning of the LC-MS analysis) and each of the subsequent quality control samples (applied every 20 clinical samples throughout the LC-MS analysis) were calculated. The median fold change calculated for the quality control sample that directly followed each series of 20 clinical samples was multiplied against the normalized Discovery MF abundances in the clinical samples of that series. This second normalization step was performed to correct for instrument variability. To apply stringency to the development of a final early Lyme disease–STARI biosignature, MFs were filtered on the basis of consistency in the duplicate LC-MS data sets by requiring the same directional abundance change between the patient groups. Specifically, MFs with at least a 2-fold abundance difference and a 1.5-fold abundance difference between the medians of the two groups (early Lyme disease and STARI) for LC-MS analysis-1 and LC-MS analysis-2, respectively, were selected. Further criteria applied to ensure that the most robust MFs were being selected included removing MFs with >20% missing values in both groups and selecting only MFs where at least 50% of the samples within a patient group produced a fold change of ≥2 in comparison to the mean of the other patient group. This selection process resulted in the MFs included in the early Lyme disease–STARI biosignature.
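A minimal sketch of the two-step normalization is given below. Several details are assumptions rather than statements from the text: the fold change is taken as initial quality control divided by later quality control (so that multiplying by it offsets signal drift), stable MFs are assumed to be listed in the same order in both quality control runs, and all function names are hypothetical.

```python
from statistics import median

def median_normalize(discovery_abundances, stable_abundances):
    """Step 1: divide each Discovery MF abundance by the median intensity
    of the stable MFs detected in the same sample."""
    scale = median(stable_abundances)
    return [a / scale for a in discovery_abundances]

def qc_fold_change(initial_qc_stable, later_qc_stable):
    """Median fold change of stable MFs between the initial QC run and a
    later QC run. Assumed direction: initial / later."""
    return median(i / l for i, l in zip(initial_qc_stable, later_qc_stable))

def normalize_series(series_samples, initial_qc_stable, following_qc_stable):
    """Step 2: multiply the step-1 abundances of every clinical sample in a
    series by the fold change of the QC run that directly followed it.
    Each sample is a (discovery_abundances, stable_abundances) pair."""
    factor = qc_fold_change(initial_qc_stable, following_qc_stable)
    return [
        [factor * v for v in median_normalize(discovery, stable)]
        for discovery, stable in series_samples
    ]
```

For example, if the stable MFs in the quality control run following a series read at half their initial intensity, every normalized abundance in that series is doubled.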

Prediction and verification of MF chemical structure

Confirmation of the chemical structures of selected MFs was performed by LC-MS/MS to provide level 1 or level 2 identifications (61). Commercial standards palmitoyl ethanolamide, stearoyl ethanolamide, eicosanoyl ethanolamide, glycerophospho-N-palmitoyl ethanolamine, pentadecanoyl ethanolamide, and erucamide were obtained from Cayman Chemical. Commercial standards piperine and nonanedioic acid were obtained from Sigma-Aldrich. Commercial standards methyl oleate, stearamide, palmitamide, CMPF, and glycocholic acid were obtained from Santa Cruz Biotechnology Inc. The LC conditions used were the same as those used for the LC-MS analyses of serum metabolites. MS/MS spectra of the targeted MFs and commercial standards were obtained with an Agilent 6520 Q-TOF mass spectrometer. Electrospray ionization was performed in the positive-ion mode as described for MS analyses, except that the mass spectrometer was operated in the 2-GHz extended dynamic range mode. The positive ion MS/MS data (50 to 1700 Da) were acquired at a rate of 1 scan/s. Precursor ions were selected by the quadrupole and fragmented via collision-induced dissociation with nitrogen at a collision energy of 10, 20, or 40 eV. To provide a level 1 identification, the MS/MS spectra of the targeted metabolites were compared to spectra of commercial standards. Additionally, LC RT comparisons between the targeted MF and the respective standard were made. An RT window of ±5 s was applied as a cutoff for identification. The MS/MS spectra of selected serum metabolites were compared to spectra in the Metlin database for a level 2 identification.

Metabolic pathway analysis by MetaboAnalyst

The experimentally obtained monoisotopic masses corresponding to the 261 MFs of the biosignature list were searched against HMDB using a 15-ppm window. The resulting list of potential metabolite structures was applied to the MetaboAnalyst pathway analysis tool (30). Settings for pathway analysis consisted of applying Homo sapiens pathway library, the hypergeometric test for the overrepresentation analysis, and relative betweenness centrality to estimate node importance in the pathway topology.
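The 15-ppm mass window used for the database search can be expressed as a simple relative-error check. The sketch below is illustrative only: the two-entry database is a hypothetical stand-in for HMDB, its masses are computed here from elemental composition rather than taken from the study, and the function names are not part of any HMDB or MetaboAnalyst interface.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6

def match_candidates(observed_mass, database, tol_ppm=15.0):
    """Names of database entries whose mass lies within tol_ppm of the
    observed monoisotopic mass."""
    return [
        name for name, mass in database
        if abs(ppm_error(observed_mass, mass)) <= tol_ppm
    ]

# Hypothetical mini-database of (name, neutral monoisotopic mass in Da).
toy_database = [
    ("palmitoyl ethanolamide", 299.2824),   # C18H37NO2
    ("stearamide", 283.2875),               # C18H37NO
]

hits = match_candidates(299.2850, toy_database)
```

An observed mass of 299.2850 Da is about 8.7 ppm from the first entry and therefore returns it as a candidate, whereas 299.2900 Da (about 25 ppm away) would return no candidates.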

Statistical analyses

Methods to filter the list of MFs and to normalize abundances are described in the “Biosignature development” section. Before analysis, the normalized abundances were log2-transformed, and each MF was scaled to have a mean of 0 and an SD of 1. Statistical analyses were performed using R software (62).
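The transformation described above (log2, then scaling each MF to a mean of 0 and an SD of 1) can be sketched as follows. The sketch assumes the sample standard deviation (n − 1 denominator) and strictly positive abundances with no missing values; the function name is hypothetical.

```python
import math
from statistics import mean, stdev

def log2_standardize(abundances):
    """Log2-transform one MF's normalized abundances across samples, then
    center to mean 0 and scale to SD 1 (sample SD, n - 1 denominator)."""
    logged = [math.log2(a) for a in abundances]
    mu = mean(logged)
    sd = stdev(logged)
    return [(x - mu) / sd for x in logged]

scaled = log2_standardize([1.0, 2.0, 4.0])
```

With abundances of 1, 2, and 4, the log2 values are 0, 1, and 2, so the scaled values are −1, 0, and 1.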

Classification modeling. For classification modeling, Training-Set and Test-Set samples were used as previously described (29, 51) and as shown in Fig. 1B. Separate classification analyses were performed for comparison of two groups (early Lyme disease and STARI) and three groups (early Lyme disease, STARI, and healthy controls). For each scenario, two classification approaches were applied: RF using the randomForest package (63), with 16 features randomly selected at each split and a total of 500 trees, and LASSO logistic (two-way) and multinomial (three-way) regression analysis using the glmnet package (64), with the tuning parameter chosen for minimum misclassification error over a 10-fold cross-validation. The ROC curve and AUC were generated for predicted responses on the Test-Set samples only using the pROC package (65). For the purpose of visualization, LASSO scores for individual patient samples were calculated by multiplying the respective regression coefficients (tables S4 and S6) resulting from LASSO analysis by the transformed abundance of each MF in the biosignature (38 MFs in the case of two-way classification and 82 MFs in the case of three-way classification) and summing for each sample. The rgl package was used to generate the three-dimensional scatterplot of LASSO scores (66).
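The LASSO score used for visualization is a weighted sum of transformed abundances. A minimal sketch follows; the coefficients and abundances are hypothetical placeholders (the fitted values are in tables S4 and S6), and no intercept is added because the text describes only the coefficient-times-abundance products.

```python
def lasso_score(coefficients, abundances):
    """Sum of (regression coefficient x transformed abundance) over the MFs
    retained by the LASSO model. MFs without a coefficient are ignored."""
    return sum(beta * abundances[mf] for mf, beta in coefficients.items())

# Hypothetical coefficients for two retained MFs and one sample's
# log2-transformed, standardized abundances.
example_coefficients = {"MF_001": 0.5, "MF_002": -1.0}
example_sample = {"MF_001": 2.0, "MF_002": 1.5, "MF_003": 9.0}

score = lasso_score(example_coefficients, example_sample)
```

For the three-way multinomial model, the same function would be applied once per class with that class's coefficient set, yielding three scores per sample, consistent with the three-dimensional scatterplot described above.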

Linear discriminant analysis. A linear discriminant analysis was performed with the 82 MFs selected by the three-way LASSO model using the linear discriminant analysis function in R. MF abundance data included in the linear discriminant analysis were from healthy controls from Colorado, Florida, and New York and from STARI patients from North Carolina, Missouri, and other states. Before linear discriminant analysis, data were transformed by taking the log2 value and standardizing to mean 0 and variance 1 within each MF. Samples were differentiated by healthy controls and STARI.


Fig. S1. Level 1 identification of stearoyl ethanolamide.

Fig. S2. Level 1 identification of pentadecanoyl ethanolamide.

Fig. S3. Level 1 identification of eicosanoyl ethanolamide.

Fig. S4. Level 1 identification of glycerophospho-N-palmitoyl ethanolamine.

Fig. S5. Level 1 identification of stearamide.

Fig. S6. Level 1 identification of erucamide.

Fig. S7. Level 1 identification of l-phenylalanine.

Fig. S8. Level 1 identification of nonanedioic acid.

Fig. S9. Level 1 identification of glycocholic acid.

Fig. S10. Level 1 identification of CMPF.

Fig. S11. Level 2 identification of Lyso PA (20:4) by MS/MS spectral matching.

Fig. S12. Level 2 identification of 3-ketosphingosine by MS/MS.

Table S1. Serum samples used in this study.

Table S2. 261-MF biosignature list.

Table S3. MetaboAnalyst results.

Table S4. Regression coefficients (β) of the LASSO two-way statistical model.

Table S5. LASSO and RF two-way model classification probability scores.

Table S6. Regression coefficients (β) of the LASSO three-way statistical model.

Table S7. LASSO and RF three-way model classification probability scores.

Table S8. ANOVA results on LASSO and RF scores with sample source as the explanatory variable.

Table S9. Grouping indicated by the ANOVA results.


Acknowledgments: We thank T. Jewett and M. Halpern for assistance in the collection of healthy control sera at UCF, and D. Garcia at the Colorado State University (CSU) for graphics. Control sera from Florida were made available for the studies performed under a material transfer agreement between the UCF and CDC. Funding: This work was supported by the National Institute of Allergy and Infectious Diseases, NIH grants R21/R33 AI100228 (to J.T.B. and G.P.W.) and R01 AI099094 (to M.W.J.), the UCF College of Medicine (to M.W.J.), and the CDC. Author contributions: C.R.M., J.T.B., G.P.W., B.J.J., L.V.A., and M.N.I. designed the experiments. A.M.H., M.J.D., and B.G.A. performed statistical analyses that included linear discriminant analyses, ROC generation, and statistical modeling. L.V.A., K.W., and A.P.-J. performed the metabolomics analyses and data processing. M.A.P., B.J.J., G.P.W., I.M., and M.W.J. helped with serum sample collections, clinical data extraction, and compilation of patient data. Competing interests: G.P.W. reports receiving research grants from Immunetics Inc., Institute for Systems Biology, RareCyte Inc., and Quidel Corporation. He owns equity in Abbott, has been an expert witness in malpractice cases involving Lyme disease, and is an unpaid board member of the American Lyme Disease Foundation. J.T.B., C.R.M., and G.P.W. are inventors on a provisional patent application #US 62/516,824 submitted by CSU that covers the use of a metabolic profiling and specific metabolites as a diagnostic method to objectively differentiate early Lyme disease from STARI and healthy controls. All other authors declare that they have no competing interests.
