Report: CORONAVIRUS

Using influenza surveillance networks to estimate state-specific prevalence of SARS-CoV-2 in the United States

Science Translational Medicine  29 Jul 2020:
Vol. 12, Issue 554, eabc1126
DOI: 10.1126/scitranslmed.abc1126

Inferring infections

The prevalence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in many countries is likely underestimated because of limited or inaccurate testing and undetected asymptomatic cases. Silverman et al. used data collected through an existing infrastructure for reporting influenza-like illness to estimate the actual prevalence of SARS-CoV-2 infections in US states. They used a statistical model to estimate the proportion of observed influenza-like illness during the early pandemic that was in excess of the seasonal variation seen in prior years, then adjusted this estimate to take into account subclinical infections. Their model estimated that more than 80% of individuals with SARS-CoV-2 infections in the US went undetected in March 2020.

Abstract

Detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections to date has relied heavily on reverse transcription polymerase chain reaction testing. However, limited test availability, high false-negative rates, and the existence of asymptomatic or subclinical infections have resulted in an undercounting of the true prevalence of SARS-CoV-2. Here, we show how influenza-like illness (ILI) outpatient surveillance data can be used to estimate the prevalence of SARS-CoV-2. We found a surge of non-influenza ILI above the seasonal average in March 2020 and showed that this surge correlated with coronavirus disease 2019 (COVID-19) case counts across states. If one-third of patients infected with SARS-CoV-2 in the United States sought care, this ILI surge would have corresponded to more than 8.7 million new SARS-CoV-2 infections across the United States during the 3-week period from 8 to 28 March 2020. Combining excess ILI counts with the date of onset of community transmission in the United States, we also show that the early epidemic in the United States was unlikely to have been doubling more slowly than every 4 days. Together, these results suggest a conceptual model for the COVID-19 epidemic in the United States characterized by rapid spread across the United States with more than 80% of infected individuals remaining undetected. We emphasize the importance of testing these findings with seroprevalence data and discuss the broader potential to use syndromic surveillance for early detection and understanding of emerging infectious diseases.

INTRODUCTION

The ongoing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic continues to cause substantial morbidity and mortality around the world (1, 2). Regional preparation for the pandemic requires estimating the growth rate of the epidemic, the timing of the epidemic peak, the demand for hospital resources, and the degree to which current policies may curtail the epidemic, all of which benefit from accurate estimates of the true prevalence of the virus within a population (3). Confirmed cases are thought to be underestimates of true prevalence due to some unknown combination of patients not reporting for testing, testing not being conducted, and false-negative test results. Estimating the true prevalence of SARS-CoV-2 would inform the scale of upcoming surges in hospital demand, the proportion of individuals who remain susceptible to contracting the disease, and estimates of key epidemiological parameters such as the epidemic growth rate and the fraction of infections that are subclinical.

The current literature suggests that the predominant symptoms associated with coronavirus disease 2019 (COVID-19) are fever, cough, and sore throat; that is, patients often present with an influenza-like illness (ILI) yet test negative for influenza (4, 5). As COVID-19 often presents with similar symptoms to influenza, existing surveillance networks in place for tracking influenza could be used to help track COVID-19. Outpatient ILI surveillance has proven to be a useful tool for assessing the impact of influenza (6, 7). When combined with the number of providers and patients in a given region, ILI surveillance allows estimation of influenza prevalence and severity (8–14). Studies of outpatient ILI have repeatedly demonstrated that confirmed influenza case rates underestimate disease burden, likely due to preferential testing of more severe cases (8, 9, 13, 14). Together, these features suggest that ILI surveillance could provide a crucial tool for estimating COVID-19 prevalence within the United States.

Here, we quantified the baseline prevalence of non-influenza ILI in the United States over the past 10 years and identified a recent surge of non-influenza ILI starting the first week of March 2020. This surge of excess ILI correlated with known patterns of SARS-CoV-2 spread across states within the United States yet was orders of magnitude larger than the number of confirmed COVID-19 cases reported by the end of March.

RESULTS

ILI surge

We identified excess ILI cases by first subtracting cases due to influenza and then subtracting the seasonal signal of non-influenza ILI (Fig. 1). Our approach identified known outbreaks of respiratory disease, including the recent outbreak of respiratory syncytial virus that occurred in Washington state in December 2019 (15). Starting in March 2020, many states, including Washington, New York, Oregon, Pennsylvania, Maryland, Colorado, New Jersey, and Louisiana, showed a surge in number of non-influenza ILI cases in excess of seasonal norms. For example, in the fourth week of March 2020, New York State saw approximately two times higher non-influenza ILI than it had ever seen since the inception of the ILINet surveillance system within the United States. We found that 10.2% of all outpatient visits in New York State during this time were for ILI that could not be explained by either influenza or the normal seasonal variation of respiratory pathogens (8.0 to 11.2% credible set). As the seasonal surge of endemic non-influenza respiratory pathogens declined toward the later weeks in March, this excess ILI correlated more strongly with state-level patterns of newly confirmed COVID-19 cases, suggesting that this surge is a reflection of ILI due to SARS-CoV-2 (Pearson ρ > 0.35 and P < 0.05 for the last 3 weeks; fig. S2). The U.S.-wide ILI surge appeared to peak during the week starting on March 15 and subsequently decreased in numerous states the following week; notable exceptions are New York and New Jersey, two of the states that were the hardest hit by the epidemic, which had not started a decline by the week ending March 28.

Fig. 1 An early surge of ILI visits across the United States.

The proportion of patients presenting with ILI that could not be explained by influenza or typical seasonal variation (that is, excess ILI) is shown for four states (blue line and ribbons represent the posterior median as well as 95 and 50% credible sets; results from all analyzed states are shown in fig. S1). ILI that could not be attributed to influenza was calculated on the basis of influenza laboratory surveillance data (2019–2020 flu season is shown in red, and prior seasons are shown in black). A time-series model was used to infer seasonal variation of non-influenza ILI. Excess ILI was then calculated as the difference between non-influenza ILI from 2019 to 2020 and the seasonal baseline of non-influenza ILI. Excess ILI after 7 March is highlighted in darker blue as these data correlated strongly with observed COVID-19 case counts (fig. S2).

Changes in care-seeking behavior

We estimated the ILI surge in each U.S. state as an increase in the proportion of outpatients with ILI in that state compared to all outpatient visits in that state. Consequently, changes in the care-seeking behavior of individuals with ILI or with non-ILI conditions during this time period could each affect our estimates of disease prevalence during the ILI surge. If patients with mild ILI were more likely to seek medical care during the month of March 2020 than in prior years, then our estimates of COVID-19 prevalence based on the ILI surge would be falsely elevated. In addition, if non-ILI patients were less likely to seek medical care during the month of March compared to prior years, then this too could falsely elevate our estimates of COVID-19 prevalence based on the ILI surge. Although ILINet does not provide information to ascertain care-seeking behavior, we were able to obtain syndromic surveillance data from New York City’s emergency departments, which provided up-to-date information on care-seeking behavior for both ILI and non-ILI conditions (16, 17).

If the ILI surge reflected higher rates of detection of typically mild ILI, then we would expect emergency department ILI rates to increase and the proportion of those ILI cases admitted to the hospital to decrease. However, although the daily number of ILI visits to emergency departments across New York City increased in March 2020, the proportion of those patients who went on to be admitted also increased by as much as threefold compared to the baseline rate before March (fig. S3A). This observation suggests that patients with mild ILI presented less often to hospital emergency departments. Such a decrease in care-seeking behavior for mild ILI, if similar across the United States, could deflate the estimated size of the ILI surge in the later weeks of March by a factor of approximately 3.

If non-ILI patients were less likely to seek medical care, then we would expect that the number of patients complaining of other symptoms not typically associated with COVID-19 (for example, vomiting) would also decrease compared to prior years. In the month of March, the daily number of patients presenting with vomiting decreased by as much as a factor of 3 compared to the baseline rate in prior years (fig. S3B). Assuming that all non-ILI conditions were similarly decreased during March, this would suggest that our estimates of the ILI surge could be inflated by as much as a factor of 3. This assumption is conservative as it assumes that even individuals with severe conditions (such as severe trauma) would avoid seeking health care in response to COVID-19 at the same rate as those with more mild conditions such as vomiting. However, the potential threefold decreased care-seeking behavior for non-ILI conditions cancels out the potential threefold decreased care-seeking behavior of mild ILI, suggesting that our estimates of prevalence based on the ILI surge may be insensitive to recent changes in care-seeking behavior (fig. S3C). Overall, these estimates suggest a conceptual model in which health care utilization for both mild ILI and non-ILI conditions declined at similar rates as COVID-19 increased in the United States.
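The cancellation argument above can be checked with simple arithmetic. The sketch below uses hypothetical weekly visit counts (they are not taken from ILINet or the New York City data) and assumes that ILI and non-ILI visits each fall by the same factor of 3.

```python
def ili_proportion(ili_visits, non_ili_visits):
    """Share of all outpatient visits that are ILI."""
    return ili_visits / (ili_visits + non_ili_visits)


# Hypothetical weekly visit counts, for illustration only.
ili, non_ili = 300.0, 2700.0
baseline = ili_proportion(ili, non_ili)

# Only non-ILI visits fall threefold: the ILI proportion is inflated.
non_ili_drop_only = ili_proportion(ili, non_ili / 3)

# Only mild-ILI visits fall threefold: the ILI proportion is deflated.
ili_drop_only = ili_proportion(ili / 3, non_ili)

# Both fall threefold: the ILI proportion returns to its baseline value.
both_drop = ili_proportion(ili / 3, non_ili / 3)

print(baseline, non_ili_drop_only, ili_drop_only, both_drop)
```

Because the surge is measured as a proportion of visits, equal proportional declines in the numerator and in the rest of the denominator leave it unchanged, which is the sense in which the two effects cancel.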

COVID-19 prevalence in the United States

To estimate the proportion and magnitude of the March 2020 U.S. ILI surge attributable to SARS-CoV-2 infections, we made the following three assumptions: (i) that the patient population reported by sentinel providers is representative of their state each week, (ii) that changes in care-seeking behavior of ILI patients are occurring at a similar rate to those of non-ILI patients, and (iii) that the total number of patients in the United States who require medical care over the course of a year has not substantially changed since 2018. Our first assumption is common and underlies prior studies that have used ILI to estimate influenza prevalence (8, 14). Our second assumption is supported by our New York City analysis, which suggests that both mild ILI and non-ILI conditions have seen similar changes in health care–seeking behavior. Our third assumption is based on the observation that the increasing need for health care between 8 and 28 March 2020 due to COVID-19 is likely small compared to the approximately 1 billion outpatient encounters that occur annually (18, 19). These assumptions, together with surveys describing the average number of patients seen by providers (19), the number of providers in each state (20), and the total number of outpatient visits per year (18, 21), allowed us to estimate that if outpatient clinics remained open during the COVID-19 epidemic, we would expect that there would have been approximately 2.8 million patient encounters with ILI due to COVID-19 between 8 and 28 March 2020 (95% credible set, 2.6 million to 3.0 million).

Not all patients infected with SARS-CoV-2 will present to a health care provider with ILI. Although we cannot directly measure the rate of such subclinical cases, a number of prior studies on asymptomatic rates of COVID-19 and the care-seeking behavior of ILI patients in the United States suggest a lower bound on the subclinical rate of patients with ILI. A recent study of passengers on the Diamond Princess cruise ship accounted for a right censoring of patients sampled and estimated that 18% of patients infected with SARS-CoV-2 are asymptomatic for the course of their infection (95% credible set, 16 to 20%). This estimate likely represents an underestimate, given that the majority of passengers were more than 60 years old, a demographic thought to have a lower asymptomatic rate than younger individuals (22). Beyond asymptomatic individuals, a large study of adult health care–seeking behavior in the United States found that, of a random sample of more than 17,000 individuals with ILI, 40% of those went on to seek health care (23). Together, these additional contributions from subclinical cases correspond to a mean clinical rate of 32% (the overall rate at which SARS-CoV-2 cases seek medical care) and a lower bound of 8.7 million SARS-CoV-2 infections between 8 and 28 March (95% credible set, 8.0 million to 9.4 million). Prevalence estimates for each state within this time period are shown in fig. S4.
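The adjustment described above can be followed as point-estimate arithmetic. The sketch below uses only the central values quoted in the text. It lands close to, but not exactly on, the reported 32% and 8.7 million, presumably because the published figures come from the full posterior rather than from point estimates (that explanation is an assumption).

```python
asymptomatic_rate = 0.18   # Diamond Princess estimate cited in the text
care_seeking_rate = 0.40   # share of symptomatic ILI patients who seek care
ili_encounters = 2.8e6     # estimated COVID-19 ILI encounters, 8 to 28 March

# Share of all infections that are symptomatic AND present to a provider.
clinical_rate = (1 - asymptomatic_rate) * care_seeking_rate

# Each clinical encounter stands in for 1 / clinical_rate infections.
infections = ili_encounters / clinical_rate

print(f"clinical rate: {clinical_rate:.3f}")
print(f"implied infections: {infections / 1e6:.1f} million")
```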

Syndromic case detection rates

We define the syndromic case detection rate as the number of confirmed COVID-19 cases in a week divided by the size of the ILI surge that week. The syndromic case detection rate varied by state and over time (fig. S5). Our estimated syndromic case detection rates increased over the month of March; this was expected, given the increase in testing capacity across the United States since the 28 February detection of community transmission in Washington State. For the week ending 14 March, COVID-19 cases in the states with the highest estimated syndromic case detection rate (Washington, Nevada, and Michigan) only captured approximately 1% of ILI surges in those states. In the last week of the month ending 28 March, the syndromic case detection rate across the United States increased to 12.5% (95% credible interval, 9.5 to 18.3%).

Epidemic growth rates, clinical rates, and infection fatality rates implied by the ILI surge

The true prevalence of SARS-CoV-2 is unknown at the time of this writing. However, if we assume that the excess non-influenza ILI is almost entirely due to SARS-CoV-2, an assumption that becomes more valid as SARS-CoV-2 becomes more prevalent, we can use the excess non-influenza ILI to define lower bounds on the exponential growth rate of the U.S. SARS-CoV-2 epidemic. By estimating the number of patients visiting clinics for COVID-19 in the United States in March, we can also identify the mutual dependence of exponential growth rates, the rate of subclinical infections, and the time between the onset of infectiousness and a patient reporting as ILI (Fig. 2). Using stochastic susceptible, exposed, infectious, and recovered (SEIR) simulations of U.S. COVID-19 epidemics with a 15 January start date (24), we find that an initial epidemic doubling time longer than 4 days is unlikely to explain the ILI surge. Doubling times longer than 4 days fail to produce enough infected individuals to match the observed excess ILI. Doubling time faster than 4 days can explain the observed excess ILI with a clinical rate that depends on the growth rate. Here, we define the clinical rate as the proportion of infected individuals who present to a health care provider.
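To make the link between doubling time and epidemic size concrete, the following is a minimal deterministic SEIR sketch. It is not the authors' stochastic model: the latent period, infectious period, single-seed start, and population size are illustrative assumptions, and the output is not expected to reproduce the thresholds reported here. It only shows how a shorter doubling time yields far more infections by a fixed date.

```python
import math


def beta_for_doubling_time(doubling_days, latent_days, infectious_days):
    """Transmission rate that gives the requested early doubling time.

    Uses the early-growth relation (r + sigma) * (r + gamma) = beta * sigma
    for a standard SEIR model with a fully susceptible population.
    """
    r = math.log(2) / doubling_days
    sigma = 1.0 / latent_days
    gamma = 1.0 / infectious_days
    return (r + sigma) * (r + gamma) / sigma


def simulate_seir(population, doubling_days, days,
                  latent_days=3.0, infectious_days=5.0, dt=0.01):
    """Integrate a deterministic SEIR model with forward Euler steps.

    Returns cumulative infections at the end of each day, starting from a
    single infectious individual on day 0. The latent and infectious
    periods are illustrative assumptions, not values from the paper.
    """
    beta = beta_for_doubling_time(doubling_days, latent_days, infectious_days)
    sigma = 1.0 / latent_days
    gamma = 1.0 / infectious_days
    s, e, i = population - 1.0, 0.0, 1.0
    cumulative = [population - s]
    steps_per_day = round(1.0 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
        cumulative.append(population - s)
    return cumulative


# 15 January to 8 March 2020 is 53 days; the population size is approximate.
US_POPULATION = 330e6
slow = simulate_seir(US_POPULATION, doubling_days=4.0, days=53)
fast = simulate_seir(US_POPULATION, doubling_days=3.0, days=53)
print(f"4-day doubling: {slow[-1]:,.0f} cumulative infections by day 53")
print(f"3-day doubling: {fast[-1]:,.0f} cumulative infections by day 53")
```

Multiplying a trajectory like these by a clinical rate gives the number of infections expected to present as ILI, which is the quantity compared against the observed surge.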

Fig. 2 The ILI surge imposes a dependence between growth rate and clinical rate in epidemiological models.

(A and B) SARS-CoV-2 prevalence estimates based on the ILI surge are consistent with an epidemiological model parameterized based on a 15 January epidemic start date and a doubling time equal to that observed for new deaths within the United States (A) or Italy (B). Epidemiological models were either stochastic (simulated via tau leaping) or deterministic (solved by numerical integration). In addition to our raw estimates of the ILI surge size (unadjusted), we provide adjusted prevalence estimates accounting for subclinical cases by assuming an 18% asymptomatic rate and a 40% rate of health care seeking of symptomatic ILI patients (adjusted). Epidemic trajectories were simulated using an SEIR model (black lines). The increasing gap between ILI prevalence estimates and SEIR trajectories (orange) suggests the presence of additional factors such as social distancing, changes in care-seeking behavior, or heterogeneity in susceptibility or transmission. (C) More generally, the size of the clinical population estimated from ILI data imposes a dependence between epidemic doubling time, the clinical rate, and the lag between onset of infectiousness and ILI reporting. Combinations of these three variables that are consistent (black) or inconsistent (gray) are shown, as well as a smoothed estimate of clinical rate as a function of doubling time.

In keeping with our sub–4-day doubling times, we found that, across the entire United States, new deaths due to COVID-19 doubled every 3.01 days over the month of March (±0.001, P value of test that doubling rate is less than 4 days approximately 0). If there was only a 1-day lag from onset of infectiousness to presentation with ILI and the entirety of the first week of the U.S. ILI surge is composed of patients with COVID-19, then an epidemic starting 15 January and growing at the rate of deaths in the United States would imply a 12% clinical rate (Fig. 2A). A 4-day lag between the onset of infectiousness and presentation with ILI yields a clinical rate of 25% among the 87% of simulations that could account for the ILI surge. The 25% overall clinical rate estimated from a 15 January start date and the doubling time of U.S. COVID-19 deaths is in close agreement with the 32% clinical rate we estimated independently based on an 18% asymptomatic rate and 40% symptomatic clinical rate. Although our epidemic model suggests that the first week of the ILI surge is consistent with the U.S. epidemic start date and growth rate, the ILI surge across the United States peaked the week ending 21 March, much earlier than our epidemic models, suggesting that the epidemic in the United States differed from the SEIR model through some combination of factors. Such factors could include successful interventions, even faster decreases in care seeking than observed in New York, heterogeneity in susceptibility (25), or an early epidemic doubling faster than every 3 days.
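The text does not spell out how the doubling time of deaths was estimated. One common approach, assumed here, is a least-squares fit of log counts against time. The sketch below applies it to a synthetic series constructed to double every 3 days; the data are not real death counts.

```python
import math


def doubling_time(days, counts):
    """Doubling time from a least-squares fit of log(counts) against days."""
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    growth_rate = sxy / sxx          # exponential growth rate per day
    return math.log(2) / growth_rate


# Synthetic daily counts that double every 3 days (illustration only).
days = list(range(15))
deaths = [10 * 2 ** (d / 3) for d in days]
print(f"estimated doubling time: {doubling_time(days, deaths):.2f} days")
```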

Faster growth rates require lower clinical rates to explain the ILI surge. Epidemic curves growing at the rate of deaths in Italy, doubling every 2.65 days, could better match the curvature of the ILI surge by peaking around mid to late March but would imply a clinical rate of 4.7% in the second week of March with a 4-day lag between onset and being recorded as ILI (Fig. 2, B and C). If the entirety of the ILI surge was attributable to COVID-19, the slowest possible doubling time for the U.S. epidemic that can explain the ILI surge would be a doubling time of 4 days. Any evidence of significant secondary introductions, super-spreading, or rapid transmission events in early transmission chains will decrease these estimated clinical rates (26). Evidence of slow initial spread would increase the estimated clinical rates.

Last, estimating the infection fatality rate from the ILI surge requires knowing the clinical rate and the delay from clinical presentation with ILI to death. If patients present with ILI at the onset of their illness, exhibit a 16-day median lag between onset and death (27), and have a 32% clinical rate as estimated from the 18% asymptomatic rate and 40% clinical rate of symptomatic COVID-19 cases, then the observed ILI surge corresponds to an infection fatality rate of 0.29%. We stress that estimating the infection fatality rate from this ILI surge is highly sensitive to both the lag from presentation with ILI to death and the clinical rate (fig. S6). Consequently, the ILI surge is compatible with fatality rates ranging from 0.07 to 1.4% depending on the unknown subclinical rate and lag from presentation with ILI to death. Under the U.S. Centers for Disease Control and Prevention (CDC) planning scenarios specifying a 4-day lag from onset of symptoms to presentation to the doctor with ILI (28) and a 15-day lag from onset to death, the resulting 11-day lag from ILI to death produces IFR estimates of 0.57% (0.51 to 0.68%, 95% credible set) for the unadjusted ILI surge and 0.19% (0.17 to 0.22%, 95% credible set) for the ILI surge adjusted to account for asymptomatic and subclinical cases.
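The relationship between the unadjusted and adjusted IFR estimates follows from the clinical rate. The sketch below is an illustration under the point estimates quoted in the text, not the authors' posterior calculation: if infections equal clinical cases divided by the clinical rate, then deaths per infection equal deaths per clinical case multiplied by the clinical rate.

```python
def adjusted_ifr(unadjusted_ifr, clinical_rate):
    """Rescale a fatality rate per clinical case to a rate per infection.

    If infections = clinical cases / clinical_rate, then
    deaths / infections = (deaths / clinical cases) * clinical_rate.
    """
    if not 0 < clinical_rate <= 1:
        raise ValueError("clinical_rate must be in (0, 1]")
    return unadjusted_ifr * clinical_rate


# Asymptomatic and care-seeking rates quoted in the text.
clinical_rate = (1 - 0.18) * 0.40

# 0.57% is the unadjusted estimate under the CDC planning scenario.
ifr_percent = adjusted_ifr(0.57, clinical_rate)
print(f"{ifr_percent:.2f}%")
```

Under these point estimates the result is about 0.19%, in line with the adjusted figure reported above.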

DISCUSSION

We use outpatient ILI surveillance data from around the United States to estimate the prevalence of SARS-CoV-2. We found a clear, anomalous surge in ILI outpatients during the COVID-19 epidemic that correlated with the progression of the epidemic in multiple states across the United States. The surge of non-influenza ILI outpatients was much larger than the number of confirmed cases in each state, providing evidence of large numbers of probable symptomatic COVID-19 cases that remained undetected. This result is also consistent with ILI excess observed in France in late February/early March (29). In addition, this finding predicts that the slowest epidemic doubling time that could explain the ILI surge would be 4 days and that this rate could only be achieved with unusually fast early transmission or super-spreading events and a clinical rate near 100%. Consistent with this prediction, we found that deaths due to COVID-19 within the United States doubled every 3.0 days and note that this empirical growth rate for the U.S. epidemic can account for the ILI surge with a 25% clinical rate assuming a 4-day lag from the onset of infectiousness to presentation as an outpatient with ILI. Together, these results suggest that SARS-CoV-2 spread rapidly throughout the United States since its 15 January start date and was likely accompanied by a large undiagnosed population of potential COVID-19 outpatients with a presumably milder distribution of clinical symptoms than estimated from prior studies of SARS-CoV-2+ inpatients.

Excess ILI appears to have peaked during the week starting on 15 March, leading the observed ILI dynamics to diverge from the overall epidemic dynamics implied by the growth rate of COVID-19 deaths in the United States. If the ILI dynamics were proportional to the epidemic curve, then the two could be related via a constant subclinical rate. However, the changing ratio between COVID-19 prevalence estimated by the ILI surge and the epidemic curves parameterized by the growth rate of U.S. deaths suggests that additional mechanisms may be behind the ILI slowdown. Mechanisms that can explain the difference between our simulated epidemic curves and the ILI surge include effective social distancing, disproportionate reductions in ILI care-seeking behavior relative to non-ILI care-seeking behavior, or heterogeneity in susceptibility or contact structure not captured in our SEIR model (25).

Our empirical estimate of the size of the ILI surge has several potential limitations. First, the observed ILI surge may represent more than just SARS-CoV-2–infected patients. A second epidemic of a nonseasonal pathogen that presents with ILI could confound our estimates of ILI due to SARS-CoV-2. However, this seems unlikely as additional viral surveillance through the CDC suggests that between 8 and 28 March, other monitored respiratory viruses were at low prevalence (30). Nonetheless, were our approach to be used during winter months, additional steps would be needed to account for concomitant non-influenza seasonal pathogens. In addition, our assumption that changes in health care–seeking behavior are similar between mild ILI and non-ILI conditions may be incorrect. Although this assumption was supported by New York City emergency department surveillance data, it is possible that differential health care seeking would be present in other locations or in the outpatient setting. Last, it is also possible that our use of ILI data has underestimated the prevalence of SARS-CoV-2 within the United States. Although early clinical reports focused on cough and fever as the dominant features of COVID-19 (5), other reports have documented digestive symptoms as the complaint affecting up to half of patients with laboratory-confirmed COVID-19 (31), and alternative presentations, including asymptomatic or unnoticeable infections, could result in underestimation of SARS-CoV-2 prevalence.

In addition, our models have several limitations. First, we assumed that ILI prevalence within states can be scaled to case counts at the state level. This is based on the assumption that the average number of cases seen by sentinel providers in a given week is representative of the average number of patients seen by all providers within that state in a given week. Errors in this assumption would cause proportional errors in our estimated case counts and syndromic case detection rate. Second, our U.S.-wide SEIR models vary by growth rate alone and as such may not capture important heterogeneity in susceptibility or transmission and regional variation, intervention-induced changes in transmission, or clustering of infection outbreaks. Our models were used to illustrate that the ILI surge is consistent with an estimated growth rate and start date for the U.S. epidemic and to specify the mutual dependency of growth rate, the lag between the onset of infection and presentation to a doctor, and clinical rates. Finer models with regional demographic and case-severity compartments are needed to translate our range of estimated prevalence, growth rate, and clinical rates into actionable models for public health managers. Last, our method of calculating the infection fatality rate relied on assumptions about the clinical rate and the delay from patients recorded as ILI to death. Our clinical rate required using patterns of care seeking for typical seasonal causes of ILI as did our delay from ILI to death; consequently, neither should be relied on as a definitive source for COVID-19, and estimating the clinical rate and delay from ILI to death for COVID-19 specifically will reduce the large uncertainty around our ILI-estimated infection fatality rates.

Despite these potential limitations, the ILI surge identified in syndromic surveillance time series allowed early estimates of COVID-19 prevalence, estimates that were not possible from confirmed case data due to early logistical delays in SARS-CoV-2 testing in the United States. Our prevalence estimates are supported by a serosurvey conducted in New York State. We estimated that more than 8.3% of New York State residents were infected by SARS-CoV-2 by 28 March; on 23 April 2020, New York State announced that 14% of residents had evidence of past infection by SARS-CoV-2 by 29 March, at which time the cumulative PCR-confirmed case counts totaled only 0.3% of New York’s population (32).

Although an ILI surge tightly correlated with COVID-19 case counts across the United States and was consistent with the New York State serology study, strongly suggesting that SARS-CoV-2 has potentially infected millions in the United States, further laboratory confirmation of our hypotheses is still needed to guide public health decisions. Our findings make testable predictions that one would find relatively high seroprevalence in other states that have already seen an ILI surge and that seroprevalence of individuals infected in March across states is proportional to relative sizes of the states’ ILI surges. A study of ILI patients from mid-March who were never diagnosed with COVID-19 could produce a focused test of our predictions about the number and regional prevalence of undetected COVID-19 cases presenting with ILI during that time. If seroprevalence estimates beyond New York State continue to corroborate our prevalence estimates from syndromic surveillance, this would strongly suggest lower case severity rates for COVID-19 than were assumed in late March by comparing PCR-confirmed case counts to deaths. Further corroboration of our estimates of the magnitude of the ILI surge would suggest that ILI and other public time series of outpatient illness allow early and reliable estimates of crucial epidemiological parameters for rapidly unfolding novel pandemic diseases. Because not all emerging pandemic diseases are expected to present with influenza-like symptoms, surveillance of other illnesses that commonly present in the outpatient setting could provide a vital tool for rapidly understanding and responding to novel infectious diseases.

MATERIALS AND METHODS

Study design

The goal of our study was to use publicly available data to estimate the number of patients seeking care for non-influenza ILI in excess of seasonal trends during the 3 weeks spanning 8 to 28 March 2020, then use this ILI surge to estimate COVID-19 incidence in March, and parameterize epidemiological model growth rates and clinical rates.

The ILI surge detection above produced an excess proportion of patients visiting outpatient providers for non-influenza ILI in each week and each state. To scale up the proportion of patients to a national number of COVID-19 cases, we estimated the number of patients per sentinel provider in the CDC dataset, normalized that number of patients per provider to a number of patients per doctor, and scaled that up by an estimated number of practicing doctors in the United States. The result was an estimated number of COVID-19 patients visiting doctors in each state for each week—we called this our “unadjusted” ILI surge.

The unadjusted ILI surge underestimates COVID-19 prevalence because it counts only clinical infections, that is, those that seek medical care. We accounted for both asymptomatic infections and symptomatic but subclinical infections to produce an “adjusted” ILI surge as our final estimate of COVID-19 incidence in each state and each week. We then used the unadjusted and adjusted ILI surges to estimate syndromic case detection and fatality rates. We also used the unadjusted ILI surge as an empirical observation to evaluate the epidemiological modeling of COVID-19 growth rates and clinical rates in the United States. Throughout our methods, we use $i$ to index states and $t$ to index weeks (with $t = 0$ referring to 3 October 2010, the start of state-specific ILINet surveillance).

Data sources

Since 2010, the CDC has maintained ILINet for weekly influenza surveillance. Each week, approximately 2600 enrolled providers distributed throughout all 50 states, as well as Puerto Rico, the District of Columbia, and the U.S. Virgin Islands, report the total number of patient encounters $n_{it}$ and the number of those encounters that met criteria for ILI [defined as a temperature of 100°F (37.8°C) or greater and a cough or sore throat without a known cause other than influenza; $y_{it}$] (33). For scale, in the 2018–2019 season, ILINet reported approximately 60 million outpatient visits. Coupled to these data are weekly state-level reports from clinical and public health laboratories detailing the number of patient samples tested for influenza $n_{it}^{\text{flu}}$ as well as the number of these samples that are positive for influenza $y_{it}^{\text{flu}}$. Therefore, ILINet data can be thought of as a weekly state-level time series representing the superimposed prevalence of various viruses that can cause ILI. ILINet data were obtained through the CDC FluView Interactive portal (34, 35).

In addition to ILINet data, we downloaded U.S. state population data for the year 2020 from https://worldpopulationreview.com/states/. The number of primary care providers in each state per 100,000 residents $b_i$ was obtained from the United Health Foundation (20, 24). COVID-19 confirmed case counts were obtained from The New York Times’ database maintained at https://github.com/nytimes/covid-19-data. This dataset contains the daily cumulative confirmed case count for COVID-19 for each state $z_{il}$ for day $l$. The dataset of deaths in Italy was downloaded from https://github.com/pcm-dpc/COVID-19 on 6 April 2020.

Data processing

Within the ILINet dataset, New York City and New York were summed into a combined New York variable representing both New York City and the surrounding state. Because of incomplete data in one or more of the data sources described above, the Virgin Islands, Puerto Rico, the Commonwealth of the Northern Mariana Islands, and Florida were excluded from subsequent analysis. In addition, to match the weekly reporting of ILI from ILINet, daily cumulative confirmed COVID-19 cases were converted to weekly counts of new cases by

$$z_{it} = \sum_{l \in t} \left( z_{il} - z_{i(l-1)} \right)$$
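The conversion from daily cumulative counts to weekly new cases can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `weekly_new_cases`, the choice of a dictionary keyed by date, and the toy counts are all assumptions, and the daily series is assumed to be complete.

```python
from datetime import date, timedelta

def weekly_new_cases(cumulative, week_start):
    """New confirmed cases in the 7-day week beginning on week_start.

    cumulative maps each date to the cumulative case count for one state.
    Dates absent from the mapping are treated as 0 (assumed to precede the
    first reported case); the series is otherwise assumed to be complete.
    """
    total = 0
    for offset in range(7):
        day = week_start + timedelta(days=offset)
        previous = day - timedelta(days=1)
        # Daily new cases are the difference of consecutive cumulative counts.
        total += cumulative.get(day, 0) - cumulative.get(previous, 0)
    return total

# Made-up cumulative counts for one hypothetical state, 7 to 15 March 2020.
toy = {date(2020, 3, 7) + timedelta(days=k): count
       for k, count in enumerate([2, 3, 5, 8, 13, 21, 34, 55, 89])}
print(weekly_new_cases(toy, date(2020, 3, 8)))
```

Because the daily differences telescope, the weekly count equals the cumulative count on the last day of the week minus the cumulative count on the day before the week begins.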

Extracting non-influenza ILI signal

To subtract influenza signal from $y_{it}$, we assumed that the population of patients with ILI within a state is the same population that is potentially tested for influenza. This assumption allows us to calculate the number of non-influenza ILI cases as

$$\tilde{y}_{it} = \left( 1 - \frac{y_{it}^{\mathrm{flu}}}{n_{it}^{\mathrm{flu}}} \right) y_{it}$$
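As a small worked illustration of this step, the sketch below scales ILI visits by the fraction of laboratory samples that tested negative for influenza. The function name and the numbers in the example are hypothetical.

```python
def non_influenza_ili(ili_visits, flu_positive, flu_tested):
    """Estimate non-influenza ILI visits for one state and week.

    ili_visits:   ILI visits reported to ILINet
    flu_positive: laboratory samples positive for influenza
    flu_tested:   laboratory samples tested for influenza
    """
    if flu_tested <= 0:
        # Missing laboratory data must be imputed before this step.
        raise ValueError("no laboratory samples available for this state-week")
    return (1.0 - flu_positive / flu_tested) * ili_visits

# Hypothetical week: 200 ILI visits, 25 of 100 tested samples positive.
print(non_influenza_ili(200, 25, 100))
```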

Mean imputation based on neighboring states was used to address missing values in laboratory influenza quantification. To assess the impact of this model for extracting non-influenza ILI signal, we calculated COVID-19 prevalence without first removing signal from influenza and found little change in our prevalence estimates (fig. S7). This likely reflects that influenza also demonstrates strong seasonal patterns that can be addressed as discussed below.

Identifying ILI surges

We identified ILI surges in $\tilde{y}_{it}$ by training a model on $\tilde{y}_{it}$ for all data before 21 July 2019. We then used this model to predict the prevalence of non-influenza ILI ($\hat{\pi}_{it}$) for dates after and including 21 July 2019. We calculated the ILI surge as the difference between the observed proportion of non-influenza ILI $\tilde{y}_{it}/n_{it}$ and $\hat{\pi}_{it}$.

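To account for variation in the number of total patients, we modeled $\tilde{y}_{it}$ as binomially distributed. To account for correlation in non-influenza ILI over time, we used a Gaussian process model, which assumes that weeks that are closer together will have more similar levels of non-influenza ILI. The following model reflects these modeling choices

$$\begin{aligned}
\tilde{y}_{it} &\sim \mathrm{Binomial}(\pi_{it}, n_{it}) \\
\pi_{it} &= \frac{\exp(\eta_{it})}{1 + \exp(\eta_{it})} \\
\eta_{it} &\sim N(\lambda_i(t), \sigma^2) \\
\lambda_i(t) &\sim \mathrm{GP}(\theta(t), \sigma^2 \Gamma) \\
\sigma^2 &\sim \mathrm{InverseGamma}(\nu, \xi) \\
\theta(t) &= \theta \\
\Gamma(t, t+s) &= \alpha \exp\left( -\frac{s^2}{2\rho^2} \right)
\end{aligned}$$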

where GP refers to a Gaussian process. We made the following prior specifications: We set the bandwidth parameter for the squared exponential kernel as $\rho = 3$ representing a strong local correlation in time that died off sharply beyond 3 weeks, $\alpha = 1$ representing a signal-to-noise ratio of approximately 1, and $\nu = 1$ and $\xi = 1$ representing weak prior knowledge regarding the overall scale of variation in the latent space. Last, we set $\theta = -2.197$ representing an off-season non-influenza ILI prevalence of 0.1 (10%) on the probability scale. Samples from the posterior predictive density $p(\pi_{it} \mid \tilde{y}_{i1}, \ldots, \tilde{y}_{iT}, n_{i1}, \ldots, n_{iT})$ were collected using the function basset from the R package stray (36); a total of 4000 such samples were collected for each state in this analysis. We defined the prevalence of non-influenza ILI in excess of normal seasonal variation as $y_{it}^{*} = \tilde{y}_{it}/n_{it} - \hat{\pi}_{it}$.
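To make the prior settings concrete, the sketch below evaluates the inverse-logit link at the prior mean, the squared exponential kernel at a few lags, and the excess proportion for one made-up week. It does not fit the Gaussian process model (the analysis used basset from the R package stray for that); all function names and the example numbers are illustrative assumptions.

```python
import math

def inv_logit(eta):
    """Map a latent value to the probability scale."""
    return 1.0 / (1.0 + math.exp(-eta))

def sq_exp_kernel(lag_weeks, alpha=1.0, rho=3.0):
    """Squared exponential kernel: alpha * exp(-s^2 / (2 * rho^2))."""
    return alpha * math.exp(-lag_weeks ** 2 / (2.0 * rho ** 2))

def excess_ili(non_flu_ili, total_visits, predicted_proportion):
    """Observed non-influenza ILI proportion minus the model prediction."""
    return non_flu_ili / total_visits - predicted_proportion

# Prior mean on the probability scale for theta = -2.197.
print(round(inv_logit(-2.197), 3))

# Prior correlation between weeks separated by 0, 3, 6, and 9 weeks.
for lag in (0, 3, 6, 9):
    print(lag, round(sq_exp_kernel(lag), 3))

# Hypothetical week: 30 non-influenza ILI visits out of 1000, predicted 2%.
print(excess_ili(30, 1000, 0.02))
```

With $\rho = 3$, the kernel value falls from 1 at lag 0 to $\exp(-0.5)$ at a 3-week lag and to $\exp(-4.5)$ at a 9-week lag, which is the sense in which correlation dies off with distance in time.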

To investigate whether our results were sensitive to the above model specification, we alternatively used the sample mean and variance from years 2010–2018 as an estimate of typical seasonal non-influenza ILI. Despite not accounting for the binomial count structure of ILI data or correlations in the proportion ILI between weeks, this simpler model resulted in nearly identical prevalence estimates (fig. S8). Still, we used the GP-derived estimates throughout this paper because of their better accounting for the known binomial count and week-to-week correlation structure of ILI-causing pathogen prevalence.

To exclude variation attributable to unseasonably high rates of other ILI-causing viruses (such as the outbreak of RSV in Washington State in November–December 2019), we only investigated $y_{it}^{*}$ for weeks after 7 March 2020, because only these later weeks had high correlation to the COVID-19 confirmed case rate (fig. S2).

Calculating scaling factors to relate ILINet data to COVID-19 cases

Because new COVID-19 case counts $z_{it}$ represent the number of confirmed cases in an entire state and ILINet data represent the number of cases seen by a select number of enrolled providers, we had to estimate scaling factors $w_i$ to enable comparison of ILINet data to confirmed case counts at the state level. Let $\pi_{it}^{*}$ denote the probability that a patient with ILI in state $i$ has COVID-19 as estimated from ILINet data. Let $p_i$ denote the population of state $i$, and let $b_i$ denote the number of primary care providers per 100,000 people in state $i$. We translated the inferred proportion of individuals with ILI due to COVID-19 to the state level by considering the average number of patients seen across all providers in the state in a 5-day workweek. In addition, we added a discount factor $\lambda = 0.55$ to calibrate these estimates with prior reports regarding the total number of outpatient visits per year (18). This yielded our estimated number of COVID-19 cases (excess ILI at the state level) as

$$w_i = 5 \, \frac{b_i p_i}{10^{5}} \, m \lambda, \qquad y_{it} = w_i \pi_{it}^{*}$$

where $m = 20.2$ is the mean number of patients seen by physicians per day (19).
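The arithmetic behind the scaling factor can be sketched as below, reading it as workdays per week times the number of providers in the state times patients per provider per day times the discount factor. The function names and the example state (1 million residents, 100 providers per 100,000) are hypothetical; only the values 5, 20.2, and 0.55 come from the text.

```python
def scaling_factor(providers_per_100k, population,
                   patients_per_day=20.2, discount=0.55, workdays=5):
    """Estimated weekly outpatient visits across all providers in a state."""
    providers = providers_per_100k * population / 1e5
    return workdays * providers * patients_per_day * discount

def estimated_cases(weight, covid_share):
    """Scale a proportion estimated from ILINet up to a state-level count."""
    return weight * covid_share

# Hypothetical state: 1,000,000 residents, 100 providers per 100,000.
w = scaling_factor(100, 1_000_000)
print(w)
print(estimated_cases(w, 0.02))
```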

Accounting for subclinical infections

To account for the contribution of subclinical SARS-CoV-2 infections, we used a recent analysis of cohort surveillance from the Diamond Princess (37). Monte Carlo simulations were used to propagate error from our uncertainty regarding potential asymptomatic infections affecting the clinical rate $\delta_b$ into our calculation of posteriors for epidemic trajectories. To match posterior estimates, we used quantile matching to parameterize $\delta_b \sim \mathrm{Beta}(\alpha, \beta)$ to achieve a mean of 0.179 and a 95% probability set of (0.155, 0.202). In addition, we took $\delta_c = 0.4$ based on a large study of adult health care–seeking behavior in the United States (23). To account for these subclinical contributions, we used adjusted scaling factors

$$w_i^{*} = \frac{w_i}{(1 - \delta_c)(1 - \delta_b)}$$
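The Monte Carlo propagation of uncertainty in the asymptomatic fraction can be sketched as follows. Two assumptions are worth stating plainly. First, the Beta parameters here are obtained by moment matching (treating the 95% interval as the mean plus or minus 1.96 standard deviations), which is a simpler substitute for the quantile matching described above. Second, the adjusted scaling factor is read as the unadjusted factor divided by $(1 - \delta_c)(1 - \delta_b)$. All function names are hypothetical.

```python
import random

def beta_from_mean_and_interval(mean, lower, upper):
    """Moment-match Beta(a, b), reading the 95% interval as
    mean +/- 1.96 standard deviations (a normal approximation)."""
    sd = (upper - lower) / (2 * 1.96)
    concentration = mean * (1 - mean) / sd ** 2 - 1
    return mean * concentration, (1 - mean) * concentration

def adjusted_scaling(w, delta_c, delta_b):
    """Adjusted scaling factor, as read from the formula in the text."""
    return w / ((1 - delta_c) * (1 - delta_b))

def adjusted_scaling_draws(w, delta_c=0.4, n_draws=4000, seed=1):
    """Draw delta_b from the matched Beta and propagate it into w*."""
    rng = random.Random(seed)
    a, b = beta_from_mean_and_interval(0.179, 0.155, 0.202)
    return [adjusted_scaling(w, delta_c, rng.betavariate(a, b))
            for _ in range(n_draws)]

draws = sorted(adjusted_scaling_draws(1.0))
# Approximate 2.5%, 50%, and 97.5% quantiles of the multiplier on w.
print(draws[100], draws[2000], draws[3900])
```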

Estimating syndromic case detection rates

Assuming that the majority of SARS-CoV-2 testing within the United States has been directed by patient symptoms (38), the pool of newly diagnosed SARS-CoV-2+ patients is a subset of the pool of SARS-CoV-2+ patients who are identified as having ILI. Therefore, we calculated the probability that a SARS-CoV-2+ patient with ILI who seeks medical care will be identified as having SARS-CoV-2 as $\delta_s = z_{it} / y_{it}$ (fig. S5).

Estimating infection fatality rates

The exact lag from an outpatient being recorded as ILI to death is unknown, but estimated lag times from onset to death and from hospitalization to death (27) can be used to understand the range of implied infection fatality rates from the ILI surge. We calculated the infection fatality rate implied by the ILI surge as a function of the unknown lag from a patient being recorded as ILI to death, and we repeated this calculation for both the raw and subclinical rate–adjusted ILI estimates. For a lag of $l$ days from ILI reporting to death, the infection fatality rate was estimated by dividing all new deaths occurring within the dates (8 March 2020 + $l$, ..., 28 March 2020 + $l$) by the magnitude of the adjusted or raw ILI surge. A plot of the fatality rate by lag for raw and adjusted ILI surges revealed a large range of fatality rates compatible with the ILI surge and highly sensitive to the estimate of lag and clinical rates. One study (27) estimated a median of 11.2 days from hospitalization to death and 16.1 days from symptom onset to death. For the raw ILI surge estimate, 11- and 16-day lag times would produce median infection fatality rate estimates of 0.57 and 0.89%, respectively, without adjusting for any subclinical infections; for the subclinical-adjusted ILI surge estimate, these lag times would produce median infection fatality rate estimates of 0.19 and 0.29%, respectively.
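The lag-dependent calculation can be sketched as below: deaths are summed over the surge window shifted forward by the lag, and the total is divided by the estimated number of infections in the surge. The function name, the dictionary input, and the toy numbers are illustrative assumptions and are not the data used in the analysis.

```python
from datetime import date, timedelta

def implied_ifr(daily_new_deaths, surge_infections, lag_days,
                start=date(2020, 3, 8), end=date(2020, 3, 28)):
    """Infection fatality rate implied by an ILI surge for a given lag.

    daily_new_deaths maps date -> new deaths reported on that date.
    surge_infections is the raw or adjusted ILI surge over start..end.
    """
    shifted_start = start + timedelta(days=lag_days)
    n_days = (end - start).days + 1
    deaths = sum(daily_new_deaths.get(shifted_start + timedelta(days=k), 0)
                 for k in range(n_days))
    return deaths / surge_infections

# Made-up example: a constant 10 deaths per day and a surge of 21,000.
toy_deaths = {date(2020, 3, 1) + timedelta(days=k): 10 for k in range(90)}
for lag in (11, 16):
    print(lag, implied_ifr(toy_deaths, 21_000, lag))
```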

Growth rate estimation

As of 6 April 2020, deaths from the SARS-CoV-2 epidemic were still growing nearly exponentially as evidenced by a nearly linear growth on a log y axis. Early in the epidemic, estimating exponential growth rates by Poisson regression with a log link function produces accurate estimates of the true growth rate (33), and so we estimated growth rates for the United States and Italy by Poisson generalized linear models predicting new deaths using date as a quantitative explanatory variable. U.S. COVID-19 deaths from 5 March to 1 April 2020 were summed by date to calculate national-level statistics. Initially, 2 to 5 April were included but were found to have anomalously high leverage and were hence excluded from our analysis. We applied the same procedure to COVID-19 deaths in Italy, focusing on deaths from 24 February to 12 March. We used the slope from Poisson regression as the estimated exponential growth rate, which yielded a U.S. growth rate of $r_{\mathrm{US}} = 0.23$ or a 3.01-day doubling time of the infected population over time and an Italian growth rate of $r_{\mathrm{IT}} = 0.26$ or a 2.65-day doubling time of the infected population over time.
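A Poisson regression with a log link and a single date covariate can be fit with a few Newton steps, as in the sketch below. This is a standard-library illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the example uses noise-free synthetic counts rather than reported deaths. The doubling time follows from the slope as $\ln 2 / r$.

```python
import math

def fit_poisson_growth(days, counts, iterations=50, tol=1e-10):
    """Fit log(mean count) = intercept + rate * day by maximum likelihood."""
    # Start from least squares on log(count + 0.5) to get close to the optimum.
    logs = [math.log(c + 0.5) for c in counts]
    n = len(days)
    t_bar = sum(days) / n
    l_bar = sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in days)
    rate = sum((t - t_bar) * (v - l_bar) for t, v in zip(days, logs)) / sxx
    intercept = l_bar - rate * t_bar
    for _ in range(iterations):
        mu = [math.exp(intercept + rate * t) for t in days]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(c - m for c, m in zip(counts, mu))
        g1 = sum(t * (c - m) for t, c, m in zip(days, counts, mu))
        # Fisher information (2 x 2), inverted in closed form.
        h00 = sum(mu)
        h01 = sum(t * m for t, m in zip(days, mu))
        h11 = sum(t * t * m for t, m in zip(days, mu))
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        intercept += step0
        rate += step1
        if abs(step0) + abs(step1) < tol:
            break
    return intercept, rate

def doubling_time(rate):
    """Days for an exponentially growing quantity to double."""
    return math.log(2) / rate

# Synthetic, noise-free daily counts growing at 0.23 per day for 28 days.
days = list(range(28))
counts = [2.0 * math.exp(0.23 * t) for t in days]
_, r_hat = fit_poisson_growth(days, counts)
print(round(r_hat, 4), round(doubling_time(r_hat), 2))
```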

Epidemic simulations and clinical rates

The following SEIR model

$$\begin{aligned}
\dot{S} &= \zeta - \beta S I - \omega_b S \\
\dot{E} &= \beta S I - \gamma E - \omega_b E \\
\dot{I} &= \gamma E - \nu I - \omega_i I \\
\dot{R} &= \nu I - \omega_b R
\end{aligned}$$

was parameterized for the United States on a time scale of days by setting $\zeta = 3.23 \times 10^{-5}$ corresponding to a crude birth rate of 11.8 per 1000 per year, a baseline mortality rate $\omega_b = 2.38 \times 10^{-7}$ corresponding to 8.685 per 1000 per year, and an infectious mortality rate $\omega_i = 4.96 \times 10^{-4}$ corresponding to an infection fatality rate of 0.5% required to fit U.S. deaths under a 20-day lag from onset to death. Furthermore, we drew a random incubation period $\gamma^{-1} \sim \mathrm{LogNormal}(1.087, 0.153)$ reflecting empirical estimates of a median 5 days from exposure to symptom onset with 4.2- to 6-day 95% credible interval (35, 39), which is then offset by 2 days of presymptomatic transmission as documented across carefully studied clusters in Singapore (40), resulting in a 2.2- to 4-day 95% credible interval for a log-normally distributed incubation period. Similarly, in each simulation, we also drew a random infectious period $\nu^{-1} \sim \mathrm{LogNormal}(2.193, 0.105)$ based on 2 days of presymptomatic infectiousness and high viral loads in nasopharyngeal samples (41, 42) combined with persistence of high loads of SARS-CoV-2 that can be cultured up to 7 days after symptom onset (43), resulting in our use of a 7.3- to 11-day 95% credible interval for the infectious period. Last, we parameterized $\beta$ to ensure $I(t)$ grew with a specified exponential growth rate early in the epidemic. We ran a total of 2000 simulations for each of the two growth rate distributions (United States and Italy) analyzed. Growth rates were drawn at random from a normal distribution with an SD of 0.1 and centered on $r_{\mathrm{US}}$ and $r_{\mathrm{IT}}$, respectively. To illustrate the mutual dependence between estimates of growth rate, clinical rate, and the lag between the onset of infectiousness to presentation to a doctor with ILI, we ran 2000 simulations with uniform growth rates in the interval [0.173, 0.365] corresponding to a range of doubling times between 1.9 and 4 days.
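The text does not state how $\beta$ was chosen to produce a target growth rate. One common approach, shown here purely as an assumption, linearizes the $E$ and $I$ equations around the start of the epidemic (with $S$ approximately equal to the initial population $N$), which gives $(r + \gamma + \omega_b)(r + \nu + \omega_i) = \beta N \gamma$ for the early growth rate $r$. The sketch below draws the incubation and infectious periods from the stated log-normal distributions, solves for $\beta$, and checks the result against the dominant eigenvalue of the linearized system. Function and constant names are hypothetical.

```python
import math
import random

OMEGA_B = 2.38e-7   # baseline mortality rate per day, as given in the text
OMEGA_I = 4.96e-4   # infectious mortality rate per day, as given in the text
N0 = 3.27e8         # initial susceptible population

def draw_rates(rng):
    """Draw gamma and nu as reciprocals of log-normal period lengths."""
    incubation_days = rng.lognormvariate(1.087, 0.153)
    infectious_days = rng.lognormvariate(2.193, 0.105)
    return 1.0 / incubation_days, 1.0 / infectious_days

def beta_for_growth(r, gamma, nu, population=N0):
    """Transmission rate giving early exponential growth rate r (assumed method)."""
    return (r + gamma + OMEGA_B) * (r + nu + OMEGA_I) / (gamma * population)

def early_growth_rate(beta, gamma, nu, population=N0):
    """Dominant eigenvalue of the linearized (E, I) subsystem with S = N."""
    a = gamma + OMEGA_B
    b = nu + OMEGA_I
    disc = (a - b) ** 2 + 4.0 * beta * population * gamma
    return (-(a + b) + math.sqrt(disc)) / 2.0

rng = random.Random(0)
gamma, nu = draw_rates(rng)
beta = beta_for_growth(0.23, gamma, nu)
print(early_growth_rate(beta, gamma, nu))

# 95% interval implied by LogNormal(1.087, 0.153) for the incubation period.
print(math.exp(1.087 - 1.96 * 0.153), math.exp(1.087 + 1.96 * 0.153))
```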

Each simulation was initialized with $(S, E, I, R, t) = (3.27 \times 10^{8}, 0, 1, 0, 0)$, where time 0 was 15 January and simulations were run until 5 August 2020. The SEIR model was simulated with a Gillespie algorithm through the R package adaptivetau (44) on the assumption that a large amount of variation in the epidemic trajectory stems from uncertainty in the trajectory of early transmission chains. The number of infected individuals on a given day was the last observed $I(t)$ for that day, and a weekly pool of infected patients was computed by a moving sum over the number of infected individuals every day for the past week, $I_w(t) = \sum_{k=0}^{6} I_{t-k}$.

Defining $Y_t = \sum_i y_{it}$ as the national excess ILI, the clinical rate implied by a given simulation was estimated as

$$\delta_c(t_d) = \frac{Y_t}{I_w(t - t_d)}$$

for a given time delay $t_d$ from the onset of infectiousness to a patient reporting to the doctor with ILI.
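The weekly pool of infected individuals and the implied clinical rate can be sketched as follows. The function names, the list-indexed-by-day representation, and the toy trajectory are assumptions made for illustration; days before the start of the simulation contribute nothing to the moving sum.

```python
def weekly_infected_pool(daily_infected, day):
    """Moving sum of I over `day` and the six preceding days."""
    if day < 0:
        raise ValueError("day precedes the start of the simulation")
    return sum(daily_infected[day - k] for k in range(7) if day - k >= 0)

def implied_clinical_rate(national_excess_ili, daily_infected, day, delay_days):
    """National excess ILI divided by the weekly pool, offset by the delay."""
    pool = weekly_infected_pool(daily_infected, day - delay_days)
    if pool == 0:
        raise ValueError("no infected individuals in the weekly pool")
    return national_excess_ili / pool

# Made-up trajectory that doubles each day, indexed by simulation day.
toy_infected = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
print(weekly_infected_pool(toy_infected, 9))
print(implied_clinical_rate(254, toy_infected, 9, delay_days=0))
print(implied_clinical_rate(254, toy_infected, 9, delay_days=1))
```

In this toy trajectory, a one-day delay halves the pool and doubles the implied clinical rate, which illustrates why the clinical rate, the growth rate, and the delay cannot be estimated independently of one another.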

SUPPLEMENTARY MATERIALS

stm.sciencemag.org/cgi/content/full/12/554/eabc1126/DC1

Fig. S1. Excess ILI for each U.S. state.

Fig. S2. Excess ILI correlates strongly with patterns of newly confirmed COVID-19 cases.

Fig. S3. Surveillance data from New York City emergency departments.

Fig. S4. Prevalence of SARS-CoV-2 infections between 8 and 28 March 2020.

Fig. S5. Syndromic case detection rates by state.

Fig. S6. Estimating the infection fatality rate (IFR) of COVID-19 based on the unadjusted ILI surge.

Fig. S7. Investigating model sensitivity when ILI is modeled without first removing signal from influenza.

Fig. S8. Investigating model sensitivity when seasonal trends in non-influenza ILI are identified using an alternative statistical model.

This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

REFERENCES AND NOTES

Acknowledgments: We thank R. Silverman, A. Gonzalez-Aller, R. Plowright, C. Parrish, J. Dushoff, B. Bolker, J. Lourenco, S. Gupta, D. Rosenheck, and S. Stephens-Davidowitz for encouragement and insightful manuscript comments. We thank R. Mathes of NYC DOHMH for help in obtaining NYC syndromic surveillance data. Funding: J.D.S. was supported in part by the Duke University Medical Scientist Training Program (GM007171). Author contributions: J.D.S. and A.D.W. conceptualized and wrote the manuscript and performed data analysis. N.H. provided data, independent analyses of NYC data, conceptual guidance, and editing support. Competing interests: A.D.W. owns Selva Analytics LLC. The other authors declare that they have no competing interests. Data and materials availability: All data associated with this study are present in the paper or the Supplementary Materials. All code and data required to reproduce our results are publicly available at 10.5281/zenodo.3898636. This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using this material.