Undiagnosed SARS-CoV-2 seropositivity during the first 6 months of the COVID-19 pandemic in the United States


Science Translational Medicine  07 Jul 2021:
Vol. 13, Issue 601, eabh3826
DOI: 10.1126/scitranslmed.abh3826

Elucidating seroprevalence in COVID-19

Symptoms of SARS-CoV-2 infection range from completely asymptomatic, to those of a common cold, to a drop in oxygen saturation and lung function, and death in some patients. To evaluate the proportion of the U.S. population who had an undiagnosed infection during the first wave of the COVID-19 pandemic, we measured antibody prevalence in study participants who had not previously been diagnosed with a SARS-CoV-2 infection. By mid-July of 2020, 16.8 million people had an undiagnosed SARS-CoV-2 infection, almost five times the rate of diagnosed infections.


Asymptomatic SARS-CoV-2 infection and delayed implementation of diagnostics have led to poorly defined viral prevalence rates in the United States and elsewhere. To address this, we analyzed seropositivity in 9089 adults in the United States who had not been diagnosed previously with COVID-19. Individuals with characteristics that reflected the U.S. population (n = 27,716) were selected by quota sampling from 462,949 volunteers. Enrolled participants (n = 11,382) provided medical, geographic, demographic, and socioeconomic information and dried blood samples. Survey questions coincident with the Behavioral Risk Factor Surveillance System survey, a large probability-based national survey, were used to adjust for selection bias. Most blood samples (88.7%) were collected between 10 May and 31 July 2020 and were processed using ELISA to measure seropositivity (IgG and IgM antibodies against SARS-CoV-2 spike protein and the spike protein receptor binding domain). The overall weighted undiagnosed seropositivity estimate was 4.6% (95% CI, 2.6 to 6.5%), with race, age, sex, ethnicity, and urban/rural subgroup estimates ranging from 1.1% to 14.2%. The highest seropositivity estimates were in African American participants; younger, female, and Hispanic participants; and residents of urban centers. These data indicate that there were 4.8 undiagnosed SARS-CoV-2 infections for every diagnosed case of COVID-19, and an estimated 16.8 million infections were undiagnosed by mid-July 2020 in the United States.


Coronavirus disease 2019 (COVID-19), the disease caused by severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) infection, presents with a spectrum of illness ranging from asymptomatic to severe disease. As with most respiratory viral diseases, it is difficult to estimate the true prevalence of the disease during a pandemic and the extent of its spread is only known after extensive study (1–3). Most patients infected with SARS-CoV-2 develop robust antibody responses against the viral spike protein, nucleocapsid protein, and the envelope protein that can be detected by serological testing (4–8). Antibodies against spike protein persist for months and can neutralize SARS-CoV-2 (9). Frequently, these neutralizing antibodies bind to the receptor binding domain (RBD) of the spike protein, but antibodies against the spike protein S2 domain have also been observed (10–15).

To characterize the spread of SARS-CoV-2 infection in the United States, we evaluated seropositivity in a national survey of participants who had not previously been diagnosed with SARS-CoV-2 infection. We used quota sampling from a large pool of volunteers (n = 462,949) to obtain a representative sample (n = 9089) and performed statistical weighting to generate prevalence estimates that revealed the extent of SARS-CoV-2 infection in the general population. To ensure accurate classification of seropositivity, we used our dual-antigen enzyme-linked immunosorbent assay (ELISA) protocol that evaluated immunoglobulin G (IgG) and IgM antibodies against both the full viral spike protein ectodomain and the RBD (8, 16).


Enrollment and demographic representation

Recruitment took place from 1 April 2020 to 4 August 2020. During that time, 11,283 participants were enrolled from a pool of 241,424 volunteers in the United States (50 states and the District of Columbia). Of these participants, 214 had blood collected via venipuncture and 11,069 were sent volumetric dried blood microsamplers (absorbent polymer, 20-μl collection volume). More than 80% of the microsamplers were returned (9089 participants). Ultimately, 9028 participant blood samples were analyzed using ELISA for the presence of anti–SARS-CoV-2 spike protein antibodies. Of those, 8058 participants had a complete clinical questionnaire and were included in the weighted analysis (Fig. 1). Most blood sample collection (>88%) occurred within the 11-week period between 10 May and 31 July 2020 (figs. S1 and S2). The six major demographic factors used in participant selection are summarized in Table 1. Participant sampling was representative of the U.S. population. When expanded to include the additional 10 demographic or health-related factors captured by the Behavioral Risk Factor Surveillance System (BRFSS), many factors were well matched, but there were some differences: for example, our sample population was more highly educated, had higher employment rates, and had better access to health care than the general U.S. population (Table 1).

Fig. 1 SARS-CoV-2 serosurvey study overview and statistical workflow.

A flow chart of participant recruitment through data analysis displays steps in data acquisition and lists participant attrition. Ovals show the start and end of data analysis or data acquisition; gray rectangles indicate subsets of participants in this study; blue parallelograms represent individuals from outside data sets that contributed to adjusted prevalence estimates; blue rounded rectangles present analysis processes.

Table 1 Characteristics of the serosurvey population compared to the U.S. population.

Census and BRFSS (2018) data on selection criteria were used for quota-based sampling in our SARS-CoV-2 serosurvey. Other values from BRFSS were used for statistical weighting. The table shows comparisons between the estimated proportion of the U.S. population in each category according to weighted BRFSS data compared to our sample population in the SARS-CoV-2 serosurvey. NLF, not in the labor force (student, retired, unable to work, refused to answer, not asked/missing).


Estimates of seroprevalence

There were 304 seropositive participants in the analysis set (Fig. 2). This gave a weighted estimate of 4.6% of the undiagnosed adults in the U.S. population who were seropositive for SARS-CoV-2 infection [95% confidence interval (CI), 2.6% to 6.5%, n = 8058 complete testing and survey]. Using this average rate over the study period, we estimated that there were 4.8 undiagnosed SARS-CoV-2 infections for each diagnosed case over the course of the study (95% CI, 2.8 to 6.8). Among seropositive participants, 36.51% were IgG+IgM+IgA+, 28.29% were IgG+IgM−IgA+, 17.11% were IgG+IgM−IgA−, 13.16% were IgG+IgM+IgA−, 4.28% were IgG−IgM+IgA−, and 0.66% were IgG−IgM+IgA+ (Fig. 2, A to D, and fig. S3). There were variations in antibody profiles across different demographic groups, specifically anti-spike protein and anti-RBD IgG antibodies (figs. S4 and S5).
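As a quick consistency check on the serologic phenotype percentages above, the following sketch converts each reported percentage back into a participant count out of the 304 seropositive participants. The profile labels use an ASCII hyphen for "negative", and the variable names are illustrative rather than taken from the study code.

```python
# Reported share of each antibody profile among the 304 seropositive participants.
n_seropositive = 304
profile_percent = {
    "IgG+IgM+IgA+": 36.51,
    "IgG+IgM-IgA+": 28.29,
    "IgG+IgM-IgA-": 17.11,
    "IgG+IgM+IgA-": 13.16,
    "IgG-IgM+IgA-": 4.28,
    "IgG-IgM+IgA+": 0.66,
}

# Convert each percentage to the nearest whole number of participants.
counts = {
    profile: round(n_seropositive * pct / 100)
    for profile, pct in profile_percent.items()
}

# Participants with any IgG-positive profile.
igg_positive = sum(n for profile, n in counts.items() if profile.startswith("IgG+"))

for profile, n in counts.items():
    print(f"{profile}: {n}")
print(f"total: {sum(counts.values())}, IgG-positive: {igg_positive}")
```

The rounded counts add up to the full set of 304 participants, which suggests the six profiles are exhaustive and mutually exclusive.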

Fig. 2 Geographic distribution of undiagnosed seropositivity in the United States from May to July 2020.

Raw serology data for (A) IgG, (B) IgM, and (C) IgA against SARS-CoV-2 spike protein and the receptor binding domain (RBD) of spike protein are displayed. Cut points for positivity are shown as red dashed lines; data are optical density (OD). (D) Serologic phenotype of antibody presence in 304 seropositive participants. (E) The map of the United States displays seropositivity in the six regions surveyed: Northeast: ME, NH, VT, MA, NY, CT, RI, PA, and NJ, 7.5% (95% CI, 3.9% to 12.4%); Midwest: MN, IA, WI, IL, IN, MI, and OH, 1.6% (95% CI, 0.3% to 2.4%); Mid-Atlantic: MD, DE, DC, VA, WV, KY, TN, NC, SC, and GA, 8.6% (95% CI, 2.6% to 18.9%); South/Central: FL, MS, AL, LA, AR, MO, KS, and OK, 3.0% (95% CI, 1.2% to 5.0%); Mountain/Southwest: TX, NM, AZ, CO, UT, WY, NE, SD, ND, MT, and ID, 4.5% (95% CI, 1.3% to 9.5%); West/Pacific: WA, OR, NV, CA, AK, and HI, 1.9% (95% CI, 0.2% to 3.8%). Each person in (E) represents 100 participants; orange represents weighted prevalence estimate within the geographic region.

We found regional variations in seroprevalence estimates across the United States (Figs. 2E and 3). The Northeast and Mid-Atlantic regions showed the highest rates of seropositivity, whereas the lowest seropositivity was in the Midwest. Urban areas had a higher point estimate of seropositivity (5.3%) than rural areas (1.1%) at the time blood samples were collected. Estimates of seroprevalence were calculated for other demographic subgroups (Fig. 3). The youngest age group, 18 to 44 years, had the highest estimated seropositivity (5.9%). Estimated seroprevalence was 5.5% for females and 3.5% for males. The seroprevalence estimate was highest for African Americans (14.2%), followed by participants who self-identified as other/unlisted race (11.1%), American Indian/Alaska Native (6.8%), and White/Caucasian (3.1%); those identifying as Asian had the lowest seroprevalence estimate (2.0%).

Fig. 3 Undiagnosed SARS-CoV-2 seroprevalence in the main demographic categories.

Six main categories were used during quota-based sampling: region, age, sex, race, ethnicity, and urban/rural. Seropositivity estimates of blood samples that had a full clinical questionnaire completed and successful sampling are shown. Data are weighted estimates ± 95% CIs. Black dashed vertical line, weighted national seroprevalence estimate; *, n value too low to make a proper weighted estimate so raw positivity is displayed.

Participants who reported a known exposure to a SARS-CoV-2–infected individual had a higher seroprevalence estimate (15.6%) compared to those who did not (2.7%). In comparison to the national average (4.6%), those who worked from home had a lower seropositivity estimate of 3.0%. Those who reported previous vaccination (for influenza 3.2% or pneumonia 2.3%) had a lower likelihood of undiagnosed seropositivity. Those who had health conditions associated with poor outcomes in SARS-CoV-2 infection, including coronary heart disease, asthma, and diabetes, displayed lower rates of seropositivity (Fig. 4). Other health conditions were also correlated with a decreased seropositivity rate such as skin cancer, stroke, or arthritis.

Fig. 4 Seroprevalence estimates according to socioeconomic and health characteristics.

(A to C) Evaluation of the effect of nondemographic traits on seroprevalence estimates for blood samples that had a full clinical questionnaire completed and successful sampling. Nondemographic traits included (B) socioeconomic and (C) health characteristics. Data are weighted estimates ± 95% CIs. Gray dashed vertical line, weighted national seroprevalence estimate.

Our results estimate that as of July 2020, there were about 4.79 undiagnosed infections (95% CI, 2.76 to 6.82; fig. S6) for every identified case of COVID-19, suggesting a potential 16.8 million undiagnosed infections by July 2020 in addition to the reported 3.5 million diagnosed cases in the United States. These data suggest that a higher level of infection-induced immunity exists in the U.S. population than previously predicted.
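The relationship between the headline figures can be checked with simple arithmetic. The sketch below recomputes the ratio from the two reported totals only; it is not the authors' estimator, which is described as using daily national population and diagnosed case counts, and the function name is illustrative.

```python
# Reported totals from the text (United States, by July 2020).
undiagnosed_infections = 16.8e6  # estimated undiagnosed infections
diagnosed_cases = 3.5e6          # reported diagnosed cases


def undiagnosed_per_diagnosed(undiagnosed: float, diagnosed: float) -> float:
    """Return the number of undiagnosed infections per diagnosed case."""
    if diagnosed <= 0:
        raise ValueError("diagnosed must be positive")
    return undiagnosed / diagnosed


ratio = undiagnosed_per_diagnosed(undiagnosed_infections, diagnosed_cases)
total_infections = undiagnosed_infections + diagnosed_cases

print(f"undiagnosed per diagnosed case: {ratio:.1f}")
print(f"implied total infections: {total_infections / 1e6:.1f} million")
```

Under these two totals, the implied overall number of infections is the sum of the diagnosed and undiagnosed counts, which is the quantity relevant to the discussion of infection-induced immunity.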


These results, including the subgroup analysis, provide us a previously undescribed view into the spread of the COVID-19 pandemic by more clearly identifying the large numbers of individuals with undiagnosed infections during the initial months of the pandemic. These data are of great importance as we consider the impact vaccination may have on the future course of the pandemic and plan for current and future available vaccines to be administered. In addition, these data can also help us to better assess the public health measures taken during the pandemic and how to take the best approaches forward during any future public health emergencies.

This study demonstrates that spread of the SARS-CoV-2 virus in the United States during the first 6 months of the pandemic was more widespread than has been suggested by data reporting diagnostic test-confirmed cases. Similar to responses to other respiratory viruses, such as influenza, many individuals develop asymptomatic or mild disease that is not medically attended and therefore never diagnosed. Our findings indicate that there are nearly five individuals with a previous asymptomatic infection for every diagnosed case of COVID-19. Furthermore, patterns of our seroprevalence data match well with those of diagnosed cases reported during a similar time frame (17). For example, the greater seropositivity estimated in densely populated urban areas follows the observed initial spread of SARS-CoV-2. In comparison to the national average, we found that the Midwest, South, and West had lower seroprevalence rates during the study time frame, which preceded a substantial increase in SARS-CoV-2 infections in these regions detected by viral testing.

Our data suggest that the youngest age group had the highest undiagnosed seroprevalence, which is consistent with observations that they display less severe symptoms than older patients (18). We also found higher undiagnosed seroprevalence in females, possibly suggesting a higher risk for asymptomatic disease. Participants with chronic diseases that are more likely to be associated with severe clinical manifestations of COVID-19, including diabetes, heart disease, and asthma, had a lower prevalence of asymptomatic SARS-CoV-2 infection in comparison to the national average. Those with known exposure to SARS-CoV-2–infected individuals had a higher estimated incidence of undiagnosed seropositivity. We also found that African American and Hispanic participants had higher undiagnosed seropositivity, correlating with national data on disease burden in these subgroups.

Our study reports a representative population sample across the United States and evaluated regional, demographic, and socioeconomic differences in the prevalence of asymptomatic SARS-CoV-2 infection. In contrast, other reports of seroprevalence data focus on specific groups of individuals or geographic locations, such as dialysis patients or individuals who reported for blood draws that may be biased toward those needing medical care during the pandemic (19–36). These previous studies came within the range of our estimate of undiagnosed cases when considering the additional diagnosed cases within the same time frame. Our results provide new insight into the spread of SARS-CoV-2, estimating the national undiagnosed exposure rate to illuminate the scope of infection during the first 6 months of the pandemic. As expected, given delayed arrival in different geographic areas such as the Midwest and rural South, undiagnosed infection estimates varied by region, with the Mid-Atlantic region having the largest proportion of undiagnosed infections in comparison to diagnosed cases. Given the high point estimate of undiagnosed seropositivity in younger participants, lower point estimates in individuals with preexisting conditions such as diabetes, and the vaccine rollout starting with older persons and those at risk, we could see a faster onset of herd immunity due to these undiagnosed infections in populations that are in lower priority groups for vaccination. Young and healthy individuals, such as those under the age of 16 who were not eligible for the first wave of vaccines in the United States and those under 12 who are still ineligible, could serve as an asymptomatic reservoir for viral mutations leading to increased transmissibility or vaccine escape mutations, which has been shown in unvaccinated children and adults with viral persistence (37).
Further long-term studies of immunity in the population will be necessary to understand durability of the immune response to the vaccine versus infection, how infection-induced immunity affects vaccine response and performance, and whether herd immunity can play a role in controlling the spread of SARS-CoV-2. In addition, further subgroup analysis of these data will be useful in clarifying the spread of disease in the presence of public health measures and how we may be able to refine and further target those measures in the future.

Our study has several limitations. First, although extensive statistical adjustments were made, our study cohort is based on a nonrandom volunteer sample, which can have selection bias. Traditional random sampling studies using probability sampling design may have low response rates, calling into question the advantages of that practice (38, 39). Our study population also exhibited some differences from the general U.S. population, such as higher education level and access to health care that had to be adjusted for with statistical weighting. Larger sample sizes would allow us to make more detailed estimates, although potentially at the cost of how representative the population is. We used both census and behavioral data to weight our results, although it is possible that there are variables associated with disease transmission that were not accounted for in our weighting. Although we used extensive validation methods on our ELISA (8) for seropositivity designations, we used historical serum samples and convalescent post-infection samples because dried blood was unavailable from historical samples on the collection devices. Future cross-verification with an independent analyte, such as the nucleocapsid protein, could prove useful, although antibodies to nucleocapsid fade and would require correction for antibody decay.

Our data suggest a larger spread of the COVID-19 pandemic in the United States during the first 6 months than originally thought. Our findings have implications for understanding SARS-CoV-2 spread, epidemiological characteristics of spread, and prevalence in different communities and could have a potential impact on decisions involved in vaccine rollout. Continued large-scale surveillance of SARS-CoV-2 immunity is in progress, discriminating infection-based and vaccine-induced antibody responses. Mathematical models are being generated to understand the pandemic, vaccine performance, and public health measure efficacy and to provide insight into the best approach for handling the next virus with pandemic potential.


Study design

This study was designed to determine the seroprevalence of anti–SARS-CoV-2 antibodies in adults 18 years of age or older in the United States who had not been previously diagnosed with COVID-19. The primary endpoint was the weighted estimate of seroprevalence in the United States. Secondary endpoints were weighted estimates for subgroups categorized by demographics or risk factors. An initial period enrolled a convenience sample of 593 volunteers before the quota sample. Participants across the United States (50 states and District of Columbia) were then enrolled through telephone consent from a pool of volunteers who provided basic demographic data in response to the study announcement. Recruitment calls were made from three sites: National Institute of Allergy and Infectious Diseases (NIAID) Laboratory of Infectious Diseases Clinical Studies Unit, the University of Pittsburgh Clinical and Translational Science Institute (CTSI), and the University of Alabama at Birmingham Center for Clinical and Translational Science (CCTS). The selection of participants is described below. Selected participants were contacted by the study team, consented, and sent a blood microsampling kit and questionnaire in the online REDCap platform. For a small subset of participants (n = 214) working on the National Institutes of Health (NIH) campus, serum was collected by venipuncture.

This serosurvey clinical study (NCT04334954) is ongoing and will follow the same cohort of participants over time to evaluate seroprevalence and antibody profiles in comparison to the demographic, health, and socioeconomic data provided by each participant. This study was approved by the NIH Institutional Review Board and conducted in accordance with the provisions of the Declaration of Helsinki and Good Clinical Practice guidelines. All participants provided verbal informed consent before enrollment.

Participant selection

The study was advertised online through an official NIH Press Release that linked to an email address to volunteer for selection in the study. This press release was subsequently publicized by local and national news outlets and covered via broadcast television news, print news, and internet news articles. All volunteers were emailed an initial survey to collect basic demographic characteristics. Survey responses were de-identified and aggregated by subcategory of state, type of locality approximated from zip codes, age, sex, race, and ethnicity (Fig. 1). Target sample sizes for these subcategories were determined from the U.S. census and were updated every evening based on the characteristics of people who had already enrolled to assure that individuals in each subcategory were enrolled evenly over time. Within each subcategory, participants were initially assigned a selection probability calculated from the target number as a proportion of the available pool. Specific subcategories that had insufficient numbers were aggregated to estimate their impact on the overall distribution of the six main characteristics. If a particular characteristic had insufficient numbers, sample probabilities were boosted for volunteers who had the characteristic. For each day’s call list, the most representative of 20,000 randomly generated lists was used, each list drawn without replacement from the volunteer pool based on the sampling probabilities previously defined. Representativeness was assessed by estimating a weighted sum of squared differences from the desired targets and picking the list with the lowest deviation. Unselected participants were eligible to be called at a later date. This algorithm is designed such that each cohort of invited participants is representative of the diversity of the U.S. population with respect to the six sampling variables (see section S4).
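The call-list selection described above can be illustrated with a minimal sketch. It is not the study's code: the single urban/rural grouping, the toy pool of 400 volunteers, the equal importance weights, the use of 200 candidate lists instead of 20,000, and the Efraimidis-Spirakis method for weighted sampling without replacement are all assumptions made for illustration.

```python
import random
from collections import Counter


def weighted_sample_without_replacement(items, weights, k, rng):
    """Efraimidis-Spirakis sampling: draw k distinct items, favoring larger weights."""
    keyed = [
        (rng.random() ** (1.0 / w), item)
        for item, w in zip(items, weights)
        if w > 0
    ]
    keyed.sort(reverse=True)
    return [item for _, item in keyed[:k]]


def deviation(call_list, group_of, targets, importance):
    """Weighted sum of squared differences between list and target proportions."""
    counts = Counter(group_of[v] for v in call_list)
    n = len(call_list)
    return sum(importance[g] * (counts[g] / n - p) ** 2 for g, p in targets.items())


def pick_call_list(group_of, probs, targets, importance, k, n_lists, seed=0):
    """Generate n_lists candidate lists and keep the one with the lowest deviation."""
    rng = random.Random(seed)
    volunteers = list(group_of)
    weights = [probs[v] for v in volunteers]
    best, best_score = None, float("inf")
    for _ in range(n_lists):
        candidate = weighted_sample_without_replacement(volunteers, weights, k, rng)
        score = deviation(candidate, group_of, targets, importance)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical pool: 300 urban and 100 rural volunteers; target mix 80% / 20%.
pool_size = {"urban": 300, "rural": 100}
group_of = {f"v{i}": ("urban" if i < 300 else "rural") for i in range(400)}
targets = {"urban": 0.8, "rural": 0.2}
importance = {"urban": 1.0, "rural": 1.0}
k = 50  # size of one day's call list

# Initial selection probability: target number as a proportion of the available pool.
probs = {v: targets[g] * k / pool_size[g] for v, g in group_of.items()}

call_list, score = pick_call_list(group_of, probs, targets, importance, k, n_lists=200)
print(len(call_list), round(score, 4))
```

Drawing many candidate lists and keeping the closest one trades extra computation for a list whose composition sits nearer the targets than a single random draw would typically achieve.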

Blood sample collection

Participants provided blood samples by mail using a Mitra microsampling kit (Neoteryx, Torrance, CA) or standard venipuncture. Microsampling kits contained visual instructions on the sampling process, bandages, gauze, lancets, and four 20-μl microsampling devices for a total collection of 80 μl of whole blood. Participants used the lancet to draw blood from their fingertip and collect blood onto each of the four microsamplers. Participants returned the dried microsamplers with desiccant via overnight shipping. Those who underwent venipuncture did so in the NIH Clinical Center phlebotomy laboratory, where 18 ml of blood was collected in a serum separator and whole blood tube. Once received in the laboratory, serum samples were processed, and microsamplers were stored dry at −80°C until elution and analysis.

Serologic assays

Antibodies from samples were analyzed using ELISA as previously described (8, 40–42). To maintain longitudinal quality control and ensure that the assays remained stable across multiple months of assay implementation, positive and negative controls were included on each assay plate and monitored for stability (fig. S7). Seropositivity cut points were defined by evaluating 300 true-negative samples and 56 true-positive samples. Positivity thresholds were based on the mean optical density (absorbance) plus 3 SDs (see the Supplementary Materials for details). The final criterion of a Spike+ and RBD+ for any combination of IgG or IgM gave estimated sensitivity and specificity of 1, with raw values for recombinant antibody results reported in fig. S8 and table S1. In addition, IgA was evaluated via previously described ELISA to further phenotype the participant’s serologic status. Raw sample positivity data by state can be found in fig. S9.
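The thresholding rule above can be sketched as follows. This is an illustration under stated assumptions, not the published pipeline: the optical density values are invented, the sample (n − 1) standard deviation is assumed, one cut point is reused for all four assays for brevity, and the positivity rule is read as "spike-positive by IgG or IgM, and RBD-positive by IgG or IgM".

```python
from statistics import mean, stdev


def cut_point(negative_ods, n_sd=3.0):
    """Positivity threshold: mean OD of known negatives plus n_sd sample SDs."""
    return mean(negative_ods) + n_sd * stdev(negative_ods)


def is_seropositive(od, cuts):
    """Apply the assumed dual-antigen rule.

    od and cuts map (isotype, antigen) pairs to an optical density and a cut point.
    """
    spike_pos = any(od[(iso, "spike")] > cuts[(iso, "spike")] for iso in ("IgG", "IgM"))
    rbd_pos = any(od[(iso, "RBD")] > cuts[(iso, "RBD")] for iso in ("IgG", "IgM"))
    return spike_pos and rbd_pos


# Invented OD readings from known-negative samples (the study used 300 true negatives).
negatives = [0.10, 0.12, 0.08, 0.11, 0.09]
threshold = cut_point(negatives)

assays = [("IgG", "spike"), ("IgG", "RBD"), ("IgM", "spike"), ("IgM", "RBD")]
cuts = {assay: threshold for assay in assays}

# Hypothetical participants: one reactive to both antigens, one to spike only.
both_antigens = {("IgG", "spike"): 0.90, ("IgG", "RBD"): 0.80,
                 ("IgM", "spike"): 0.10, ("IgM", "RBD"): 0.10}
spike_only = {("IgG", "spike"): 0.90, ("IgG", "RBD"): 0.10,
              ("IgM", "spike"): 0.10, ("IgM", "RBD"): 0.10}

print(round(threshold, 4), is_seropositive(both_antigens, cuts), is_seropositive(spike_only, cuts))
```

Requiring reactivity to two antigens is what keeps a single cross-reactive signal, such as the spike-only example, from being classified as seropositive.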

Statistical analysis

The iterative quota sampling (described in the “Participant selection” section) that we used continuously matched the proportion of people in the study with the census estimated proportion of people in the United States on six variables (Table 1 and Fig. 1). This ensured that each periodic sample of participants over the course of the study was representative, and the time effects of the pandemic were approximately independent of those six variables (fig. S2). Each participant was asked demographic and health-related questions that matched those on the BRFSS survey, a large probability-based national survey (43). Responses to those matching questions were used with BRFSS survey data to adjust estimators to account for important criteria that may be related to both selection probability and seropositivity but were not accounted for in our quota sampling. Those adjusted estimators used weighting based on the propensity of being a quota sample versus a BRFSS sample participant and poststratification to U.S. census data. Weighting additionally accounted for sensitivity and specificity. CIs were calculated for the final seroprevalence estimates accounting for both the variability of the weighting and of the sensitivity and specificity adjustment. The ratio of undiagnosed SARS-CoV-2 infections to diagnosed cases of COVID-19 was estimated as the final seroprevalence estimate times a factor calculated from the daily national population and diagnosed cases. Detailed statistical methods are provided in the Supplementary Materials. The main computer code used in this study is available online. Sources used for analysis can be found in (8, 38, 39, 43–56).
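Two of the adjustments named above, poststratification and correction for assay sensitivity and specificity, can be sketched in a few lines. This is a simplified illustration, not the authors' estimator: it omits the propensity weighting against BRFSS and the CI calculation, it uses the standard Rogan-Gladen formula as an assumed form of the sensitivity/specificity correction, and the 80%/20% urban/rural population shares are hypothetical.

```python
def rogan_gladen(apparent, sensitivity, specificity):
    """Correct an apparent prevalence for imperfect sensitivity and specificity."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    adjusted = (apparent + specificity - 1.0) / denom
    return min(1.0, max(0.0, adjusted))  # clamp to a valid proportion


def poststratified_prevalence(stratum_prevalence, population_share):
    """Average stratum prevalences using population (not sample) shares."""
    total = sum(population_share.values())
    return sum(
        stratum_prevalence[s] * share for s, share in population_share.items()
    ) / total


# Point estimates reported in the text for urban and rural residents.
stratum_prevalence = {"urban": 0.053, "rural": 0.011}
# Hypothetical population shares, chosen only to illustrate the calculation.
population_share = {"urban": 0.8, "rural": 0.2}

overall = poststratified_prevalence(stratum_prevalence, population_share)
# With sensitivity and specificity both equal to 1, the correction changes nothing.
corrected = rogan_gladen(overall, sensitivity=1.0, specificity=1.0)
# A hypothetical imperfect assay, for comparison.
corrected_imperfect = rogan_gladen(0.05, sensitivity=0.90, specificity=0.98)

print(round(overall, 4), round(corrected, 4), round(corrected_imperfect, 4))
```

Because the reported sensitivity and specificity estimates are both 1, the correction leaves the point estimate essentially unchanged here; uncertainty in those two quantities still enters through the confidence intervals, as the text notes.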


Statistical Methods

Figs. S1 to S9

Table S1

This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Acknowledgments: We thank all of the participants in this study. We acknowledge K. Corbett and B. Graham of the NIAID VRC for their generous donation of coronavirus spike expression plasmid, and A. Schmidt, J. Feldman, B. M. Hauser, and T. M. Caradonna of the Ragon Institute of MGH, MIT, and Harvard for their donation of their RBD expression plasmid. We thank J. Crissey and the CRIMSON team (NIAID, NIH) for their assistance and data management. We thank J. McLellan for scholarly discussions regarding the spike S-2P construct. We thank the CCTS Clinical Research Support Program (A. Delbridge and L. Dukes), UAB School of Public Health and its Survey Research Unit (L. Battle, J. Carson, M. DeRamus, T. Fields, T. Graham, T. Jackson, E. Pruitt, A. Underwood, and P. Wolff), the University of Pittsburgh CTSI staff and leadership (J. Avolio, L. Bash, S. Clayton, M. Cristinziano, K. Daw, C. Fascetti, E. Gyurisin, J. Huwe, N. Jones, D. Mathias, S. Mathias, A. Mykita, B. Petersen, M. Phillips, C. Rush, E. Shepherd, S. Shetty, A. Socci, L. Stearns, S. Ugbomah, K. Underwood, and L. Yasko), and University of Pittsburgh Information Technology (D. McGaughey, S. Ritzman, and T. Smith) for their contributions to this study. We thank D. Bernard for his assistance with sample delivery, and S. Ford-Scheimer for organizational assistance. Funding: This research was supported, in part, by the Intramural Research Program of the NIH, including the National Institute for Biomedical Imaging and Bioengineering, the National Institute of Allergy and Infectious Disease, and the National Center for Advancing Translational Sciences. This project has been funded, in part, with Federal funds from the National Cancer Institute, NIH, under contract number HHSN261200800001E, 75N91019D00024, Task Order No. 75N91019F00130, Clinical and Translational Science Awards Program grants UL1TR003096 (UAB) and UL1TR001857 (University of Pittsburgh). 
The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. government. The NIH, its officers, and employees do not recommend or endorse any company, product, or service. Author contributions: H.K., C.K.-T., J.H., J.T., K.P., J.S., T.N., K.M.A., M.K., A.S., R.L., and K.S. processed blood samples and performed serology experiments. J.M., M.D., K.R.S., P.F., V.W., J.-P.D., S.M., W.G., and D.E. synthesized antigen constructs and produced protein for serology. H.K., C.K.-T., and K.S. performed seropositivity and quality control analysis on serology data. H.A.B., J.A.C., L.C., O.B., C.C., A.H., L.T.G., L.A.R., R.B., R.A., A.C.-M., M.G., B.H., S.V., R.C., M.K.M., A.K., R.S., S.S., S.R., E.W.F., S.G.M., R.P.K., S.E.R., and M.J.M. recruited and enrolled participants. S.H., M.P.F., N.S., J.W., Y.L., and B.I.G. performed statistical weighting and survey analyses. S.H., M.P.F., R.P.K., S.E.R., M.D.H., M.J.M., and K.S. interpreted seroprevalence results. M.D.H., D.E., M.J.M., S.H., and K.S. designed the study. H.K., C.K.-T., S.H., R.P.K., S.E.R., M.D.H., D.E., M.J.M., and K.S. wrote and edited the manuscript. Competing interests: K.S. and M.J.M. are co-inventors on a provisional U.S. patent application no. 63/092,350 entitled “Antibody specific for SARS-CoV-2 receptor binding domain and therapeutic methods.” The other authors declare no competing interests. Data and materials availability: All data associated with this study are present in the paper or the Supplementary Materials. Serologic data have been deposited in the NCI SeroNet hub. Protein and antigen sequences are available in (8).
Statistical code is available online. This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using this material.
