Immune system development varies according to age, location, and anemia in African children

See allHide authors and affiliations

Science Translational Medicine  05 Feb 2020:
Vol. 12, Issue 529, eaaw9522
DOI: 10.1126/scitranslmed.aaw9522

A global immune landscape

Systems biology approaches to the immune system have revealed specific phenotypes associated with disease or response to vaccination. However, these studies are concentrated in certain areas of the world, and the findings may not apply globally. Hill et al. studied longitudinal blood samples from a large malaria vaccine trial conducted in different African countries. They saw country-specific trajectories of immune development that influenced vaccine responses. Anemic children showed dampened adaptive immune responses. Follow-up experiments with B cells in vitro revealed that iron has direct effects on B cell differentiation. These results show that geographic location has an important influence on immune development.


Children from low- and middle-income countries, where there is a high incidence of infectious disease, have the greatest need for the protection afforded by vaccination, but vaccines often show reduced efficacy in these populations. An improved understanding of how age, infection, nutrition, and genetics influence immune ontogeny and function is key to informing vaccine design for this at-risk population. We sought to identify factors that shape immune development in children under 5 years of age from Tanzania and Mozambique by detailed immunophenotyping of longitudinal blood samples collected during the RTS,S malaria vaccine phase 3 trial. In these cohorts, the composition of the immune system is dynamically transformed during the first years of life, and this was further influenced by geographical location, with some immune cell types showing an altered rate of development in Tanzanian children compared to Dutch children enrolled in the Generation R population–based cohort study. High-titer antibody responses to the RTS,S/AS01E vaccine were associated with an activated immune profile at the time of vaccination, including an increased frequency of antibody-secreting plasmablasts and follicular helper T cells. Anemic children had lower frequencies of recent thymic emigrant T cells, isotype-switched memory B cells, and plasmablasts; modulating iron bioavailability in vitro could recapitulate the B cell defects observed in anemic children. Our findings demonstrate that the composition of the immune system in children varies according to age, geographical location, and anemia status.


Age shapes the composition and function of the human immune system. In particular, the first 5 years of life are a pivotal time in immune development: After birth, the immune system changes as thymic output increases, and maternally derived immune mediators such as antibodies are lost (1, 2). This occurs in parallel with the acquisition and cultivation of the commensal microbiome, encounters with food antigens, and exposure to numerous environmental microbes (1). These factors, combined with an individual’s genetics, shape the human immune system (36). In this early period of immune development, the functional capacity of the immune system appears to be limited, resulting in a reduced ability to generate protective cellular and humoral immunity after vaccination and an increased susceptibility to infectious diseases (2). This results in higher rates of morbidity and mortality from infectious disease in young children worldwide: In 2017, pneumonia, diarrhea, and malaria caused ~1.5 million deaths in children under 5 years of age (7). These infectious diseases disproportionately affect children in low- and middle-income countries (LMICs), particularly those in sub-Saharan Africa (7, 8). A higher pathogen burden, suboptimal nutrition, impaired maternal health, and poor access to health care undoubtedly contribute to this increased burden of infectious disease. However, there is limited understanding of how these factors affect the development and function of the immune system during childhood in LMICs and how immune system development differs between populations around the world.

In-depth immune phenotyping of peripheral blood samples from large cohorts provides a way to characterize which factors influence the composition of the human immune system. Such studies have estimated that between 20 and 40% of this interindividual variation is driven by genetics, demonstrating that nongenetic factors play a dominant role in molding the composition of the immune system (3, 911). Determining the contribution of nongenetic factors to human immunity is highly complex because of the wide range of potential factors and their putative interactions. Nevertheless, an individual’s age, their environment, and chronic viral infections have emerged as key factors that influence human immune variation over time (36, 9). Although these immunophenotyping studies have increased our understanding of the factors that influence the composition of the human immune profile, it has been difficult to integrate this knowledge in the context of immune function.

This study aimed to characterize how age and other nongenetic factors alter the composition of the immune system of young children living in Tanzania and Mozambique and to link immune status to vaccine responsiveness. Comprehensive immune profiling and transcriptomic analysis was performed on peripheral blood samples collected over a 32-month period from infants and children that were under 5 years of age and participants in a phase 3 study of the malaria vaccine RTS,S/AS01E (12). This study also aimed to compare immune composition to Dutch children from the Generation R Study (5).


Blood leukocyte subsets are stable 1 month after vaccination

First, we compared peripheral blood mononuclear cell (PBMC) samples from the RTS,S trial by flow cytometry. In the RTS,S trial, conducted across seven countries in sub-Saharan Africa, participants were vaccinated three times at monthly intervals with RTS,S/AS01E or a comparator vaccine, followed by a booster dose of vaccine at month 20 (12). PBMCs from participants from Tanzania and Mozambique collected at baseline, 3, 21 and 32 months into the trial (Fig. 1, A and B, and tables S1 and S2) were analyzed by flow cytometry using both the unsupervised T-distributed stochastic neighbor embedding (tSNE) method (fig. S1) and by manual gating to identify well-characterized innate and adaptive immune cell populations (figs. S2 to S5, and tables S3 to S5). Multidimensional scaling (MDS) analysis of tSNE data from Tanzanian children (5 to 17 months old at enrollment, n = 116) revealed that the immune profile of participants did not differ between vaccine groups at either 3 months (B3) or 21 months (B21) after vaccination (Fig. 1, C and D). The frequencies of individual tSNE clusters or manual gated populations were not different between participants given the RTS,S or a comparator vaccine at any time point sampled (B0, B3, B21, and B32; Fig. 1, E to H) nor did the type of vaccine administered or the sex of participants substantially contribute to the interindividual variance observed for immune cell types at any time point (fig. S6, A to D). In addition, no cell types or tSNE clusters were changed between baseline and B3 for any group (Fig. 1, E and F). Therefore, no vaccine-induced changes to blood immune cells were detected 1 month after vaccination, indicating that neither the RTS,S nor comparator vaccine caused lasting changes to the participant’s immune profile and that children’s immune profiles are stable over a short time scale of 3 months. Likewise, reanalysis of blood transcriptomes from clinical trial of the RTS,S vaccine in adults from North America (13) showed that minimal vaccine-induced gene expression changes remained 21 to 28 days after vaccination (fig. S6E). These findings are consistent with previous studies showing that changes in the cellular composition of the adult immune system after vaccination occur in the first 2 weeks after vaccination, followed by a return to baseline (4, 14). However, comparison of the B0 with B21 samples identified 76 tSNE clusters and 25 manual gated cell types that were altered during this time frame (all vaccine groups combined; Fig. 1, G and H), indicating that the immune landscape of children changes over a 21-month period. Consistent with this, fewer strong correlations of cell subset frequencies were observed between baseline and 32 months than between baseline and B3 (fig. S7 to S9). This change over a short period is unique to children because adults have stable immune profiles over a 2- to 6-year period (3, 4).

Fig. 1 No changes to blood leukocyte subsets are detected 1 month after third vaccination.

(A) The timeline for vaccinations and blood sampling during the vaccine trial follow-up period. The three vaccination groups are shown (R3R, R3C, and C3C). Vaccinations are shown in blue (R = RTS,S; C = Comparator). Comparator vaccine differed between age groups (Menjugate for infants, Verorab for children). Red numbers indicate the month of follow-up at which blood samples were collected. (B) The distribution of participant age (in months) at each of the blood sampling time points for Tanzanian children (n = 414 total samples). Nonmetric multidimensional scaling (NMDS) for Tanzanian children at blood sample 3 (C) or blood sample 21 (D) based on the 336 tSNE clusters and vaccine group (B3: R3R and R3C, n = 63; comparator, n = 17; B21: C3C, n = 18; R3C, n = 39; R3R, n = 44). Samples from R3R and R3C groups were combined at B3 because identical vaccines had been received. The frequencies of 336 tSNE clusters (E) (as a proportion of each subpopulation) or 70 manually gated cell subsets for each participant (F) (as a percentage of all leukocytes) were compared between vaccine groups at B0 and B3 and within groups at B3 relative to B0 (all = all Tanzanian children, n = 80). Between-group comparisons were analyzed with the Mann-Whitney test, and B0 versus B3 used paired data and Wilcoxon signed-rank test. The −log10 P value is plotted for each cluster or subset (FDR adjusted for multiple comparisons). tSNE clusters (G) or manual gated cell subsets (H) were compared within vaccine group at B21 and in all children (pairwise) at B21 relative to B0. In (E) to (H), any population with an adjusted P value of less than 0.01 were deemed significantly different; the dashed lines indicate Padj = 0.01.

Dynamic development of the immune system occurs during childhood

To determine how age affects the blood immune cell composition in the first years of life, we compared the immune profile at B3 of infants (4.8 to 5.8 months old, n = 43) and children (7.5 to 22 months old, n = 55) from Mozambique (Fig. 2A). The two age groups separated from each other by the first MDS dimension of tSNE clusters (P < 10−16, Fig. 2B), and manual gating analysis revealed that 33 immune subsets were different between the age groups (Fig. 2C). These results, and regression modeling (Fig. 2D), demonstrate that age was the biggest contributor to immune variation between these participants. For example, naïve B cell and CD25+CD127low regulatory T cell (Treg) cell frequencies were elevated in infants compared to children (Fig. 2E). Reciprocally, frequencies of CD4+ T cell memory and T helper cell subsets were increased in children (Fig. 2E). Of note, circulating T follicular helper (cTFH) cells and plasmablasts were more abundant in children (Fig. 2E), cell types that are known to increase transiently after infection or vaccination (4, 14). This could be because children had higher exposure to infections and vaccinations than infants, and these cells act as circulating biomarkers of an ongoing immune response in secondary lymphoid tissues (15). Alternatively, these data may indicate that children are more capable than infants of mounting the cellular immune response required for the production of humoral immunity. Consistent with the latter hypothesis, infants had reduced immunoglobulin G (IgG) titers to the RTS,S vaccine components hepatitis B surface antigen (HBV.S) and Plasmodium falciparum circumsporozoite surface protein (CSP) compared to children in the trial (Fig. 2F and fig. S10, A and B).

Fig. 2 Dynamic development of memory B and T cells occurs between 6 months and 2 years of age.

(A) The age (in months) of infant and children age groups from Mozambique during study follow-up. Red box indicates the B3 samples (n = 37 infants, n = 52 children). (B) NMDS for children and infants at B3 based on the 336 tSNE clusters, with the coordinates for each dimension compared by Mann-Whitney test. (C) Heatmap showing the manually gated cell subsets (as a percentage of all leukocytes) with an FDR-adjusted P value of less than 0.001 between children and infants at B3 by Mann-Whitney test. Cell types are shown as rows, and each column represents an individual participant. Rows and columns were clustered using Euclidean distance, and the cell subset frequency was converted to z-scores. Individuals without any missing values are shown (n = 22 infants, n = 30 children). (D) The contribution of sample age group and sex to the cell type variance as determined from linear regression models, shown as a boxplot summary of the 33 cell types shown in (C) (n = 89 samples). (E) Representative cell type frequencies (as a percentage of parent population) for infants and children, with a P value from Mann-Whitney test (FDR adjusted for 70 subsets; n = 37 infants, n = 52 children). (F) Anti-HBV.S or anti-CSP IgG titer [B3 − B0 arbitrary units (AU)] from a cohort of RTS,S vaccinated participants from Mozambique (n = 198 infants, n = 137 children), with a P value from Mann-Whitney test. The 336 tSNE clusters (G) or 70 manually gated cell subsets (H) were compared between infants and children at B3, B21, and B32, and the −log10 FDR-adjusted P value is shown, with dashed lines indicating Padj = 0.01.

Further longitudinal analyses revealed that the immune profile of infants and children continued to change over the 32-month duration of the vaccine trial (Fig. 2G), indicating ongoing immune development throughout early childhood. To investigate the immune ontogeny over this longer time period, we used the larger cohort of samples collected over 32 months from children in Tanzania (n = 116, Fig. 3A), which confirmed that age is a major driver of immune variation in this cohort (Fig. 3B). To determine how specific immune cell types change during the first years of life, sample age and sex were modeled against cell type frequency using linear mixed-effect regression (LMER) (age range, 4.8 months to 4.3 years; Fig. 3C). This identified 20 cell types that changed in the first years of life in Tanzanian children with age, including increases in Ig isotype–switched memory B cells, T helper cells, and dendritic cell subsets with age (Fig. 3, C and D).

Fig. 3 Dynamic changes of the immune landscape during early childhood.

(A) The age in months of Tanzanian children at blood samples throughout the follow-up period (n = 414 in total). (B) NMDS for children at B0 (n = 78) and B32 (n = 109) based on the 336 tSNE clusters, with the coordinates for each dimension compared by Mann-Whitney test. Colors correspond to time point (B0, white; B32, black). (C) Heatmap of cell subsets that were significantly different between B0 and B32 in Tanzanian children (P < 0.0001 and conditional R2 > 0.25), with the frequency predicted from LMER models shown normalized to week 20 frequency. P values for the effect of sex and age determined by ANOVA of LMER models after FDR adjustment are shown in gray boxes, and the marginal and conditional R2 of LMER model are shown in purple for each cell subset. n.s., not significant. (D) The frequency of four representative cell subsets (as a percentage of parent population) plotted against age for each sample, with LMER modeling (red lines). Marginal R2 and P value for fit of LMER models shown in boxes above each plot.

The change in the composition of the immune system as children get older was recapitulated using RNA sequencing (RNA-seq) of whole PBMCs from Tanzanian children from B0 (age, 5.3 to 17.2 months; n = 21) and B32 (age, 37 to 50 months; n = 9) (Fig. 4, A and B, and data file S1). Blood transcriptional module (BTM) analysis (16) identified gene signatures of cell division, CD4+ T cell proliferation, and B cells as enriched in samples from younger children (B0; Fig. 4C). An erythrocyte differentiation signature was more abundant in younger children, indicating the presence of nucleated red blood cells or reticulocytes that are common in infants and are retained during density centrifugation of whole blood. In older children (B32), signatures of dendritic cell, natural killer (NK) cell, and CD4+ T cell activation were enriched, along with hallmarks of nuclear factor κB and activator protein (AP-1) activation and cytokine signaling, suggesting a higher state of immune activation (Fig. 4D). These data demonstrate that the steady-state blood transcriptome in children is strongly influenced by age, indicative of the profound changes to the immune system that occur during this early period in life.

Fig. 4 Blood gene expression signatures reflect innate and adaptive immune cell changes during early childhood.

(A) The age (in months) of the Tanzanian children at blood samples B0 and B32 for which RNA-seq was performed. Dots represent individual samples, B0 (n = 21) and B32 (n = 9). (B) Hierarchical clustering of 2146 differentially expressed (DE) genes between B0 and B32 (1148 up-regulated at B0, 1008 up-regulated at B32). Significantly different genes had an FDR-adjusted P value of <0.01 from DESeq2 and an FDR-adjusted P value of <0.05 from the intensity-difference test and were clustered using Euclidean distance. Each column represents an individual sample, and genes are shown as rows. Enriched blood transcriptional modules (BMTs) at (C) B0 and (D) B32, represented as colored circles with annotations as text. Colors and connecting lines show gene sets with related genes, and summary are terms shown in text boxes. BTMs enriched from within all expressed genes (17,607) with an enrichment score greater than 1.5 or less and −0.5 and an FDR-adjusted P value of <0.01 (Kolmorogov-Smirnov test) are shown. (E) A stylized representation of the multi-omics factor analysis (MOFA) modeling, which was built using 5000 most variable genes from RNA-seq and 61 immune cell types measured by flow cytometry for 27 samples (19 B0, 8 B32) and reduces these 5061 parameters to latent factors (LFs). (F) The percentage of total variance (R2) in the MOFA model explained by flow cytometry or RNA-seq datasets. (G) Scatter plot showing the distribution of samples by the first two LFs determined by MOFA modeling. Each sample shown is as a circle (B0, white; B32, black). The percentage of total variance and the contribution of flow and RNA to each factor are indicated in the axes labels. (H) The correlation between the age in weeks for each sample and the sum of values for LF1 and LF2 (factor sum; B0, white; B32, black). Linear regression fit shown by the red line, and R2 and P value from Pearson correlation are shown. The proportional loading for the 12 cell types with the strongest contribution to (I) LF1 or (K) LF2 are shown. The P values (−log10 FDR-adjusted) for BTMs that were significantly enriched (P adjusted < 0.05) among genes identified by (J) LF1 and (L) LF2 and corresponding enrichment scores. (I to L) Color and symbol representing the direction of loading or enrichment in factor space (blue, negative; red, positive).

In addition to age, genetics and other extrinsic factors can contribute interindividual variation in the immune landscape (9). Because many of the age-related changes to immune cell types and gene expression identified in this cohort were highly heterogeneous between individuals, we sought to quantify the contribution of age to immune variation during this dynamic time frame in young African children. For this, we applied multi-omics factor analysis (MOFA) to our flow cytometry and RNA-seq data (Fig. 4E), which is an unsupervised method that can deconvolve the sources of heterogeneity in diverse data types (17, 18). The MOFA model attributed the variance in the whole dataset evenly between the RNA-seq and flow cytometry assays (Fig. 4F). Age clearly separated by the first two MOFA factors, which combined, explained 46.2% of the total variance in the dataset, and the first two factors combined strongly correlated the age of each sample (Fig. 4, G and H). MOFA showed that a decrease in naïve B cells and an increase in activated dendritic cells, effector memory CD4+ T cells, and memory B cells in flow cytometry and transcriptomic data best explained the age-associated immune variance (Fig. 4, I to L). This analysis shows that age has a major influence on the immune landscape in the first years of life, with T and B lymphocytes and dendritic cells showing the most dynamic changes.

Comparison of Tanzanian and Dutch children reveals that age and location associate with differences in the immune landscape

To investigate whether these changes over time are characteristic of human immune ontogeny broadly or whether there is an immune signature that is specific to childhood in Tanzania, we reanalyzed published data from a longitudinal immunophenotyping study of Dutch children that were followed up from birth to 6 years of age [Generation R cohort (5)]. Children from the same age range in both studies (20 to 125 weeks of age) and the 19 cell subsets that were measured with the identical marker combinations in both cohorts were included in the analysis (Fig. 5A). LMER modeling of immune cell frequencies against age and sex revealed that some immune cell types were similar between cohorts, but some subsets were influenced by the child’s country of origin (Fig. 5, B and C): Vδ2+ cells and CD8+ T effector memory cells showed the same pattern in both cohorts, but the accumulation of memory cells as a proportion of CD4+ and CD8+ T cells differed between Dutch and Tanzanian children. In particular, the frequencies of CD4+ T effector memory cells in Dutch children under 1 year of age were lower but had converged with Tanzanian children by 2 years of age (Fig. 5, B and C). A higher frequency of total memory B cell subsets was also observed in Tanzanian children; however, the abundance of Ig isotypes was variable between the CD27 and CD27+ memory B cell subsets (Fig. 5, B and C). Tanzanian children showed a shift toward more IgM- and fewer IgG- and IgA-expressing cells in the CD27 compartment. In contrast, frequencies of CD27+IgG+ and CD27+IgM+ cells were higher in Tanzanian children (Fig. 5, B and C). Together, these data suggest that some immune cells mature as part of a common childhood immune maturation program, but others may be distinct in Tanzanian and Dutch children.

Fig. 5 Comparison of Tanzanian and Dutch children reveals that age and location are linked with changes in the immune landscape.

(A) The age in months for Tanzanian samples relative to Dutch cohort. The green box indicates the 20- to 125-week age range used for age-matched LMER modeling, with 176 total samples from 102 Tanzanian children and 748 total samples from 504 Dutch children. The purple box represents the birth to 414-week age range used for immune trajectory building. (B) Cell subsets that were measured in both cohorts and were significantly altered with age in either cohort are shown as the frequency predicted from LMER models, normalized to the Tanzanian cohort week 20 levels. P values determined by ANOVA for the effect of sex and location (Tanzanian or Dutch cohorts) and for the interaction between age and location are shown in gray boxes, with FDR adjustment. (C) The frequency (percentage of B cells, CD4, or CD8 T cells) of representative cell subsets are shown from 20 to 125 weeks of age for the Tanzanian (gray) and Dutch samples (blue). Solid lines represent respective LMER models, with P values from ANOVA for location, age, and sex effects shown in boxes below (FDR adjusted for the 19 of cell subsets). (D) Diffusion map dimensionality reduction of Dutch and Tanzanian samples using scaled cellular frequencies and the diffusion-pseudotime algorithm. Each dot represents a sample, color represents the pseudotime output values, and the red arrow indicates the direction of the trajectory that starts with the Dutch newborn samples. First panel shows all samples used for building the trajectory (421 total samples from 157 Tanzania children, 1801 total samples from 1119 Dutch children), and second and third panels show Dutch and Tanzanian samples in the 20- to 125-week age range, respectively, used in LMER models in (B) and (C). The correlation coefficients (R2) for the 18 cell types that significantly correlated with the pseudotime are shown as colored boxes. (E) The distribution of naïve and memory B cell frequencies along the diffusion map trajectory are shown for 2222 samples, with color representing the scaled cell type frequency. (F) Age (in weeks) and (G) corresponding pseudotime age for samples in the 20- to 125-week age range shown in (D), with P values determined by Mann-Whitney test.

A common trajectory of immune development in neonates has been proposed from studies of newborns in Europe, West Africa, and Australasia (6, 19). To determine whether Tanzanian and Dutch children follow a similar path of immune development, we applied a diffusion-pseudotime algorithm (20) to the immunophenotyping data that spanned birth to 8 years of age. Children’s immune profiles were distributed along a continuous spectrum that originated with samples from newborn Dutch children and progressed with age (Fig. 5D and fig. S11). The two cohorts clustered together and followed the same pseudotime path (Fig. 5D), which has been reported to indicate a shared trajectory that is not influenced by technical differences between datasets (21). A pseudotime measure of immunological age was determined for each sample relative to their position along the trajectory, and this position was strongly correlated with a transition from naïve to memory lymphocytes (Fig. 5, D and E). Twenty- to 125-week-old children from Tanzania had higher pseudotime values than Dutch counterparts of the same age range (Fig. 5, F and G). These data suggest that immune cell types develop along a common pathway throughout childhood, but the rate at which these changes occur can vary between countries.

Immune development during childhood proceeds differently in Tanzania and Mozambique

The observation that the immune landscape in children is influenced by location at the continental level prompts the hypothesis that there may be differences in immune development between African countries. Children from Mozambique (n = 55) clustered distinctly from children of the same age from Tanzania (n = 116) by both MDS dimensions of tSNE clusters (B0 samples; Fig. 6A). We identified 31 immune subsets that varied between the sites, and, apart from total and naïve B cells, the altered cell types showed lower frequencies in children from Tanzania. These cells came from diverse immune lineages, including NK cells, monocytes, and memory B and T cells (Fig. 6B). The influence of the different African sites on immune composition was present at baseline (B0) and continued throughout the 32-month study period (Fig. 6C). At baseline, children from Mozambique had increased frequencies of activated monocytes, activated CD4+ and CD8+ T cells (HLADR+CD38+), and T helper cell subsets compared to children from Tanzania (Fig. 6D). This was accompanied by elevated cTFH cell and plasmablast frequencies at baseline (Fig. 6D). The increase in cTFH cells and plasmablasts in children from Mozambique was linked to higher IgG titers to both components of the RTS,S vaccine (Tanzania, n = 123; Mozambique, n = 137; Fig. 5E and fig. S12, A and B). This decrease in antibody responses to the RTS,S vaccine in Tanzanians compared to Mozambicans was validated in an independent subcohort of infants from the phase 3 trial for which serum antibody titers were measured (fig. S12, C and D), indicating that country-specific differences are present from early in life. Despite the close proximity of the two countries within East Africa, our results demonstrate that country of origin alters immune development in children.

Fig. 6 Childhood immune development is influenced by location within Africa.

(A) NMDS for children at B0 from Tanzania (n = 78) and Mozambique (n = 30) based on the 336 tSNE clusters, with the coordinates for each dimension compared by Mann-Whitney test. (B) Heatmap showing manually gated cell subsets (as a percentage of all leukocytes) with an FDR-adjusted P value of less than 0.01 between each site at B0 by Mann-Whitney test. Cell types are shown as rows, and each column represents an individual sample. Rows and columns were clustered using Euclidean distance, and the cell subset frequency was converted to z-scores (Mozambique, n = 19; Tanzania, n = 86). (C) The 336 tSNE clusters or 70 manually gated cell subsets were compared between samples from Tanzania and Mozambique at B0, B3, B21, and B32 by Mann-Whitney test. The −log10 FDR-adjusted P value is shown, with the dashed line indicating Padj = 0.01. (D) Representative cell type frequencies for each site (as a percentage of parent population), with a P value from Mann-Whitney test (FDR adjusted for 70 subsets). (E) The increase in anti-HBV.S or anti-CSP IgG (B3 − B0 AU) at B3 (B3 − B0) from a larger cohort of RTS,S-vaccinated participants from each site (Mozambique, n = 137; Tanzania, n = 123), with a P value from Mann-Whitney test. (F) The age of children for immunophenotyping samples from each site (Tanzania, n = 33; Mozambique, n = 94). (G) The contribution of sample age, sex, and country (Tanzania or Mozambique) to the cell type variance as determined from linear regression models, shown as a boxplot summary of the 31 cell types shown in (B). (H) Weight and (I) height (z-scores, age-adjusted according to global reference values) for each site (Tanzania, n = 359; Mozambique, n = 974). (J) Blood hemoglobin concentration (g/dl) for each site, with dashed lines representing the various disease categories (Tanzania, n = 359; Mozambique, n = 974).

Within this study, age was not different between the two sites (Fig. 6F), and despite a trend toward Mozambican children being older, age and sex made a minimal contribution to the country-specific variance observed (Fig. 6G). Weight did not differ between these two cohorts (Fig. 6H), and 55% of participants were male at both sites, although Tanzanian children were slightly taller for their age (Fig. 6I). In our transcriptomic dataset from Tanzanian children at B0, we observed enrichment for gene signatures of erythrocyte maturation and oxygen transport, which suggests that anemia may affect the blood transcriptome. This led us to speculate that anemia may influence the geographic differences observed in our study, and we observed lower hemoglobin concentration in children from Tanzania compared to Mozambique (Fig. 6J). This was also reflected in the higher frequency of children characterized as having moderate or severe anemia in Tanzania (60%) compared to Mozambique (44%), based on World Health Organization definitions (22). This prompts the hypothesis that anemia may be one of the factors influencing immune composition in children.

Anemia is linked with an altered composition of the immune system in children, and reduced iron bioavailability impairs plasmablast generation

To investigate whether anemia influences the immune landscape, we compared gene expression and immune profiles in blood samples between age-matched anemic [hemoglobin (<8.5 g/dl), n = 6] and nonanemic [hemoglobin (>10.5 g/dl), n = 9) children from Tanzania and Mozambique subsampled from within our dataset based on hemoglobin concentration (Fig. 7A). Anemia status influenced the blood transcriptome (Fig. 7B and data file S2), with a strong enrichment of an erythrocyte differentiation and heme biosynthesis gene signature in anemic children (Fig. 7C), indicative of an increase in reticulocytes as a compensatory mechanism during anemia. In nonanemic children, blood transcription modules of mitosis, CD4+ T cell differentiation, B cells, and plasma cells were enriched (Fig. 7D), suggesting that anemia is associated with changes to the cellular composition of the immune system. Anemia contributed substantially to the immune variation (Fig. 7E), and the immune cell types altered in anemic children mirrored the transcriptomic analysis; anemic children had fewer recent thymic emigrant CD4+ T cells, IgG memory B cells, and plasmablasts than nonanemic children at B0 (Fig. 7F). In this cohort, the anemic status of children changed over time, enabling us to evaluate immune phenotypes in older children who developed anemia later in life. The immune cell types affected by anemia at B0 were also reduced in anemic children at B21 (Fig. 7G), none of which were anemic at B0.

Fig. 7 Anemia is linked to changes in immunophenotype and iron availability directly impacts B cell biology.

(A) The age of anemic [hemoglobin (<8.5 g/dl), n = 43] and nonanemic [hemoglobin (>10.5 g/dl), n = 52] from Tanzania and Mozambique at B0. Black squares indicate samples where RNA-seq was performed on whole PBMCs in addition to immunophenotyping. (B) Hierarchical clustering of 376 DE genes between anemic (n = 6) and nonanemic (n = 8) Tanzanian children. Significantly different genes had an FDR-adjusted P value of <0.01 from DESeq2 and an FDR-adjusted P value of <0.05 from the intensity-difference test and were clustered using Euclidean distance. Each column represents an individual sample, and genes are shown as rows. Enriched BMTs in (C) anemic and (D) nonanemic children, represented as colored circles with annotations as text. Colors and connecting lines show gene sets with related genes, and summary terms are shown in text boxes. BTMs enriched from within all expressed genes (17,607) with an enrichment score greater than 1.5 or less and −0.5 and an FDR-adjusted P value of <0.01 (Kolmorogov-Smirnov test) are shown. (E) The contribution of sample age, anemia status, and sex to the cell type variance as determined from linear regression models. Representative cell subset frequencies that were significantly different between anemic and nonanemic children at (F) B0 (nonanemic, n = 52; anemic, n = 43) and (G) B21 (nonanemic, n = 99; anemic, n = 20), with a P value from Mann-Whitney test. (H to M) Purified B cells from U.K. adults were stained with CellTraceViolet and incubated in culture media with interleukin-21 and CD40L (CM) for 5 days. Some cultures were supplemented with either apo-transferrin (APO) throughout or with ciclopirox olamine (CPX) after 24 hours of culture. (H) The percentage and number of live B cells at days 0 and 5 in CM, APO, and CPX culture conditions. (I) Representative histogram of cell trace violet on day 5 (black, CM; red, APO; blue, CPX) and (J) the percentage of cells that have undergone one or more divisions on day 5. (K) Plasmablast gating strategy (live IgDCD27+CD38+IRF4+ B cells). (L) The percentage of plasmablasts and fold change (FC) relative to CM alone. (M) The concentration of IgG in culture supernatants as measured by ELISA and fold change relative to CM culture conditions. (H to M) P values were determined using Holm-Sidak’s multiple comparison testing, except for fold change analyses for which a one-sample t test was used. Representative of three independent experiments with six to eight U.K. adult blood samples.

The immunophenotyping data suggested that induction of plasmablasts and memory B cells, important cell types for the protective immunity afforded by vaccination, are negatively affected in children with anemia. We sought to determine whether the availability of iron could explain the B cell defects observed in anemic children by modulating iron bioavailability during plasmablast differentiation in vitro. Cellular iron was altered by supplementing media with apo-transferrin (APO), which facilitates iron uptake by lymphocytes, and by limiting intracellular iron using a cytostatic dose of the iron chelator ciclopirox olamine (CPX; Fig. 7H and fig. S13). Increased cellular iron enhanced B cell proliferation, and conversely, proliferation was impaired when iron availability was reduced (Fig. 7, H to J). Likewise, increased bioavailability of iron enhanced plasmablast differentiation (Fig. 7, K to L), which was mirrored by an increase of class-switched antibody produced in these cultures (Fig. 7M). These data show a direct effect of bioavailable iron on B cell biology in vitro and suggest that dietary iron deficiency or infection may impair development of adaptive immunity in anemic children.


In this study, we show that the composition of the immune system in children from sub-Saharan Africa undergoes dynamic changes in the first years of life. This maturation profile differs from that of Dutch children sampled over the same period of early life, suggesting that the rate of immune maturation is influenced by country-specific factors. Consistent with this finding, location within Africa also had a large effect on the composition of the immune system and was linked to differential responses to the RTS,S/AS01E vaccine. Groups of children who responded with higher antibody titers to the RTS,S vaccine had circulating cells indicative of a more receptive immune system at the time of first vaccination, including increased baseline frequencies of antibody-secreting cells and cTFH cells. To determine what underpins this productive immune profile, we examined several factors and found that anemia was linked to poor baseline frequencies of several immune cell types, including memory B cells and plasmablasts. We were able to show a direct link between plasmablast formation and antibody production driven by the bioavailability of iron in vitro, demonstrating a direct effect of iron on B cell responsiveness. This indicates that anemia, which is common in children from LMICs, is associated with an altered composition of the immune system in the first years of life. Together, this study shows that the development and function of the immune system in children from sub-Saharan Africa vary according to their age, geographic location, and by comorbidities and nutritional status that cause anemia.

The factors that underpin the difference in immune profiles from children living in Tanzania and Mozambique are likely to be multifactorial; with contributions from genetics, climate, diet, sanitation, maternal health, and the differential prevalence of infectious diseases. We observed that children from Mozambique had a more “activated” immune profile than children from Tanzania at baseline. This corresponded to increased antibody production after RTS,S vaccination in children from Mozambique. These findings are consistent with the protective efficacy of Rotavirus vaccination being highly variable between different African sites (23). Disentangling the contributions of environment or genetics at a population level is complex, although twin studies in Gambian infants suggest that environmental factors predominantly influence antibody persistence after vaccination (24). Our findings suggest that an individual’s immunological profile at time of first vaccination may affect the ensuing antibody response and are not consistent with suggestions that a highly activated immune system in individuals from LMICs prohibits good responses to vaccination (2527). Children from sub-Saharan Africa showed altered immune development with age compared to Dutch children and had accumulated memory/effector lymphocytes at earlier years in life, indicative of advanced immunological age relative to biological age. Viral pathogens may underpin the immune development differences between Tanzanian, Mozambican, and Dutch children, because infection with Epstein-Barr virus and cytomegalovirus and HIV exposure have been linked to changes to several of the T and B cell memory populations observed to change in this study (9, 2830). Differences in the childhood vaccination programs could contribute to country-specific immune landscapes, as the Bacille Calmette-Guérin vaccination, given to African but not Dutch newborns, is known to have nonspecific effects on innate immunity and lymphocytes (31). It is remains unclear how these and other factors contribute to the reduced responses to rotavirus, cholera, and polio vaccines in infants from LMICs compared to Europe and North America (23, 3238); however, our study indicates that reduced vaccine responses in LMIC children do not result from a delayed or impaired immune development relative to high-income countries.

The prevalence of anemia in preschool-aged children in sub-Saharan Africa is over 65% and is a major public health problem due to its association with increased risk of death and impaired cognitive development (22). In this study, children from Tanzania were more likely to have moderate to severe anemia than children from Mozambique, and anemia was associated with changes to the PBMC transcriptome and reduced frequencies of CD4+ T cells, IgG+ memory B cells, and plasmablasts. The major drivers of childhood anemia are iron and folate deficiency, infectious disease (e.g., malaria, HIV, bacteremia, and schistosomiasis), and inherited hemoglobinopathies (22). It is possible that infectious diseases are responsible for shaping the immune system in anemic children, but it is also possible that anemia directly affects the composition and responsiveness of the immune system. After infection or immunization, B and T cells rapidly proliferate, and although it is known that iron plays a critical role in both cellular metabolism and DNA synthesis (39, 40), the precise role that iron plays in adaptive immune responses remains unclear (41). We show that sufficient bioavailable iron is needed for optimal production of plasmablasts and IgG responses by human B cells in vitro, which is consistent with in vivo work in mice (4244). In humans, homozygous mutation of transferrin receptor 1 causes a primary immunodeficiency, characterized by hypoglobulinemia and defects in isotype-switched memory B cells (45). Together with the data presented here, this supports the notion that the low-iron state of anemia may affect plasmablast differentiation directly. This impairment in B and T cell responses could explain the poorer vaccine outcomes that have been reported in anemic children and iron-deficient adults (44, 46), although further studies are needed to dissect the contributions of iron deficiency, genetic polymorphisms, and co-infections to the reduced plasmablast differentiation that we observed in this study.

Childhood vaccination programs, such as the Expanded Program on Immunization, save millions of lives worldwide by preventing disease caused by pertussis, measles, polio, diphtheria, tuberculosis, and tetanus (47). For many of these early vaccination schedules, several vaccine doses are administered within the first 12 weeks of life. In this study, IgG responses to the RTS,S vaccine were much lower in 6- to 12-week-old infants than children over 5 months of age, consistent with responses to other vaccines given in early life (1, 48). Infants 3 months into this trial had an “inactive” immune phenotype, consisting of low frequencies of memory B cells, memory CD4+ T cell subsets, and an elevated Treg cell frequency. It has been reported that neonatal CD4+ T cells tend to differentiate into Treg cells (49, 50), which could explain the higher Treg frequency and could promote tolerogenicity of antigen-presenting cells and thus limit T cell responses in infants. Tfh cell generation, germinal center formation, and plasma cell survival are impaired in neonatal mice (48, 5155), and here, we show that human infants show reduced cTFH and plasmablast differentiation compared to older children. These immune composition differences likely underpin the reduced responses in infants to the RTS,S vaccine, which, based on the phase 3 trial results, has been licensed only for use in children at 5 to 17 months of age (56). Our findings may also be applicable to other vaccines and suggest that a “one size fits all” approach in vaccinology is suboptimal. It is conceivable that the optimal age range to vaccinate infants and children would need to be determined for each vaccine and location, because American infants with a more mature adaptive immune system elicit stronger responses to diphtheria-tetanus-pertussis vaccine (57) but weaker responses to measles-mumps-rubella immunization (58).

The data presented here suggest that age, location, and anemia may alter the rate of immune development in children. Like all large-scale human immunology studies, however, there are limitations: Collection and immune phenotyping of the Dutch cohort was independent of the RTS,S vaccine trial; because of this, it was not possible to determine how differences at the continental level relate to immune system function. Within the RTS,S trial, country-specific differences in vaccine-specific antibody responses were observed across the seven African countries (59); however, PBMC samples were only available from two countries for this study. An understanding of what underpins the geographical changes in immune function will require larger-scale cohorts that include multiple countries. Because of the low malaria incidence in Tanzania and Mozambique during the RTS,S vaccine trial, it was not possible to investigate how immune status related to vaccine efficacy and immune function in this cohort. Further, it will be important to understand whether differences in immune profile relate to changes in susceptibility to infectious disease. Future studies will clearly be of value to elucidate the influence of genetics, the microbiome, pathogens, and nutrition on immune development, anemia, and vaccine efficacy in different populations around the globe.

Despite the success of childhood immunizations to date, there are still numerous infectious diseases, such as malaria, that require a vaccination solution. It is likely that such solutions will come from a combination of innovative vaccine design and an understanding of the host response to vaccination. RTS,S is the most clinically advanced malaria vaccine and has now been rolled out in pilot implementation programs in areas of Ghana, Kenya, and Malawi (60), but we need to be better informed about what underpins this vaccine’s efficacy at the individual and population levels to use it effectively. The data reported here demonstrate that multiple factors influence the immune system of children in sub-Saharan Africa, with age and location having the strongest influence on immune composition, and were linked with differential responses to the RTS,S vaccine. Therefore, a better understanding of what underpins productive immune responses at early age has the potential to improve the efficacy of many childhood vaccine approaches in different populations globally.


Study design

The overall objective of this study was to investigate how the composition of the immune system changes throughout childhood in African and Dutch children by using flow cytometry and transcriptomics. This study was performed using cryopreserved PBMCs from a subset of participants the RTS,S/AS01E phase 3 trial ( NCT00866619), described elsewhere (12). Participants received three doses of RTS,S/AS01E at months 0, 1, and 2 and a booster dose at month 20 (R3R group); three doses of RTS,S/AS01E and a comparator vaccine at month 20 (R3C group); or a comparator vaccine at months 0, 1, 2, and 20 (C3C group). The trial consisted of two age categories based on participant age at enrolment that were used in this study for consistency; “infants” were 6 to 12 weeks old and “children” were between 5 and 17 months of age. The comparator vaccine for infants and children was meningococcal serogroup C conjugate vaccine (Menjugate, Novartis) and rabies (Verorab, Sanofi Pasteur), respectively. Samples analyzed here were collected in two different centers: Manhiça Health Research Center, Fundação Manhiça (FM-CISM, Mozambique) and Ifakara Health Institute and Bagamoyo Research and Training Center (IHI-BRTC, Tanzania). These areas have been described in detail elsewhere (61). Both sites had low-medium malaria transmission intensity at the time of the study (12, 62), and incidence of clinical malaria was low across all vaccine groups. Samples for this study were randomly selected from among participants for which samples were available from all time points and had no reported malaria episodes throughout the follow-up period (table S1) to exclude any influence of malaria infection on immune parameters. PBMCs from study months 0, 3, 21, and 32 were available for children (Tanzania and Mozambique) and months 3, 21, and 32 for infants (Mozambique only). A description of the number of participants and samples, sample randomization, excluded data, vaccine groups, and nested anemia subgroups are outlined in a consort table (table S2). The analysis of Dutch children involved reanalyzing immunophenotyping data from the Generation R Study, a prospective population-based cohort study from birth until young adulthood of children born between 2003 and 2006 (63, 64). For age-matched comparisons to Tanzanian children, a subset of samples from Generation R children between 20 and 125 weeks of age were analyzed (504 of 1182 children, 748 samples in total). To investigate whether iron influences B cell biology, iron bioavailability was modulated during plasmablast differentiation assays in vitro using healthy U.K. adult PBMC samples. Investigators conducted all assays, data collection, and processing blinded to vaccination group, age group, study time point, and study site.

Ethical approval

Approval for the RTS,S immunological study was obtained from the Ethical Committee of the Hospital Clínic in Barcelona (Spain), the National Health and Bioethics Committee (Mozambique), the Ethikkommission Beider Basel (Switzerland), the National Institutional Review Board (Tanzania), the Ifakara Health Institute IRB (Tanzania), and the PATH’s Research Ethics Committee (USA). Use of RTS,S samples in the United Kingdom and healthy U.K. adult PBMCs were approved by the U.K. Health Research Authority (17/EE/0063 and 15/EE/0071, respectively) and Babraham Institute Human Ethics Committee. The Generation R Study received approval from the Medical Ethical Committee of the Erasmus MC, University Medical Center (Rotterdam, The Netherlands). Written informed consent was obtained from parents or guardians of participating children in accordance with the Declaration of Helsinki.

African blood sample preparation and antibody staining

PBMCs were isolated at the African institution laboratories by density gradient centrifugation (65), cryopreserved in liquid nitrogen, and shipped to the Babraham Institute where the PBMCs were thawed and rested for 2 hours at 37°C. Cell number and viability were measured using a Nucleocounter (NC-3000, Chemometec). Samples with low viability (<50% live cells) were discarded. Two flow cytometry panels (panels 1 and 2, table S3) were used to stain between 500,000 and 2 million cells per panel. Briefly, cells were incubated with Fc receptor–blocking antibody for 30 min at 4°C, followed by staining with antibody master mixes for 1 hour at 4°C, followed by fixation with 4% paraformaldehyde (Cytofix, BD Biosciences). Fluorescence minus one controls were performed for each antibody, in each experiment. Up to 96 PBMC samples were stained per experiment (total of 718 samples over nine experiments). Participants’ longitudinal samples were tested in the same experiment, and each experiment contained a randomized assortment of samples from each study site, vaccine arm, and age group to control for batch effects.

African sample flow cytometry and data processing

Samples were acquired on a BD LSR Fortessa five-laser cytometer, with identical settings for each experiment. Eight-peak Rainbow calibration particles (Spherotech) were used to confirm consistent laser power and detector sensitivity in each experiment. Raw .fcs files were preprocessed for quality control to remove events outside the normal range for flow rate, dynamic range, and signal acquisition [Flow AI (66)]. Data files were removed of duplicates and dead cells before all analyses, followed by downstream gating for tSNE and manual gating (fig. S2 to S5). High interassay concordance of control U.K. blood samples included in each experiment showed limited batch effects in flow cytometry staining (fig. S14). For tSNE analysis, gated subpopulation data (CD3+ T cells, CD19+ B cells, CD4+ T cells, CD8+ T cells, and CD3CD19 leukocytes) were exported as .csv file, and fluorescence values were arc sin–transformed using manually defined fluorescence cutoffs and a cofactor value of 50 (fig. S1). For each subpopulation, 1000 cells from each sample were randomly subsampled to a maximum of 723,000 cells. Samples with less than 3000 events for subpopulations after FlowAI QC filters were excluded from the analysis (66). The tSNE algorithm was applied to each subpopulation independently (2500 iterations; perplexity, 100) using the RtSNE package (67), followed by clustering using the dbscan package. A total of 336 distinct cell clusters characterized by distinct combinations of surface marker expression were identified, and cluster frequencies were calculated for each sample as a proportion of the 1000 cells analyzed for each subpopulation. Manual gating of 70 well-characterized cell subsets was performed using FlowJo software (tables S4 and S5 and figs. S2 to S5). Manually gated data were imported into R using flowWorkspace package (68), and cell subset frequencies were extracted as a percentage of total leukocytes and as a percentage of CD3+, CD19+, CD4+, CD8+, or forward and side-scatter intermediate cells. For the Generation R cohort of Dutch children, existing immunophenotyping data were analyzed for the 19 cell subsets in which the antibodies and gating strategies were common for both RTS,S and GenR cohorts (5). Cell subset frequencies as a percentage of CD3+, CD19+, CD4+, or CD8+ cells were used from participants either from birth to 414 weeks of age (n = 2010 samples) or between 25 and 125 weeks of age (n = 748 samples). For all cohorts, relative frequencies (as a percentage of total lymphocytes or as a percentage of “parent” population) were used because the absolute counts for each cell type were not available.

B cell in vitro cultures and plasmablast differentiation

PBMCs were isolated from healthy U.K. adult apheresis leukocyte cones by density gradient centrifugation and cryopreserved in liquid nitrogen. PBMCs were thawed and rested for 2 hours at 37°C and stained with CellTrace Violet cell proliferation dye (Thermo Fisher Scientific), and then, B cells were enriched using a MagniSort human B cell enrichment kit (Thermo Fisher Scientific). Purified B cells were cultured for 5 days to differentiate plasmablasts. Briefly, 100,000 B cells were cultured per well in RPMI 1640 culture media supplemented with 10% heat-inactivated fetal calf serum, 10 mM Hepes (pH 7.4), 1× nonessential amino acids, 1 mM sodium pyruvate, 2 mM l-glutamine, penicillin (100 U/ml), streptomycin (100 U/ml) (all from Thermo Fisher Scientific), human interleukin-21 (50 ng/ml; PeproTech), and recombinant human CD40L (200 ng/ml) cross-linked with anti-hemagglutinin peptide antibody (50 ng/ml; both R& D systems). Iron bioavailability was modulated by the addition of apo-transferrin for 5 days (40 μg/ml; Thermo Fisher Scientific), or 500 nM ciclopirox olamine was added on day 1 of culture at a concentration that did not affect B cell viability in vitro (fig. S13). After 5 days of culture at 37°C, culture supernatants were collected, and secreted IgG was measured by enzyme-linked immunosorbent assay (ELISA) using a rat anti-human IgG Fc capture antibody (M1310G05, BioLegend), a mouse anti-human IgG-biotin detection antibody (HP6017, BioLegend), streptavidin–horseradish peroxidase (GE Healthcare), and trimethylboron substrate set (BioLegend). IgG concentration was determined by using a standard curve of human IgG (Sigma). Proliferation and plasmablast differentiation were measured by flow cytometry by staining with panel 3 (table S3), with cell number determined using counting beads (123count eBeads, Thermo Fisher Scientific). Primary data are included in data file S3.

Sequencing and bioinformatics

PBMCs from Tanzanian children at B0 and B32 were stored in RNAlater after thaw. RNA from samples with high viability (>90%) and sufficient cell number (>3 × 106 cells) were isolated using an RNeasy Plus Mini kit (Qiagen). Libraries were generated from samples with high RNA quality (RNA Integrity Number > 7) using TruSeq Stranded mRNA Library prep (Illumina) by Eurofins (n = 30). Libraries were sequenced to an average depth of 24 million 1 × 50–base pair single-end reads per sample (range, 17 to 34 million) on a HiSeq 2500 (Illumina, v4 chemistry). Raw data were aligned using GRCh38 assembly, reads within coding sequences were quantitated, and differential expressed (DE) genes were determined using the DESeq2 package embedded within the Seqmonk software ( Significantly differentially expressed genes had a false discovery rate FDR–adjusted P value of <0.01 from DESeq2 and an FDR-adjusted P value of <0.05 from the intensity-difference test. Heatmaps of DE genes were generated using Pheatmap package in R (69), with hierarchical clustering by Euclidean distance. BTM analysis used gene sets previously reported (16) and gene set enrichment tool within the Seqmonk software. BTMs were deemed significant if they had an enrichment z-score of less than −0.5 or greater than 1.5 and an adjusted P value of <0.05. Enriched BMTs were graphed using the giraph software integrated within Seqmonk. Microarray data from Kazmin et al. (13) (GSE89292) were analyzed using the GEOquery (70) and limma (71) packages in R for the 14 RTS, S-vaccinated north American adults for which there were data available for days 0, 1, 6, 28, 56, 57, 62, and 77. Days 1, 6, and 28 were compared relative to day 0. Days 57, 62, and 77 were compared relative to day 56. Probe sets were deemed significantly differentially expressed if they had a twofold change in expression and an FDR-adjusted P value of less than 0.01.

Multi-omics factor analysis

MOFA is a factor analysis model that enables integration of diverse datasets in a completely unsupervised fashion and was implemented in R with Python dependencies as previously described (17). Flow cytometry data were preprocessed as follows: Cell subset frequencies were converted to z-scores by subtracting the mean and dividing by the SD determined using full African dataset (n = 718 samples) for the 30 samples that also had transcriptomic data. RNA-seq reads were counted with Rsubread and vsn normalized using DESeq2 package. Genes were ranked according to variance across all 30 samples, and the 5000 most variable genes were used because this produced the most robust MOFA model. Samples with missing data for either flow subsets or gene expression were removed from the analysis resulting in 27 samples. Flow and RNA-seq data matrices were used to the train model using default settings (number of factors, 25; drop factor threshold, 2% of variance), which was then run 10 times independently, and the best trained model was selected using the EMBO output value. All models converged on seven latent factors, and factor loadings for cell types and genes were determined using MOFA’s TopWeights function, with BTMs analyzed using the gene set enrichment algorithm implemented in the MOFA package (P < 0.05), with gene sets as previously published (16).

Serological data and covariates

Serological data were available for a subset of the total participants from each trial site of the phase 3 trial. Anti-CSP and anti-HBV.S antibodies were measured by standardized ELISAs and antigens in a single laboratory (72). Vaccine response was determined by subtracting baseline antibody titer (in arbitrary units) from B3. Additional variables such as participant’s height, weight, and hemoglobin concentration were collected as previously described (12) and available for a subset of participants from each trial site.

Immunological trajectory and pseudotime analysis

The immune trajectory was assembled as previously described (21). The frequencies for the 19 cell subsets were first normalized for abundance; the mean and SD of frequency values for the 10th to 90th percentiles of each cell type for the entire Dutch cohort (1801 samples). These values were used to normalize cellular frequencies for both cohorts (frequency-mean10–90/SD10–90). With both cohorts in a combined dataset, principal components analysis was performed. Cell types for which the absolute correlation to PC1 was greater than 0.5 were included (n = 18 cell types). The diffusion maps algorithm was then applied to the scaled frequencies using the destiny R package (20, 73), and the resulting diffusion pseudotime values were scaled to a range of 0 to 1.

Statistical analysis with categorical variables

All statistical tests for cell type frequencies assumed nonparametric data; two-group comparisons were made using either two-tailed Mann-Whitney tests or Wilcoxon tests for paired data from the same individual at different time points. FDR (or Benjamini-Hochberg) adjustment was applied to all multiple testing. tSNE-defined data were used to determine changes to the frequency of individual clusters, with between-group comparisons analyzed with the Mann-Whitney test and paired samples from individuals analyzed by Wilcoxon signed-rank test. tSNE-defined cluster frequencies were also used for MDS approaches in which Euclidean distances were calculated from Spearman correlation matrices of all 336 tSNE clusters per sample, and MDS was performed using monoMDS from the vegan package (74). Mann-Whitney tests were used to compare MDS coordinates between groups. Heatmaps of manual gated cell subsets that were altered by age group or site were generated using Pheatmap package (69), with hierarchical clustering by Euclidean distances.

Statistical analysis of cell types using linear modeling

Linear regression was used to determine the contribution of vaccine group (Fig. 1), age group (Fig. 2), country (Fig. 6), or anemia (Fig. 7) as categorical variables to cell type frequencies by fitting the following models for each cell subset “i”; MVaccine <- lm(data[,i] ~ vaccine group + patient ID + sex); MAge_group <- lm(data[,i] ~ age group + sex); MCountry <- lm(data[,i] ~ country + age + sex); MAnemia <- lm(data[,i] ~ anemia + age + sex). The relative importance for each covariate to the cell subset variance was calculated using the relaimpo R package (75). LMER analysis was used to model immune cell subset dynamics over time with the lme4 package using maximum likelihood (76). These models included random intercepts and enabled the incorporation of longitudinal follow-up data from individual children. Frequencies of each immune parameter were modeled using age (in weeks) as a continuous variable and sex and location (Tanzania/The Netherlands) as categorical variables and included participant ID to account for repeated measures over time. In Tanzanian samples aged 20 to 220 weeks (n = 414, Fig. 4E), for each cell subset “i” (n = 20), the effect of age was tested by analysis of variance (ANOVA) for the following models; MNull <- lmer(data[,i] ~ (1| patient ID)); MSex <- lmer(data[,i] ~ sex +(1| patient ID)); MAge <- lmer(data[,i] ~ age + sex +(1| patient ID). ANOVA P values were FDR adjusted, and conditional and marginal R2 for MAge were calculated using the MuMIn package (77). For comparison of Dutch and Tanzanian samples, data were combined into a single dataset. The effect of sex and location (as a fixed effect and as an interaction term with age) was tested for each cell subset “i” (n = 19) using ANOVA of following models; MNull <-lmer(data[,i] ~ age + (1| patient ID)); MSex <-lmer(data[,i] ~ age + sex +(1| patient ID)); MLocation <-lmer(data[,i] ~ age + location +sex +(1| patient ID)); MAge*Location <-lmer(data[,i] ~ age*location +sex +(1| patient ID)). Model intercepts and slopes were used to predict cell subset frequencies for 20 to 220 weeks, and data were normalized to week 20 values and graphed using the Pheatmap package. All statistical analyses were performed with R software version 3.4.1. Normally distributed residuals were confirmed for all linear and mixed-effect models for all cell types.


Fig. S1. Pregating of cell types for tSNE.

Fig. S2. Gating hierarchy panel 1.

Fig. S3. Gating hierarchy panel 2.

Fig. S4. Representative flow plots panel 1.

Fig. S5. Representative flow plots panel 2.

Fig. S6. No vaccine-induced changes are detectable 1 month after vaccination.

Fig. S7. Strong correlations observed within related cell types.

Fig. S8. Cell subset frequency correlations between baseline and B3 time points.

Fig. S9. Cell subset frequency correlations between baseline and B32 time points.

Fig. S10. Anti-CSP and anti-HBV.S IgG titers before and after vaccination in infants and children from Mozambique.

Fig. S11. Distribution of participant age along the diffusion-pseudotime trajectory.

Fig. S12. Anti-CSP and anti-HBV.S IgG titers before and after vaccination in children from Tanzania and Mozambique.

Fig. S13. Dose-dependent toxicity of ciclopirox olamine on B cells in culture.

Fig. S14. Highly reproducible flow cytometry data between experiments.

Table S1. Participant characteristics.

Table S2. Consolidated Standards of Reporting Trials table.

Table S3. Antibody panels.

Table S4. Manually gated cell subset definitions panel 1.

Table S5. Manually gated cell subset definitions panel 2.

Data file S1. Differentially expressed genes between B0 and B32.

Data file S2. Differentially expressed genes between anemic and nonanemic children.

Data file S3. Plasmablast differentiation assay raw data.


Acknowledgments: We wish to thank all the RTS,S phase 3 vaccine trial participants and their guardians, the MAL055 clinical team, and the members of the MAL067 Vaccine Immunology Workgroup. We thank GlaxoSmithKline Biologicals S.A. for support in the conduct of the MAL067 study. We are grateful to the field and laboratory personnel from CISM, ISGlobal, and IHI-BRTC, in particular, L. Puyol, I. Cuamba, N. A. Williams, N. Díez-Padrisa, H. Sanz, and A. Ayestaran for logistics, database, and management support. We thank A. Segonds-Pichon and E. Vigorito for advice on statistical methods. We thank the staff of the Babraham Institute Flow Cytometry Facility for technical assistance. The Generation R Study is conducted by the Erasmus MC, University Medical Center in close collaboration with the Erasmus University Rotterdam and the city of Rotterdam. We gratefully acknowledge the contribution of children and parents. Funding: This study was supported by the Biotechnology and Biological Sciences Research Council through Institute Strategic Programme Grant funding (BBS/E/B/000C0427 and BBS/E/B/000C0428) and Core Capability Grant to the Babraham Institute. D.L.H. is supported by a National Health and Medical Research Council Australia Early-Career Fellowship (APP1139911). Funding for the clinical trial was provided by PATH Malaria Vaccine Initiative (MVI), the National Institute of Allergy and Infectious Diseases (NIAID R01AI095789, HVTN UM1 AI06861, and HIPC P30 AI027757), the Instituto de Salud Carlos III (PS11/00423, PI14/01422, and PI17/02044), and the Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR, 2014SGR991). C. Dobaño was recipient of a Ramon y Cajal Contract from the Ministerio de Economía y Competitividad (RYC-2008-02631). G.M. was recipient of a Sara Borrell – ISCIII fellowship (CD010/00156) and had the support of the Department of Health, Catalan Government (SLT006/17/00109). CISM is supported by the Government of Mozambique and the Spanish Agency for International Development (AECID). ISGlobal is a member of the CERCA Programme, Generalitat de Catalunya. The general design of the Generation R Study is made possible by long-term financial support from the Erasmus MC, University Medical Center, Rotterdam; the Netherlands Organization for Health Research and Development (ZonMw); and the Ministry of Health, Welfare and Sport. Author contributions: Conceptualization: D.L.H., M.A.L., and C. Daubenberger. RTS,S cohort: C. Dobaño, C. Daubenberger, G.M., J.J.C., T.R., M.M., A.N., A.T., and C.J. Generation R cohort: H.A.M. and M.C.v.Z. Methodology: D.L.H. and M.A.L. Investigation: D.L.H., E.J.C., and S.I. Computational and bioinformatic analysis: D.L.H., E.J.C., and T.R. Writing–original draft: D.L.H. and M.A.L. Writing–review and editing: D.L.H., M.A.L., E.J.C., C. Daubenberger, C. Dobaño, G.M., H.A.M., and M.C.v.Z. All authors read and approved the final version of the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data associated with this study are present in the paper or the Supplementary Materials. The RNA-seq data for this study have been deposited in the Gene Expression Omnibus (GEO) database (GSE134985). Flow cytometry data (African and U.K. samples) and all codes are available on GitHub: Flow cytometry data of the Generation R cohort are available upon collaboration request; contact: menno.vanzelm{at} and h.a.moll{at}

Stay Connected to Science Translational Medicine

Navigate This Article