Research ArticleGenomics

Temporal Dynamics of the Human Vaginal Microbiota

See allHide authors and affiliations

Science Translational Medicine  02 May 2012:
Vol. 4, Issue 132, pp. 132ra52
DOI: 10.1126/scitranslmed.3003605


Elucidating the factors that impinge on the stability of bacterial communities in the vagina may help in predicting the risk of diseases that affect women’s health. Here, we describe the temporal dynamics of the composition of vaginal bacterial communities in 32 reproductive-age women over a 16-week period. The analysis revealed the dynamics of five major classes of bacterial communities and showed that some communities change markedly over short time periods, whereas others are relatively stable. Modeling community stability using new quantitative measures indicates that deviation from stability correlates with time in the menstrual cycle, bacterial community composition, and sexual activity. The women studied are healthy; thus, it appears that neither variation in community composition per se nor higher levels of observed diversity (co-dominance) are necessarily indicative of dysbiosis.


Bacterial communities in the human vagina are key components of a multifaceted antimicrobial defense system that operates to protect women against disease (1). The mechanisms by which these bacterial communities effectively exclude nonindigenous organisms in reproductive-age women are not known with certitude, but one appears to be the production of acidic fermentation products (primarily lactic acid) that serve to create an acidic environment that restricts the growth of most pathogens (2). Consistent with this, microbiologists have demonstrated that vaginal bacterial communities of healthy women are often dominated by species of lactic acid–producing bacteria. The bacterial communities that help to provide these ecosystem services are composed of markedly different species, but can be clustered into five main groups that are commonly found in reproductive-age women of different ethnic groups and ages (3, 4). However, little is known about the temporal dynamics of these sorts of communities and whether they differ in terms of resistance and resilience to internally or externally imposed disturbances such as menses, sexual behaviors, lubricants, and hygiene practices. A better understanding of factors that lead to the development and maintenance of specific and stable vaginal bacterial communities is needed so that strategies can be developed to promote and maintain reproductive health.

From ecological theory, it can be anticipated that less stable communities may be more susceptible to invasion by pathogenic organisms (5). In the human vagina, an imbalance in the microbial community (dysbiosis) is thought to elicit the symptoms associated with bacterial vaginosis (6). Bacterial vaginosis is the most frequently cited cause of vaginal discharge and malodor and is the most common vaginal syndrome of reproductive-age women. It results in millions of health care visits annually in the United States alone (7) and is associated with increased risk for acquiring sexually transmitted infections (8) and developing obstetric complications (9). The condition is characterized by the occurrence of an excessive thin white vaginal discharge, a fishy unpleasant vaginal odor, an elevated vaginal pH (>4.5), and the presence of “clue” cells (squamous epithelial cells covered with adherent bacteria). In clinical settings, bacterial vaginosis is diagnosed when three of these four criteria are met (10). However, in research and laboratory settings, bacterial vaginosis is diagnosed by scoring a Gram-stained vaginal smear, often using the criteria defined by Nugent et al. (11). The Nugent score is based on the premise that high numbers of Lactobacillus spp. are indicative of health, and their depletion coupled with increased proportions of morphotypes resembling Gardnerella, Bacteroides spp., and curved Gram-variable rods is indicative of bacterial vaginosis. Although the causes of bacterial vaginosis remain an enigma (12), and there is contentious debate as to how it should be diagnosed, the condition is clearly a risk factor for adverse sequelae that affect women’s health.

Our understanding of vaginal bacterial community composition has broadened as a result of investigators using cultivation-independent methods based on the analysis of 16S ribosomal RNA (rRNA) gene sequences (3, 13, 14). These studies have shown that vaginal bacterial communities vary among women, are species-rich, and include bacteria that had not been detected by traditional culture-based methods. However, to date, most studies of vaginal microbiology have used cross-sectional designs in which samples are obtained from individuals at a single time point, or the interval between sampling spans weeks or months. Although these studies have provided important information on the species composition of vaginal communities, they yield little insight about the normal temporal dynamics of these bacterial communities within individuals and do not allow for the estimation of community stability. As a result, it is not known whether vaginal bacterial communities are relatively stable over time, can exist in alternative equilibrium states, oscillate in accordance with hormonal cycles, or exhibit more complex dynamics, all of which could affect the risk of disease. Moreover, the influences of personal behaviors, hygiene practices, and host characteristics (that is, immune and genetic factors) on community dynamics remain largely unknown.

Here, we describe changes in the identity and abundance of bacteria in the vaginal microbial communities of 32 women by analyzing vaginal samples obtained twice weekly over a 16-week period. The kinds of bacteria present in the samples were identified by classifying 16S rRNA gene sequences in each sample using high-throughput pyrosequencing and by characterizing vaginal community function by determining the metabolites produced throughout the 16-week period.


Characterization of vaginal bacterial communities

To determine the temporal dynamics of vaginal bacterial communities in healthy reproductive-age women, we conducted a longitudinal study in which 32 women (table S1) self-collected mid-vaginal swabs twice weekly for 16 weeks using a validated self-collection protocol (15). The bacterial diversity in the vaginal communities sampled was determined by pyrosequencing variable regions 1 and 2 (V1-V2) of bacterial rRNA genes. A data set of 2,522,080 high-quality classifiable 16S rRNA gene sequences was obtained from 937 samples with an average of 2692 ± 910 (SD) sequences per sample. By adapting methods we have described previously (3) (see Materials and Methods and the Supplementary Materials), the bacterial communities sampled were classified into one of five community state types on the basis of the differences in species composition and their relative abundances (fig. S1 and table S2). Three of these clusters, which were designated community state types I to III, were dominated by Lactobacillus crispatus, Lactobacillus gasseri, or Lactobacillus iners, respectively. Unlike our previous study (3), none of the communities examined here were dominated by Lactobacillus jensenii, probably because of the low prevalence of this community state type and the sampling of too few women. Communities that clustered in community state type IV-A and IV-B were heterogeneous in composition and lacked significant numbers of Lactobacillus sp., but differed in species composition (fig. S1 and table S2). Communities of state type IV-A were generally characterized by modest proportions of either L. crispatus, L. iners, or other Lactobacillus spp., along with low proportions of various species of strictly anaerobic bacteria such as Anaerococcus, Corynebacterium, Finegoldia, or Streptococcus. In contrast, communities of state type IV-B had higher proportions of the genus Atopobium, in addition to Prevotella, Parvimonas, Sneathia, Gardnerella, Mobiluncus, or Peptoniphilus and several other taxa. Some of the taxa in communities of state type IV-B have previously been shown to be associated with bacterial vaginosis (16). Overall, high Nugent scores were most often associated with community state type IV-B, whereas low Nugent scores were associated with community state type IV-A (Fig. 1E and fig. S2).

Fig. 1

Dynamics of vaginal community state types in 32 women over 16 weeks. (A) Heatmap showing the proportions of community state types (I, II, III, IV-A, and IV-B) observed within a woman over time [color key is indicated below (C)] that were used to generate the dendogram that depicts distances between proportions of the five community state types identified. (B) Color bar indicating community classes DA, LC, LG, DB, and LI as defined by clusters of proportions of community state types within a woman over time. (C) Profiles of community state types for 32 women over 16 weeks. Each dot (white or black) represents one sample in the time series. The absence of a dot indicates missing samples. The community state types in the time series for each woman are color-coded according to the schema shown below (C) (see also fig. S1). (D) Box plot of Jensen-Shannon distances between all pairs of community states within each subject. Community deviation from constancy is represented by the Jensen-Shannon index (black bar, see the Supplementary Materials), with higher values reflecting decreased constancy. (E) Box plot of average Nugent scores for each woman over 16 weeks. (D and E) The whiskers represent the lowest and highest datum still within 1.5 interquartile range of the lower and upper quartile. The middle 50% of the data are represented by the height of the box. Subject IDs are indicated on the right.

We further assessed the association of subject ethnicity with vaginal bacterial community composition and compared the results to the findings of our previous study (3). That study showed that black women are more likely to harbor communities of state type IV, whereas white women have a higher prevalence of community state type I. A log-linear model was used to assess similarity between proportions of community state types in black and white women in both studies. We showed that the association between community state types and ethnicity of the cohort of this study was similar to that previously reported (see the Supplementary Materials and fig. S3).

Community classes and community state type transitions

Profiles of community state types were derived from a time series of community samples and clustered into five classes of longitudinal dynamics, which we refer to as community classes (Fig. 1, A and B), designated LC, LG, LI, DA, and DB (see Materials and Methods). The term community class is further defined in the Supplementary Materials. These classes reflect similarities in how community composition changed over time (Fig. 1). Nearly all the temporal profiles were complex and somewhat individualized (Fig. 1C). Not all kinds of transitions between community states were equally likely to occur within a time series, and some were not observed (Table 1 and fig. S4). For example, vaginal bacterial communities of state type IV-B often transitioned to state type III, but only rarely to state type I. Likewise, the results show that vaginal communities of state type I, which are dominated by L. crispatus, most often transition to become community state type III, which are dominated by L. iners, or community state type IV-A. Notably, communities of state type III were two times more likely to switch to community state type IV-B than to community state type IV-A. The communities of state type II are dominated by L. gasseri and rarely underwent transitions to other community state types. Conversely, we observed no transitions from community state type I to community state type II, and there were only two transitions from community state type III to community state type II.

Table 1

Community state type transitions. Total number of transitions and frequencies of transitions from each community state type (see also fig. S4).

View this table:

Index of stability of vaginal bacterial communities

By definition, communities that persist in the same state type over time display a high level of stability, whereas those that often transition to different state types have low levels of stability. We introduce the normalized Jensen-Shannon divergence index, defined as the median of Jensen-Shannon distances between all pairs of community states, which provides a quantitative measure of community stability (see the Supplementary Materials and Fig. 1D). As the index becomes smaller, less change is observed between different community states and the community is more constant over time. The vaginal bacterial communities of subjects 3, 4, 5, and 6 were of class DB and displayed higher levels of overall constancy and persistently high Nugent score even though the subjects did not report any bacterial vaginosis symptoms. This calls into question current widely held views on what is considered to be a “normal” vaginal bacterial community that are founded on the premise that the communities of healthy women must contain high proportions of Lactobacillus species (17). This premise is also undermined by the observation that vaginal bacterial communities of subjects 32, 26, and 25 of class DA and subjects 30, 24, and 31 of class LC had high Jensen-Shannon indices yet persistently low Nugent scores (Fig. 1, D and E). Thus, highly variable vaginal bacterial communities do not always have persistently high Nugent scores, so variation per se does not always equate with disease. These results suggest that neither variation in community composition nor constantly high levels of apparent diversity (co-dominance) are necessarily indicative of dysbiosis. The meaning and implications of these observations will remain unknown until molecular characterization of vaginal microbiota is applied to studies of the events leading to adverse outcomes, such as acquisition of sexually transmitted diseases or preterm birth.

Lactobacillus sp. and vaginal bacterial community stability

The stability of vaginal communities was associated with certain species of lactic acid–producing bacteria that dominated a community. For example, the vaginal communities of women in community classes LC (five women) and LG (two women) were most often dominated by L. crispatus and L. gasseri, respectively. These communities appeared stable, exhibited fewer transitions between community states, and typically had low Nugent scores. In these women, most transitions between community states were associated with menses (figs. S1 and S5). For example, in subject 28 (fig. S5), a community dominated by L. crispatus was resilient, being replaced by a community dominated by L. iners during menses, which then reverted to a community dominated by L. crispatus at the end of menses. Similarly, in subject 24 (Fig. 2A), a community dominated by L. crispatus is replaced by a community dominated by Streptococcus during menses. Communities of class LI (14 women, fig. S5) were typically dominated by L. iners, but varied widely in terms of species composition and stability. This is illustrated by the communities of subjects 14, 15, 16, 18, and 19 that appeared to be comparatively stable over time, whereas others such as subjects 11 and 27 commonly shifted to different community types that were more often associated with higher Nugent scores. The underlying reasons for these differences are unknown but might reflect genomic heterogeneity in the dominating Lactobacillus sp. Heatmaps depicting the dynamics of vaginal communities in a few selected subjects are shown in Fig. 2; those for all 32 subjects are shown in fig. S5.

Fig. 2

(A to D) Heatmaps (top) and interpolated bar plots (bottom) of phylotype relative abundance observed in four selected subjects over 16 weeks (heatmap color key is indicated in the lower right corner). Color codes for each phylotype represented in the interpolated bar plots are shown below the figure. See fig. S5 for heatmaps and interpolated bar plots for all subjects. Red dots below the interpolated bar graphs represent menstruation days.

Dynamics of the vaginal microbiota

The rapid and sometimes extensive turnover of human vaginal bacterial communities was visualized by mapping temporal changes in community composition onto the three-dimensional (3D) community space defined previously in a cross-sectional study by Ravel et al. of vaginal communities in 396 women (3) (Fig. 3). These illustrations show that although the vaginal community composition of many individuals changed over time, the alternatives were often restricted to specific community states (movie S1 and Fig. 3D). This is consistent with the notion that vaginal communities can exist in alternative equilibrium states as we have previously hypothesized (3). In contrast, communities of most other women evinced dynamics consistent with the “community resilience hypothesis,” in which a community, though dynamic, normally resides in a single region of space (movie S2 and Fig. 3B). In these cases, the composition and structure of a community occasionally changed to a transition state, but they were resilient, and homeostatic mechanisms returned them to their “ground state.”

Fig. 3

Temporal dynamics of vaginal bacterial communities in two women over 16 weeks. (A and C) Interpolated bar graph of phylotype relative abundance for subjects 13 (A) and 26 (C). Profiles of community state types in which Nugent scores have been superimposed (high Nugent score, 7 to 10, large open circles; intermediate Nugent score, 4 to 6, medium open circles; and low Nugent score, 1 to 3, small open circles) are shown below the interpolated bar graphs. Daily metadata are represented by the following: red balls, menstruation; black open square, douching; open red triangle, vaginal intercourse; open black diamond, oral sex; vertical black bar, digital penetration; light blue closed circle, lubricant use. (B and D) Representation of vaginal community dynamics in 3D community space (3) for subjects 13 (B) and 26 (D). Communities dominated by species of Lactobacillus and representing community state types I (red), II (light blue), III (green), and V (purple) are shown at each of the four outer vertices of the tetrahedron, with community state type IV (yellow) at the inner vertex. The white line represents the succession of community states for each subject over time. Animations of these events are provided in the Supplementary Materials (movies S1 and S2).

In many women, the bacterial species composition and rank abundances of bacterial species changed markedly over short periods of time (fig. S5). Consistent with this, the communities of some women seemed resilient and showed simple and predictable changes between community states that occurred only during menses [for example, subjects 24 (Fig. 2A), 12 (Fig. 2D), 28, and 19]. In contrast, the communities of other women were almost invariant during menses (for example, subject 29; Fig. 2B), whereas others still changed states continuously over time regardless of whether the subjects were menstruating or not (for example, subjects 1, 2, 7, and 27). The latter might be examples of communities transitioning to alternative equilibrium states. Most of the transitions to other state types were, however, transient in nature, with 35% of all state types persisting for not more than a week. This is consistent with previous studies that showed significant yet short-lived changes in Nugent scores (18, 19).

Identifying factors that disturb stability

We sought to evaluate the dependence of vaginal bacterial community stability as a function of time in the menstrual cycle and other time-varying factors. The Jensen-Shannon divergence index described above is a measure of a community’s deviation from constancy over an entire time series, and hence, it cannot be used to study temporal changes in community stability. Instead, we calculated the rate of change of the log of Jensen-Shannon distances between consecutive community states. The resulting log Jensen-Shannon divergence rate of change over normalized menstrual cycles was modeled using a linear mixed-effects model with a Fourier polynomial of the normalized menstrual time component adjusted for hormonal contraception, community type, sexual activities, lubricant use, and douching (with the last three evaluated 1 day before sampling) and with subject-based degree 2 polynomial random effects to account for correlation in repeated measurements on each subject (Fig. 4). The Fourier polynomial captured population-level dependence of the log Jensen-Shannon divergence rate of change (that is, community deviation from constancy) on time in the menstrual cycle. The lowest constancy was associated with menses. The two minima of the population-level Jensen-Shannon divergence rate of change (indicating highest community constancy) coincided with the two maxima of estradiol concentration as reported by Minassian and Wu (20), whereas the maximum of progesterone concentration coincided with the second minimum of the population-level constancy function (Fig. 4). Of all the metadata evaluated in the model, sexual activity was the only one that had a significant (negative) effect on constancy independent of time in the menstrual cycle, but its effect was weak in comparison to that of community class (see Materials and Methods). No statistically significant dependence was observed between Shannon diversity index and time in the menstrual cycle (fig. S6), indicating that although bacterial communities tend to lack stability during menses, it is not necessarily accompanied by increased community diversity.

Fig. 4

Modeling the dependence of the log of Jensen-Shannon divergence rate of change over the menstrual cycle. The length of menstrual cycles within and between women was normalized to 28 days (Supplementary Materials), with days 1 to 5 corresponding to menses (red bar). The red line shows concentrations of estradiol as a function of menstrual time [data from (21)], and the blue line shows concentrations of progesterone [data from (21)]. The shaded areas around each curve show 95% point-wise confidence bands.

Dynamics of the vaginal metabolome

A total of 6 samples from subjects 28, 22, and 8, and 12 samples from subject 10 were processed for metabolic profiling by 1H NMR (nuclear magnetic resonance) spectroscopy (Fig. 5). These subjects were selected because they each belong to one of the major community classes observed in this study (Fig. 1). Strong resonance signals from the most abundant metabolites were identified in spectra of these samples, including lactate (δ = 1.33 ppm), acetate (δ = 1.92 ppm), and succinate (δ = 2.41 ppm), as well as certain amino acids and sugars (δ = 3.2 to 5.5 ppm) (Fig. 5). Subjects 28 and 22 maintained high levels of lactate even during menses when community composition shifted (Fig. 5, B and C). During week 11 and menses, subject 22 underwent a shift from a community dominated by L. gasseri to a community dominated by Streptococcus sp. without a corresponding decrease in lactate levels, suggesting functional redundancy between these two species and conservation of overall community function. However, Streptococcus sp.–dominated communities contained slightly higher levels of acetate (Fig. 5C). Similarly, during menses, the abundance of L. iners in the vaginal bacterial community of subject 28 increased, whereas that of L. crispatus decreased, without effect on the metabolome of the community (Fig. 5B). Over time, the vaginal bacterial community of subject 8 consistently included members of the genera Atopobium, Prevotella, and other anaerobes as well as lower abundance of L. iners. The metabolome consistently showed high levels of succinate, and acetate and lower concentrations of lactate over time compared to the metabolomes of the other subjects (Fig. 5D).

Fig. 5

Metabolomic analyses. (A to D) 1H NMR metabolome profiles for four representative subjects at selected time points throughout the 16-week study. An interpolated bar graph of phylotype relative abundance in each community is shown above the 1H NMR profiles. The colored rectangles on top of the interpolated bar graph match the color of the 1H NMR profiles and indicate the times of collection during the 16-week study. 1H NMR spectra are spaced according to the matching time scale on the y axis. Peaks for selected metabolites are labeled. Red arrow indicates lactic acid peaks, light blue arrow indicates succinate peak, and green arrow indicates acetic acid peak. Red dots below the interpolated bar graphs represent menstruation days.

The normalized 1H NMR spectra of vaginal swabs were analyzed by principal components analysis (PCA) to obtain an overview of the variations between samples within and between subjects (fig. S7). This analysis confirmed that the metabolic output of each community class was generally unique and somewhat consistent over time except for subject 10 (fig. S7A), which varied more than the others.

Major changes in community composition that lasted several weeks were temporally associated with changes in the metabolome. For example, the vaginal community of subject 10 underwent a profound change beginning in week 8 of the study in which the proportion of Atopobium sp. markedly increased, whereas that of L. iners decreased. This change was characterized by the increase of succinate and acetate and the decrease of lactate (Fig. 5A). The overall clustering was influenced by lactate and glycogen for subjects 22 and 28 and part of the time series in subject 10, whereas higher levels of succinate and acetate were responsible for the clustering of samples from subjects 8 and 10 (fig. S7B).


Here, we have shown that vaginal communities of some women show low constancy and high levels of species turnover, and that community dynamics vary widely among women even among those that cluster in the same community class. These findings suggest that point estimates of community composition that arise from cross-sectional studies could be misleading for most women because they experience a distribution of community states over time. This calls into question whether data from cross-sectional studies can be reliably used to link vaginal bacterial community composition to health outcomes. Fluctuation in community composition and constancy are mainly affected by time in the menstrual cycle, community class, and, to a certain extent, by sexual activity, but other unknown factors are also certainly at play. Nonetheless, it should be noted that in many cases, community function is probably maintained despite changes in community composition, because shifts in community structure often only involved changes in the relative dominance of a limited number of different lactic acid–producing bacterial species. Such fluctuations could occur while maintaining community performance (that is, lactic acid production; Fig. 5, A to D) when there is functional redundancy among community members, and shifts in the relative abundances of guild members occur due to changes in environmental conditions that favor one population over another. This, coupled with the observation that a limited number of community types are found in women of different ethnicities, suggests that deterministic processes driven by interspecies interactions, host-microbe interactions, and niche partitioning may account for vaginal community composition. In some cases, changes in community performance occur as a result of changes in community composition (Fig. 5A). These are reflected in the shift in composition of the metabolome, which in this strictly anaerobic environment is dominated by fermentation products such as lactic, succinic, and acetic acids that accumulate in vaginal secretions (Fig. 5, A to D).

Intervals of increased susceptibility to disease may occur because vaginal microbiota fluctuate. It is envisioned that better knowledge of vaginal community dynamics and the mutualism between the host and the indigenous bacterial communities will lead to the development of strategies to manage the vaginal ecosystem in a way that promotes health and minimizes the use of antibiotics. These efforts should take into account differences among individuals in terms of vaginal community dynamics and the potential problems that might arise from diagnostic and therapeutic strategies based on cross-sectional studies. The data reported in this study should help to inform and improve interpretation of past and future cross-sectional data sets.

Materials and Methods

Subjects and sample collection

Between 2006 and 2007, 39 nonpregnant, reproductive-age women were enrolled in a longitudinal study at Johns Hopkins University School of Medicine (Baltimore, MD) (18, 21). The protocol was approved by the Institutional Review Board of the Johns Hopkins University School of Medicine and the University of Maryland School of Medicine. Written informed consent was appropriately obtained from all participants that included permission to use the samples obtained in future studies.

Overall, 32 women successfully completed the longitudinal study (82% retention) (table S1). Self-collected mid-vaginal swabs and vaginal smears were obtained twice weekly for 16 weeks. All the swabs collected on a given day were stored in a freezer at the participant’s home and transferred on ice to the clinic on the participants’ scheduled visits at weeks 4 and 16 (or more frequently) where they were archived at −80°C.

Daily diaries were in the form of a yes/no check-off list to report menstrual bleeding; vaginal douching; sexual activity (including vaginal intercourse, receptive oral sex, digital penetration, anal sex, and use of sex toys, condoms, spermicides, and lubricants); wearing of thong undergarments; and use of medications or contraceptives, a diaphragm, sanitary napkin, and/or tampons. A comprehensive health and behavior questionnaire was administered at baseline and follow-up visits. Smears and behavioral diaries were submitted to the study site weekly.

Nugent Gram stain analysis

The vaginal smears were heat-fixed and Gram-stained, then blinded and read in random order. A microscopy score of 0 to 10 was assigned by an experienced microbiologist with the standardized method described by Nugent et al. (11). Nugent’s scores are composite scores based on the cellular morphologies of the bacteria present in a sample. A score of 0 to 3 was designated normal, 4 to 6 as intermediate, and 7 to 10 was considered to be abnormal and indicative of bacterial vaginosis.

DNA extraction and purification

Genomic DNA was extracted from archived vaginal swab specimens. Procedures for the extraction of genomic DNA from frozen vaginal swabs have been developed and validated (3, 15). Briefly, frozen vaginal swabs were immersed in 1 ml of prewarmed (55°C) cell lysis buffer composed of 0.05 M potassium phosphate buffer containing 50 μl of lysozyme (10 mg/ml), 6 μl of mutanolysin (25,000 U/ml; Sigma-Aldrich), and 3 μl of lysostaphin (400 U/ml in sodium acetate; Sigma-Aldrich), and the mixture was incubated for 1 hour at 37°C. Then, 10 μl of proteinase K (20 mg/ml), 100 μl of 10% SDS, and 20 μl of ribonuclease A (RNase A) (20 mg/ml) were added, and the mixture was incubated for 1 hour at 55°C. The samples were transferred to a FastPrep Lysing Matrix B tube (Bio101), and microbial cells were lysed by mechanical disruption with a bead beater (FastPrep instrument, Qbiogene) set at 6.0 m/s for 30 s. The lysate was processed with the ZR Fecal DNA extraction kit (Zymo Research) and according to the manufacturer’s protocol, omitting the lysis steps (steps 1 to 3). The kit includes a column (Zymo-Sin IV-HRC spin filter) specifically designed to remove polymerase chain reaction (PCR) inhibitors from the DNA samples. The DNA was eluted into 100 μl of TE buffer (pH 8.0). This procedure provided between 2.5 and 5 μg of high-quality whole-genomic DNA from vaginal swabs.

PCR amplification and sequencing of the V1-V2 region of bacterial 16S rRNA genes

We used two universal primers, 27F and 338R, for PCR amplification of the V1-V2 hypervariable regions of the 16S rRNA gene. The 338R primer includes a unique sequence tag to barcode the samples. A total of 96 unique 338R primers each with a specific barcode were synthesized. The primers were as follows: 27F, 5′-GCCTTGCCAGCCCGCTCAGTCAGAGTTTGATCCTGGCTCAG-3′; 338R, 5′-GCCTCCCTCGCGCCATCAGNNNNNNNNCATTACCGCGGCTGCTGGCA-3′, where the underlined sequences are the 454 Life Sciences primers B and A in 27F and 338R, respectively, and the bold font denotes the universal 16S rRNA primers 27F and 338R. The barcode within 338R is denoted by 8 Ns.

For every set of 192 vaginal genomic DNA samples, PCR amplification of 16S rRNA genes was performed in 96-well microtiter plates as follows: 5.0 μl of 10× PCR buffer II (Applied Biosystems), 3.0 μl of MgCl2 (25 mM; Applied Biosystems), 2.5 μl of Triton X-100 (1%), 2.0 μl of deoxyribonucleoside triphosphates (10 mM), 1.0 μl each of primers 27F and 338R (20 pmol/μl each), 0.5 μl of AmpliTaq DNA polymerase (5 U/μl; Applied Biosystems), and 50 ng of template DNA in a total reaction volume of 50 μl. Reactions were run in a PTC-100 thermal controller (MJ Research) with the following cycling parameters: 5 min denaturing at 95°C followed by 20 cycles of 30 s at 95°C (denaturing), 30 s at 56°C (annealing), and 90 s at 72°C (elongation), with a final extension at 72°C for 7 min. Separate plates containing negative controls without a template were included for each set of plate processed. The presence of amplicons was confirmed by gel electrophoresis on a 2% agarose gel and stained with SYBR Green (Ambion). PCR products were quantified with Quant-iT PicoGreen quantification system (Invitrogen), and equimolar amounts (100 ng) of the PCR amplicons were mixed in a single tube. Amplification primers and reaction buffer were removed by processing the amplicon mixture with the AMPure Kit (Agencourt). All PCR amplification reactions that failed were repeated twice with different amounts of template DNA, and if these failed, the samples were excluded from the analysis.

The purified amplicon mixtures were sequenced by 454 pyrosequencing with 454 Life Sciences primer A by the Genomics Resource Center at the Institute for Genome Sciences, University of Maryland School of Medicine, using Roche/454 Titanium chemistries and protocols recommended by the manufacturer and as amended by the Center.

Sequence reads quality control, analysis, and taxonomic assignments

The QIIME software package (22) was used for quality control of the sequence reads with the script and the following criteria: (i) minimum and maximum length of 220 and 400 base pairs (bp), (ii) an average of q25 over a sliding window of 50 bp (if the read quality dropped below q25, it was trimmed at the first base pair of the window and then reassessed for length criteria), (iii) a perfect match to a barcode sequence, and (iv) the presence of the 338R 16S primer sequence used for amplification. Sequences were binned on the basis of sample-specific barcode sequences and trimmed by removal of the barcode and primer sequences (forward if present and reverse). High-quality sequence reads were first de-replicated with 99% similarity using the UCLUST software package (23), and detection of potential chimeric sequences was performed with the UCHIME component of UCLUST (24). Chimeric sequences were removed before taxonomic assignments.

Taxonomic assignments were performed as described by Ravel et al. (3) with a combination of the RDP Naïve Bayesian Classifier (25) and speciateIT ( Taxonomic assignments (sequence read counts and relative abundances) are shown in table S2.

Statistical analysis

Community state type assignments. A community state type is a cluster of community states (the species composition and abundance of a vaginal community at a specific point in time) that are similar in terms of the kinds and relative abundances of observed phylotypes (fig. S1 and Supplementary Materials). The dissimilarity between community states was measured with Jensen-Shannon metric (Supplementary Materials). The clustering of community states was done with hierarchical clustering based on the Jensen-Shannon distances between all pairs of community states and Ward linkage. Only phylotypes with at least 20 sequences were used in the clustering process. The number of clusters was determined with the silhouette measure of the degree of confidence in a particular clustering. It was applied to the clusters of community states that resulted from the Jensen-Shannon hierarchical clustering by cutting the corresponding dendrogram at the level that would have k leaves (clusters), where k was between 2 and 10. The maximum of the silhouette values was at k = 5. The corresponding five clusters are depicted on fig. S1 and are labeled I, II, III, IV-A, and IV-B.

The silhouette values lie in the interval [−1, 1], with well-clustered observations having values near 1 and poorly clustered observations having values near −1. The precise definition of silhouette measure is as follows: for each community state type i, the silhouette width s(i) of i is set to be 0 if i is the only observation in the cluster, and if there is more than one element in the cluster of i, s(i) is defined by the formula:s(i)=[b(i)a(i)]/max[a(i),b(i)]where a(i) is the average Jensen-Shannon distance between i and all other community states of the cluster to which i belongs, and for each cluster C that does not contain i, d(i,C) is defined as the average Jensen-Shannon distance between i and all community states of C. b(i) is defined as minC d(i,C), and can be seen as the dissimilarity between i and the nearest cluster to which it does not belong. Community states with a s(i) value close to 1 are very well clustered, a value of s(i) close to 0 means that the community state lies between two clusters, and a negative value of s(i) means that the community state was probably placed in the wrong cluster. The silhouette value of clustering results of community states is the average of silhouette widths over all community states. The calculation of silhouette values was done with R package clValid version 2.0-2 modified to allow for use of Jensen-Shannon metric.

Community class assignments. The temporal patterns of vaginal bacterial community dynamics can be classified with profiles of community state types (see the Supplementary Materials). A profile of community state types is obtained by replacing each community state in the temporal data of phylotype proportions by the corresponding community state type (Fig. 1C). The proportion of time each state type is observed within the profile characterizes persistence of the community in a given community state type (Fig. 1C). Temporal patterns of vaginal bacterial communities are classified on the basis of the long-term patterns of community state type persistence. The dendrogram in Fig. 1 was generated with Ward linkage hierarchical clustering of the Euclidean distances between state type persistence vectors. A color bar in Fig. 1B delineates five clusters of community state type profiles, and these are referred to as community classes. The number of clusters was determined with the silhouette method described above in the context of community state type assignment.

Modeling the dependence of the average population-wide deviation from constancy of vaginal bacterial communities on the time in the menstrual cycle. We used a mixed-effects model to assess the association between the average population-wide deviation from constancy of vaginal bacterial communities and the time in menstrual cycle, hormonal contraception, community class, sexual activity, lubricant use, and douching. Sexual activity was considered a categorical variable indicating that one of the following activities took place 1 day before sampling: vaginal sex, anal sex, receptive oral sex, digital penetration, or use of a sex toy. Lubricant use and douching were also recorded a day before sampling. The outcome variable in the model is the rate of change of the log of Jensen-Shannon divergence defined as the ratio of the log of the Jensen-Shannon divergences between consecutive community states and the time between states. A structure of model (1) is captured by the following equations:Yi=s(t)+Xi β+Zi bi+ɛi,s(t)=a0+(ak sin(2πkt/T)+bk cos(2πkt/T)),k=1,,n;T=28,biN(0,D),ɛiN(0,Σi),i=1,2,,32(1)where Yi is a vector of log Jensen-Shannon divergence rates of change for the ith subject and s(t) is a degree n Fourier (trigonometric) polynomial capturing the nonlinear, mean pattern of association between Yi and the normalized menstruation time t. The normalized menstrual time was derived from the time variable by rescaling the intervals from the first day of one menstrual cycle to the first day of the next menstrual cycle to create 28-day cycles. If the study ended before the next menstrual cycle and the number of days from the first day of the last menses to the last day in the study was less than 28 days, no rescaling took place. Xi is a design matrix of fixed effects for the ith subject that in the full model incorporates the effect of hormonal contraception, community class, and the 1-day–lagged effects of sexual activity, lubricant use, and douching, and β is a vector of model parameters measuring the level of association with the aforementioned fixed effects. Zi is a random-effects design matrix of a polynomial function of sampling times for the ith subject, and bi is the associated vector of coefficients assumed random with a joint normal distribution with mean 0 and a variance-covariance matrix D. ɛi is a vector of residuals that are also assumed normally distributed with variance-covariance matrices, Σi, specific for each individual and described below. We assume bi and ɛi to be independent. Model fitting was performed with R package nlme version 3.1-100 (26).

Model selection was performed with the following top-down strategy (27):

• Find the optimal random-effects structure of a “beyond optimal” model whose fixed component contains all explanatory variables and as many interactions as possible using restricted maximum likelihood (REML) estimators (28).

• Find the optimal fixed-effects structure (using maximal likelihood estimators) for the model with the optimal random-effects structure.

• Present the final model using REML estimation.

The fixed effects of the beyond optimal model contain hormonal contraception, community class, and the 1-day–lagged effects of sexual activity, lubricant use, douching, and degree 4 Fourier polynomial of normalized menstrual time stratified by community class. The random structures tested with this model were subject-specific polynomials of sampling time of degrees 0, 1, 2, and 3. Both likelihood ratio test and the Akaike information criterion (AIC) (28) identified the polynomial of degree 2 as the optimal random structure. When comparing models with nested random-effects structures, the models are nested with respect to the variances. Thus, when performing a likelihood ratio test on such nested models, the null hypothesis is H0: σ2 = 0 and the alternative hypothesis is H1: σ2 > 0. Because the alternative hypothesis is σ2 > 0 and not σ2 ≠ 0, we are testing on the boundary. This is because if there is no evidence to reject the null hypothesis, then σ2 = 0 is the lowest possible value, because the variance is always non-negative. P values of the likelihood ratio tests for the random structures were computed with Verbeke and Molenberghs’ formulas for the likelihood ratio test while testing on the boundary (29).

The optimal fixed-effects structure was identified by a top-down selection of the categorical variables: hormonal contraception, community class, sexual activity, lubricant use, and douching (using likelihood ration tests) followed by a selection of the stratification of the Fourier polynomial term (using AIC criteria because these models were not nested within each other) and finally selection of the degree of the Fourier polynomial term. We have tested stratifications of the Fourier polynomial term corresponding to all possible subsets of the set {DA, DB, LI, LC, LG} of community classes, including the empty set (no stratification). The optimal fixed-effects structure contains a degree 3 Fourier polynomial, with no stratification, adjusted for sexual activity, and a binary community class factor with two levels, one that combines classes DA and DB, and the other that combines classes LI, LC, and LG (referred to as class L). The coefficients and P values of the model for the sexual activity and binary community coefficients are shown in table S3. Of all the metadata evaluated in the model, sexual activity was the only one with a significant (negative) effect on constancy, independent of time in the menstrual cycle; however, the magnitude of the effect is weak in comparison to that of binary community state type. Communities LI, LC, and LG are significantly more stable (constant) than the communities DA and DB.

The subject-specific normal quantile-quantile residual plots show that the residuals are nearly normally distributed. The within-subject plots of normalized residuals versus normalized menstrual time showed no discernible patterns.

We have also built models with the same structure as model (1), but where s(t) was set to be either a thin plate regression spline (30, 31) or a polynomial of normalized menstrual time. The mean population-level deviation from constancy curves of these models was not significantly different from the mean population-level deviation from constancy curves derived using Fourier polynomial model.

To assess the sensitivity of the results to the choice of dissimilarity measures between community states used, we applied the final model to the rates of change between consecutive community states using Bray-Curtis, relative entropy, Euclidean, and Manhattan dissimilarity measures. The resulting population-wide deviation from constancy curves lies within the 95% confidence interval ring of the Jensen-Shannon deviation from constancy curve, indicating the robustness of our results.

NMR sample preparation and data acquisition

For each sample, one dry Dacron swab head was cut with scissors and placed in a 1.5-ml centrifuge tube. The scissors were ethanol-sterilized and flamed before each use. About 0.6 ml of deuterated water (D2O) was added to the centrifuge tube as an extraction solvent. The sample was homogenized by vortex mixing for 1 min and stored on ice for 5 min. The solution was pipetted into a clean 1.5-ml tube and centrifuged (3 min, 13,000 rpm, 4°C) to remove particulates. An unused swab was also processed with the same procedure and served as a control.

1H NMR was used to assess the metabolite profiles. A total of 500 μl of the swab extract containing 50 mM phosphate buffer (pH 7.0) and 30 μM sodium 3-(trimethylsilyl) propionate-2,2,3,3-d4 as internal chemical shift reference was processed after vortexing, centrifuged at 13,000 rpm for 2 min. and transferred to a 5-mm NMR tube. All 1H NMR experiments were carried out at 25°C on a Varian AS500 spectrometer operating at a proton NMR frequency of 499.75 MHz. 1D spectra were recorded with a standard Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence. An 80-ms CPMG pulse train was used to eliminate signals from large molecules, such as proteins from blood serum. Each spectrum consisted of 128 transients with a spectral width of 12 ppm and relaxation delay of 5.0 s. All free induction decays were Fourier-transformed with an exponential function equivalent to a 0.3-Hz line-broadening factor, and the spectra were zero-filled to 32K points. The resulting spectra were manually phased and baseline-corrected with ACD/Labs (version 10.0, Advanced Chemistry Development Inc.). For 1H NMR signal assignment purposes, 2D J-resolved spectroscopy (32) and total correlation spectroscopy (TOCSY) (33) NMR spectra were acquired for two selected samples. J-resolved spectra were collected with 128 scans per 32 increments with 5000-Hz spectral width in F2 and 36 Hz in F1. The TOCSY spectra were recorded with a data matrix of 2048 × 128 with spectral width of 5000 Hz in F2 and F1. Sixty-four scans were acquired, and a mixing time of 80 ms was used. All 2D NMR data were processed with the software package NMRPipe (34).

Metabolic composition and PCA

1H NMR spectra were reduced to 61 integrated regions of varied width corresponding to the region of δ 0.60 to 8.50 ppm with intelligent bucketing from ACD/Labs. The region from δ = 4.7 to 5.2 ppm was excluded from the analysis to avoid the phasing effects of the presaturation of the residual water signal. Peaks from ethanol at δ = 1.19 and 3.65 ppm (Fig. 5A) and glycerol at δ = 3.56, 3.65, and 3.78 ppm (Fig. 5C) were believed to be potentially introduced to the swabs during sample collection or sample handling. Thus, the regions of δ = 1.16 to 1.22 ppm and 3.50 to 3.80 ppm were also excluded from the analysis. Normalization of the integrals to the total sum of the spectrum was carried out on the data before PCA to allow for differences in signal to noise. PCA was performed with the SIMCA-P software (version 10.0, Umetrics). The data were pre-processed by Pareto scaling.

Supplementary Materials

Materials and Methods


Fig. S1. Assignment of vaginal community state types.

Fig. S2. Profiles of community state types, Nugent scores, and menses for 32 women over a 16-week period.

Fig. S3. Modeling the correlation between ethnicity and community state types.

Fig. S4. Graphical representation of community state type transitions observed between all consecutive pairs of time points (905 transitions, Table 1) and their frequencies among all women.

Fig. S5. Individual profiles of dynamics of vaginal bacterial communities over 16 weeks for each of 32 women enrolled in the study.

Fig. S6. Shannon diversity indices over the menstrual time in 32 women samples twice weekly for 16 weeks.

Fig. S7. Principal components analysis of metabolic profiles.

Table S1. Descriptive characteristics of the 32 women enrolled in the 16-week longitudinal study.

Table S2. Taxonomic assignments, relative abundances of taxa, and metadata.

Table S3. The coefficients and P values of binary community class and sexual activity for the optimal linear mixed-effects model.

Movie S1. Animation of the vaginal microbiota dynamics in subject 26 over a 16-week period.

Movie S2. Animation of the vaginal microbiota dynamics in subject 13 over a 16-week period.

References and Notes

  1. Acknowledgments: We thank R. Gicquelais, E. Neuendorf, and H. Yang for technical assistance with DNA extraction. Funding: This research was supported by the NIH, National Institute for Allergy and Infectious Diseases awards UH2-AI083264 (J.R. and L.J.F.), U01-AI70921 (J.R.), K01-AI080974 (R.M.B.), and R03-AI061131. Author contributions: R.M.B., J.R., and L.J.F. conceived the original idea and designed the study. J.S., S.S.K.K., and L.F. conducted most of the experiments. G.B. performed the metabolomic experiments and analyses. P.G. and Z.A. developed, implemented, and performed the statistical modeling analyses. U.M.E.S. and X. Zhong contributed to the statistical modeling analyses. Z.M. and X. Zhou contributed to the interpretation of the data. P.G., R.M.B., G.B., Z.A., L.J.F., and J.R. interpreted the data and wrote the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: The sequence data are available in the small read archive (SRA) (accession no. SRA026073). The metadata associated with the sequence data are available in dbGap (dbGap study no. phs000261).
View Abstract

Stay Connected to Science Translational Medicine

Navigate This Article