Research Article · Genomics

Genomic Diversity and Fitness of E. coli Strains Recovered from the Intestinal and Urinary Tracts of Women with Recurrent Urinary Tract Infection


Science Translational Medicine  08 May 2013:
Vol. 5, Issue 184, pp. 184ra60
DOI: 10.1126/scitranslmed.3005497


Urinary tract infections (UTIs) are common in women, and recurrence is a major clinical problem. Most UTIs are caused by uropathogenic Escherichia coli (UPEC). UPEC are generally thought to migrate from the gut to the bladder to cause UTI. UPEC form specialized intracellular bacterial communities in the bladder urothelium as part of a pathogenic mechanism to establish a foothold during acute stages of infection. Evolutionarily, such a specific adaptation to the bladder environment would be predicted to result in decreased fitness in other habitats, such as the gut. To examine this prediction, we characterized 45 E. coli strains isolated from the feces and urine of four otherwise healthy women with recurrent UTI. Multilocus sequence typing and whole genome sequencing revealed that two patients maintained a clonal population in both these body habitats throughout their recurrent UTIs, whereas the other two exhibited a wholesale shift in the dominant UPEC strain colonizing both sites. In vivo competition studies in mouse models, using isolates taken from one of the patients with a wholesale population shift, revealed that the strain that dominated her last UTI episode had increased fitness in both the gut and the bladder relative to the strain that dominated in preceding episodes. Increased fitness correlated with differences in the strains’ gene repertoires and carbohydrate and amino acid utilization profiles. Thus, UPEC appear capable of persisting in both the gut and urinary tract without a fitness trade-off, emphasizing the need to widen our consideration of potential reservoirs for strains causing recurrent UTI.


More than half of all women develop at least one episode of urinary tract infection (UTI) during their lifetimes. Up to 25% of women have recurrent UTI, which is defined as two or more episodes within a 6-month period (1). Most community-acquired UTIs are caused by uropathogenic Escherichia coli (UPEC) (2). A generally accepted model for infection is that UPEC migrate from the gastrointestinal tract to the periurethral area, and eventually up the urethra into the bladder (3).

The gut and urinary tract are very distinct habitats from the perspective of their metabolic, immunologic, and microbial features. The gut is home to our largest population of microbes (4–6), whereas the bladder is considered a normally sterile environment, guarded by physical and biological barriers to microbial invasion (7–9). Studies of the molecular pathogenesis of UTI in a mouse model (10–12) have identified numerous virulence factors, including adhesins, toxins, iron acquisition systems, capsular structures, flagella, pathogenicity islands (PAIs), and components important for biofilm formation (13). Among adhesins, UPEC strains typically encode a multitude of chaperone/usher pathway (CUP) pilus gene clusters. CUP pili contain adhesins at their tips that play critical roles in host-pathogen interactions, recognizing specific receptors with stereochemical specificity (14). For example, FimH, the type 1 pilus tip adhesin, binds mannosylated glycoproteins, as well as N-linked oligosaccharides of β1 and α3 integrins that are expressed on the luminal surface of the bladder epithelium (urothelium) in humans and mice (15, 16). Type 1 pilus–mediated binding can lead to invasion of UPEC into mouse and human bladder epithelial cells (17–19). Invading UPEC can be expelled from the host cell (20), or they can “escape” into the cell’s cytoplasm, where they replicate rapidly and form a biofilm-like structure of 10⁴ to 10⁵ organisms known as an intracellular bacterial community (IBC) (21, 22). Bacteria in the IBC are protected from antibiotics (23, 24) and from immune responses (11, 25). IBCs are transient; after maturation, UPEC can disperse from the IBC, exit their host cells, enter the lumen of the bladder, and subsequently invade other urothelial cells (21).
One primary host defense that eliminates IBCs is exfoliation, where urothelial cells undergo an apoptotic-like cell death, detach from the underlying transitional epithelium, and are eliminated in the urine (25, 26). Exfoliated bladder epithelial cells containing IBCs have been observed in urine collected from women with recurrent UTI but not in healthy controls or in cases of UTI caused by Gram-positive pathogens (26). However, exfoliation exposes underlying cell layers of the urothelium. Subsequent UPEC invasion of these underlying cells in mice results in formation of additional intracellular structures termed quiescent intracellular reservoirs (QIRs) (27, 28). Bacteria in the QIR are dormant, are resistant to antibiotic treatment, and elude recognition by host immune defenses. Mouse models have been used to demonstrate that bacteria in QIRs can contribute to recurrent infection after antibiotic treatment has rendered the urine sterile (23, 28).

Consistent with these adaptations for colonizing the bladder habitat, UPEC have been classified as a subset of extraintestinal pathogenic E. coli strains (ExPECs). ExPECs are distinguished from other gut-associated mutualistic and pathogenic E. coli strains based on the spectrum of diseases they cause and their genomic features. ExPECs typically belong to E. coli phylogenetic groups B2 and D and often carry PAIs encoding virulence-associated genes (29). Mutations in fimH are postulated to be an important component in the evolution of UPEC strains, with secondary contributions from mutations affecting other loci in gut-associated E. coli populations (30–32). In this conceptualization, mutations conferring a fitness advantage within the urinary tract are selected in this body habitat (33). The ability of multiple UPEC strains to form specialized intracellular structures such as the IBC (34) and QIR suggests a very specific adaptation to the bladder environment. Increased fitness in the urinary tract has been hypothesized to confer decreased fitness in the gut habitat of origin so that strains successfully colonizing the urinary tract encounter a “dead-end” evolutionary path; this has been cited as an example of “source-sink” evolutionary dynamics (31, 35).

Here, we have used comparative genomics, in vitro assays of growth in the presence of a broad range of potential nutrients, and in vivo fitness tests of representative E. coli strains obtained from both the urine and feces of women with recurrent UTIs to address several questions. How dynamic are the E. coli populations in the gut and urinary tract of a given individual sampled over a period when recurrent UTIs are experienced? Can genome-wide comparisons of gut and urinary tract isolates provide insights into whether recurrences arise from reinoculation of gut-derived bacterial strains into the urinary tract or from intracellular reservoirs within the bladder? Are there fitness trade-offs due to adaptation to either the gut or urinary tract environment? We observed two very different patterns: recurrent UTI caused repeatedly by the same strain, or rapid and apparently complete replacement of one strain with another in both body habitats between UTI episodes (arguing against a fitness trade-off model). We were able to correlate replacement of one strain by another with their genomic and metabolic features. The significance of these results is discussed in the context of understanding disease pathogenesis and designing clinical translational studies focused on new approaches for pathogen surveillance and treatment.


E. coli strains cluster by host instead of habitat

One hundred fourteen women were enrolled in a now completed study of recurrent UTI (36); each presented with symptoms of acute cystitis. Eight of the 114 individuals had a negative enrollment urine culture, whereas 2 were lost to follow-up after the initial visit. Of the 104 remaining participants, 9 had three episodes of recurrent UTI (the greatest number of episodes among enrollees). However, fecal and urine samples were collected at each episode in only four of these nine individuals. All four patients received a similar cycle of antimicrobial therapy: trimethoprim-sulfamethoxazole (TMP-SMZ) for their enrollment UTI, nitrofurantoin for their second UTI, and ciprofloxacin for their third UTI [see (36) and table S1 for further details about clinical characteristics and treatment; note that patient 56 was initially treated with TMP-SMZ during episode 2 but then switched to nitrofurantoin when her urine isolates were found to be TMP-SMZ–resistant]. Sequencing near full-length amplicons generated from the 16S ribosomal RNA (rRNA) genes in the 45 strains recovered from fecal and urine samples collected during the three UTI episodes experienced by each of these four individuals confirmed that all were E. coli. These 45 E. coli strains (table S2), which included 1 urine isolate and, on average, 3 fecal isolates from each of the four patients at the time of each episode of UTI, were selected for the current study to address our questions about source-sink dynamics (where do strains arise and how do they distribute themselves among different body habitats) and about the relative fitness of E. coli strains from a patient or patients that experienced a wholesale population shift between episodes.

To investigate the relatedness of the 45 E. coli strains and the relationships between strain characteristics and the body habitat from which they had been recovered, we first conducted multilocus sequence typing (MLST) using seven well-conserved housekeeping genes: adk (adenylate kinase), fumC (fumarase isozyme C), gyrB (DNA gyrase subunit B), icd (isocitrate dehydrogenase), mdh (malate dehydrogenase), purA (adenylosuccinate synthetase), and recA (recombinase A) (see table S3 for primers used for MLST). Two patterns emerged from the MLST analysis: (i) a “stable clonal” pattern, where isolates from the same patient were nearly indistinguishable at all time points surveyed (patients 12 and 13), and (ii) a “dynamic” pattern, where isolates from the same patient included several different MLST groups during the study (patients 56 and 72). Even among the two patients with the dynamic pattern, strains isolated at a given clinic visit tended to have identical or very similar MLST profiles, regardless of the body site from which they had been recovered (Fig. 1). In the case of patient 72, sequence typing revealed two MLST groups among her fecal and urine isolates during episode 1. During episode 2, all her fecal and urine isolates had a single MLST group assignment that was the same as one of the groups in episode 1. In episode 3, fecal and urine isolates had a single MLST group, but it differed from all the MLST groups of strains recovered during episodes 1 and 2, leading us to conclude that she harbored nearly clonal E. coli populations in both body habitats, but that the population had changed between episodes 2 and 3. In contrast, only one fecal isolate, from the second UTI episode from patient 13, differed in its MLST group from the group assigned to all her other fecal and urine isolates recovered at the time of UTI episodes 1 to 3, leading us to conclude from the MLST analysis that she had a largely clonal population across episodes.
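The grouping used in Fig. 1, where isolates that share an MLST type or differ at only one of the seven loci are pooled, can be sketched as below. This is a simplified illustration, not eBURST itself (eBURST additionally identifies founder genotypes). The allele numbers and isolate names are hypothetical, and merging single-locus variants transitively is an assumption of the sketch.

```python
from itertools import combinations

# The seven housekeeping loci typed in this study.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def allele_differences(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry different allele numbers."""
    return sum(profile_a[locus] != profile_b[locus] for locus in LOCI)


def group_single_locus_variants(profiles):
    """Pool isolates whose profiles differ at <= 1 locus, merging transitively (union-find)."""
    parent = {name: name for name in profiles}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allele_differences(profiles[a], profiles[b]) <= 1:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical allele numbers, for illustration only.
profiles = {
    "iso1": dict(zip(LOCI, (13, 24, 19, 14, 23, 1, 10))),
    "iso2": dict(zip(LOCI, (13, 24, 19, 14, 23, 1, 10))),
    "iso3": dict(zip(LOCI, (13, 24, 19, 14, 23, 1, 2))),  # differs from iso1 at recA only
    "iso4": dict(zip(LOCI, (6, 4, 33, 16, 11, 8, 6))),
}
print(group_single_locus_variants(profiles))  # → [['iso1', 'iso2', 'iso3'], ['iso4']]
```

Union-find keeps the pooling order-independent: any chain of single-locus variants ends up in one group, regardless of which pair is compared first.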

Fig. 1 MLST analysis of patient isolates.

An eBURST diagram constructed on the basis of publicly available E. coli MLST sequence types. MLST types from patient isolates from the present study are represented by colored circles; the color denotes the patient, whereas the size (diameter) of the circle indicates the proportion of strains that share that same MLST type or an MLST type that differs by only one allele.

Our initial assessment of clonality was based on MLST sequencing of a limited number of strains. Although no additional strains were available to extend sampling depth to search for minor E. coli populations, we proceeded to collect more data on the available strains to overcome potential limitations of MLST in assessing strain relatedness (37). We chose to examine the 3 urine and 11 fecal strains isolated from patient 13 (stable “clonal pattern”) and the 3 urine and 9 fecal strains recovered from patient 72 (“dynamic pattern”) as two contrasting individual examples of colonization patterns. These isolates represent each of the three episodes of UTI experienced by each of these two women (table S4). We sequenced polymerase chain reaction (PCR) amplicons generated from the fimH gene. A maximum likelihood fimH gene tree is presented in fig. S1, with the position of fimH alleles in the various urine and fecal isolates recovered from patients 13 and 72 shown. All three of the urine isolates from patient 13, representing each of her three UTI episodes, had fimH sequences that were identical to the fimH sequences in all 11 of her fecal isolates. For patient 72, this sequence identity held for most of the isolates from the first two UTI episodes but not for the third UTI, where the urine strain had a fimH sequence that was different from the strains representing the two previous UTIs. However, the fimH sequence of this UTI episode 3–associated urine isolate was identical to the fimH sequences in the contemporaneously collected four fecal isolates (fig. S1). Compared to patient 72’s first and second episode isolates, there were seven amino acid substitutions in the FimH allele of fecal and urine isolates from her third UTI episode: A10V, N70S, S78N, V119A, T234A, A273G, and N6T. Residue 6 lies in the signal sequence, which is not part of the mature FimH protein assembled on the tips of pili.
Positions 10, 70, 78, and 119 are in the lectin domain of the protein, whereas residues occupying positions 234 and 273 are in the pilin domain; with the exception of position 119, all are solvent-exposed in a crystal structure of the FimC-FimH complex (15). None are in the mannose-binding pocket. However, residue 273 is positioned near the hydrophobic groove where donor strand complementation and donor strand exchange occur—processes essential for pilus biogenesis (14).
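Substitutions written as A10V give the reference residue, its position, and the variant residue, so listing them amounts to walking two aligned protein sequences column by column. The sketch below assumes an ungapped alignment and 1-based numbering; the `offset` parameter is a hypothetical convenience for switching between precursor and mature-protein numbering, and the toy sequences are not FimH.

```python
def list_substitutions(reference, variant, offset=0):
    """Return substitutions in 'A10V' notation between two aligned protein sequences.

    Assumes an ungapped alignment of equal length; positions are 1-based and
    shifted by `offset` (e.g. a negative offset to number residues from the
    start of a mature protein rather than from its precursor).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    if "-" in reference or "-" in variant:
        raise ValueError("this sketch does not handle alignment gaps")
    return [
        f"{ref_aa}{position + offset}{var_aa}"
        for position, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1)
        if ref_aa != var_aa
    ]


# Toy sequences for illustration only.
print(list_substitutions("MKATVNS", "MKVTVNT"))             # → ['A3V', 'S7T']
print(list_substitutions("MKATVNS", "MKVTVNT", offset=-2))  # → ['A1V', 'S5T']
```

The numbering convention matters when comparing reports: the same change is labeled differently depending on whether positions are counted from the precursor or from the mature protein.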

Unsupervised hierarchical clustering of isolates based on their gene and single-nucleotide polymorphism content

To obtain more definitive data on strain relatedness and clonality, we performed whole-genome shotgun sequencing of all 6 urine and all 20 fecal strains recovered from patients 13 and 72 (60 to 140× coverage; see Materials and Methods). Using the Velvet (38) and AMOScmp (39) assemblers, 25 of the 26 isolates yielded genome assemblies that averaged 4.98 million base pairs (Mbp) with an average N50 contig length of 72,747 bp (table S4).
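N50, the assembly statistic reported above, is found by sorting contigs from longest to shortest and taking the length of the contig at which the running total first reaches half of all assembled bases. A minimal sketch, with hypothetical contig lengths:

```python
def n50(contig_lengths):
    """N50: length of the contig at which the running total, taken from the longest
    contig down, first reaches half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


# Hypothetical contig lengths (bp): 300 bp in total, half = 150.
print(n50([100, 80, 50, 40, 30]))  # → 80
```

Because N50 depends only on the largest contigs, it summarizes contiguity, not completeness; the 4.98-Mbp average assembly size is the complementary measure.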

To identify and compare the gene content of the isolates, we first compiled a database of all annotated genes from the genomes of 54 E. coli strains deposited in the National Center for Biotechnology Information (NCBI) RefSeq database that were classified as “complete,” as well as 324 other draft E. coli genomes present in RefSeq and the PATRIC database (table S5). These genes, together with the genes predicted by Glimmer3 (40) in the assembled genomes of the newly sequenced strains from this study, were clustered using the program CD-HIT with default parameters (95% similarity) (41) to generate “OGUs” (operational gene units). All raw reads for each of the 26 newly sequenced E. coli genomes were mapped to this E. coli pan-genome using BLAT with default parameters (42). The total number of raw reads mapping to a given OGU was then used as the score for that OGU. A null cutoff score was calculated by dividing the total number of reads by the total length of OGU representatives (as determined by CD-HIT); this cutoff represents the expected number of reads per OGU normalized by length if reads were randomly selected from all OGU representatives. OGUs with scores less than this cutoff were called “absent”; those above were called “present” (43). By mapping the raw reads from each isolate and the in silico fragmented sequences from the finished UTI89 genome onto this E. coli pan-genome data set, we identified a total of 11,151 OGUs that were present in at least 1 of the 26 clinical isolates from the two patients or in the finished genome of UTI89; 3488 of the 11,151 OGUs (31.3%) were conserved in all 26 strains.
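A minimal sketch of the presence/absence rule described above. Two details are assumptions rather than statements from the text: each OGU's read count is divided by the length of its representative before comparison with the cutoff (our reading of "normalized by length"), and an OGU exactly at the cutoff is called present. The identifiers and counts are hypothetical.

```python
def call_ogu_presence(read_counts, ogu_lengths):
    """Call each OGU present (True) or absent (False) against a uniform-coverage null.

    read_counts: OGU id -> number of reads mapped to that OGU
    ogu_lengths: OGU id -> length (bp) of the OGU's representative sequence
    """
    total_reads = sum(read_counts.values())
    total_length = sum(ogu_lengths.values())
    # Expected reads per bp if the reads were spread uniformly over all representatives.
    cutoff = total_reads / total_length
    return {
        ogu: read_counts.get(ogu, 0) / length >= cutoff
        for ogu, length in ogu_lengths.items()
    }


# Hypothetical counts: three OGUs of equal length, 1,000 reads in total.
calls = call_ogu_presence({"ogu1": 900, "ogu2": 90, "ogu3": 10},
                          {"ogu1": 1000, "ogu2": 1000, "ogu3": 1000})
print(calls)  # → {'ogu1': True, 'ogu2': False, 'ogu3': False}
```

With a pan-genome of many thousands of OGUs, most of which are absent from any single isolate, the uniform-coverage cutoff is low, so genuinely present genes clear it easily while scattered cross-mapping reads do not.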

We next identified a total of 295,099 single-nucleotide polymorphisms (SNPs) among the 25 isolates at positions present in the finished UTI89 genome (see Materials and Methods; one fecal sample Ec72_E2F2 was excluded from this analysis given the problems encountered assembling its genome; see table S4). The SNP and OGU data were then used to more rigorously examine clonality. First, on the basis of the SNP rate (SNPs/aligned base pairs), we computed a matrix of pairwise distance measurements between isolates from our clinical study of recurrent UTI and the 378 strains we used in the OGU analysis. Unsupervised hierarchical clustering based on these distance measurements (fig. S2A and Fig. 2) showed that the clinical isolates that we deemed clonally related to one another by MLST also clustered together in the SNP-based tree of 403 strains. With the exception of one fecal strain, the SNP rate between any pair of isolates from patient 13 was in the noise range (<100 total, SNP rates <0.005/bp) regardless of their date of isolation or whether they were recovered from urine or feces. In the case of patient 72, we identified distinct groups of strains among the urine and fecal isolates, representing distinct branches on the tree. Fecal and urine strains isolated from her first and second UTI episodes clustered separately from those of the third UTI episode. Second, the OGU-based, unsupervised hierarchical clustering of the 404 strains produced the same patterns for the 26 isolates as those obtained from SNP rates (fig. S2B and Fig. 3). Thus, on the basis of their SNP and OGU content, we considered UTI episodes 1 to 3 in patient 13 to have all been caused by the same UPEC strain (which in turn was very similar to the strain recovered from feces and urine during UTI episode 3 in patient 72).
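The SNP-rate distance and the clustering step can be illustrated as follows. The sketch counts SNPs only at positions called (A, C, G, or T) in both sequences and then clusters with naive average linkage (UPGMA-style) applied directly to the SNP rates. The linkage method is an assumption because the text does not specify it, and Fig. 2 instead builds its tree from Euclidean distances between rows of the SNP-rate matrix. Sequences and names are toy examples.

```python
from itertools import combinations


def snp_rate(seq_a, seq_b):
    """SNPs per aligned base pair, counting only positions called (ACGT) in both sequences."""
    aligned = snps = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a in "ACGT" and base_b in "ACGT":
            aligned += 1
            snps += base_a != base_b
    return snps / aligned if aligned else float("nan")


def average_linkage(names, dist):
    """Naive average-linkage (UPGMA-style) clustering; returns the tree as nested tuples.

    dist maps frozenset({name1, name2}) -> distance between two leaves.
    """
    clusters = {name: [name] for name in names}  # cluster label -> member leaves

    def cluster_distance(c1, c2):
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))


# Toy aligned sequences; "N" marks an uncalled position.
seqs = {
    "E1U1": "ACGTACGTAC",
    "E1F1": "ACGTACGTAC",
    "E3U1": "ACGAACCTAG",
    "E3F1": "ACGAACCTAN",
}
dist = {frozenset(pair): snp_rate(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)}
print(average_linkage(list(seqs), dist))  # → (('E1U1', 'E1F1'), ('E3U1', 'E3F1'))
```

Dividing by aligned positions rather than genome length keeps strains with incomplete assemblies, or positions absent from the reference, from appearing artificially similar.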

Fig. 2 Heat map of SNP differences between E. coli strains.

SNP rates (SNPs/aligned base pairs) between different sequenced strains were interpreted as a distance matrix. Hierarchical clustering was done on this symmetric matrix of SNP rates. Color in the heat map represents SNP rate as shown in the legend at the top left. A tree based on Euclidean distance is shown to the left of the heat map. The colors of the strains indicate their patient of origin. UTI89_finished refers to the finished UTI89 genome (76), whereas UTI89_illumina is the resequenced and reassembled genome from a 36-nucleotide (nt) read data set generated for this study. The comparison between the UTI89_finished and UTI89_illumina identified the noise range of the sequencing and assembly pipeline we used.

Fig. 3 OGU-based clustering of sequenced E. coli genomes.

The presence or absence of OGUs in the newly sequenced UPEC genomes characterized in the present study and 54 other publicly available E. coli strains were used to construct an OGU matrix. Hierarchical clustering was performed on the matrix on the basis of Euclidean distance. There are two clades suggested by the clustering analysis. Phylogenetic group membership of the strains is indicated with bars at the right of the figure. All isolates from patient 13 are colored red, whereas all isolates from patient 72 are colored green. The two strains used for in vivo competition experiments and phenotype microarray analyses are highlighted.

Finally, we quantified the growth of each isolate from patient 72 under 190 different culture conditions using Biolog phenotype microarrays (see Materials and Methods). Phenotype-based hierarchical clustering of her urine and fecal isolates yielded results that were virtually the same as those obtained from the MLST-, OGU-, and SNP-based comparisons (see Fig. 4A and below for additional information about the relationship between growth properties and gene content).

Fig. 4 Growth phenotypes of isolates from patient 72.

(A) Unsupervised hierarchical clustering of growth phenotypes as defined by Biolog phenotype microarrays. Squares represent fecal samples, circles denote urine samples, and the numbers inside squares and circles indicate the episode number from which the strain was derived. (B) Comparison of the growth phenotypes of strains Ec72_E1U1 and Ec72_E3U1 from UTI episodes 1 and 3 on carbohydrate, nucleoside, and amino acid substrates. Carbohydrate substrates that result in greater than 10-fold growth of the episode 3 strain are shown. See fig. S10 for other carbohydrates examined. Utilization of the bracketed carbohydrates requires genes involved in galactose metabolism. The color key denotes growth relative to the reference UPEC strain UTI89.

Tests of relative fitness in representative urinary isolates from patient 72

Our SNP- and OGU-based hierarchical clustering segregated the 26 isolates from the current study and other isolates, including UPEC and ExPEC strains, into two major clades at the top level of the tree. This first tree division is consistent with previous phylogenetic characterizations of various other E. coli strains (for example, groups A, B1, and D in clade 1 and B2 in clade 2), and we refer to this top tree division using the terms clade 1 and clade 2 for convenience (Figs. 2 and 3). Comparison of the content of known virulence factors in the 26 sequenced isolates from our study and the 54 complete E. coli genomes from RefSeq revealed that strains located in clade 1 had significantly fewer virulence genes compared to strains in clade 2 (see table S6 for P values, χ² tests).
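A χ² comparison of this kind can be sketched for a single gene as a 2 × 2 table of strains carrying or lacking the gene in each clade. The table layout is an assumption, since the text does not spell out how table S6 was constructed, and the counts below are hypothetical. When expected cell counts are small, Fisher's exact test or a continuity correction is usually preferred.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) on the table [[a, b], [c, d]].

    Rows: clade 1, clade 2; columns: strains carrying, lacking a given gene.
    Returns (statistic, p-value). For 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts: 10 of 50 clade 1 strains and 30 of 50 clade 2 strains carry the gene.
statistic, p_value = chi_square_2x2(10, 40, 30, 20)
print(round(statistic, 2), p_value)
```

The closed-form survival function avoids a statistics library here; it holds only for one degree of freedom, which is always the case for a 2 × 2 table.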

We selected urine isolate Ec72_E1U1, obtained from patient 72 during her first UTI episode and located in clade 1, as a proxy for the fecal and urine isolates from her first two UTI episodes. We selected urine isolate Ec72_E3U1, recovered from her last UTI episode and located in clade 2, as a proxy for all urine and fecal isolates from that episode; it is also very closely related to all urine and fecal isolates from the three UTI episodes in patient 13 (Figs. 2 and 3). Because her urine strain from episode 1 was replaced by the urine strain in episode 3, we asked whether the episode 3 strain had higher relative fitness than the episode 1 strain in both the bladder and the gut. If so, it would provide a counterexample to the notion that there is a fitness trade-off between the urinary tract and gut, and would suggest that the urinary tract is not necessarily a “sink” or evolutionary dead-end habitat for UPEC strains.

To address this question, we first turned to a well-established mouse model of UTI in conventionally raised C3H/HeN mice (44). The UTI episode 1 strain was marked with low-copy plasmids containing genes conferring resistance to kanamycin (pACYC177; KanR) or chloramphenicol (pACYC184; ChlorR). The UTI episode 3 strain was marked with pACYC184 (ChlorR). The p15A origin, which drives replication and segregation of these plasmids, confers stable inheritance in E. coli and many other Enterobacteriaceae (45). In vitro competition experiments revealed that these plasmids were indeed stable in both strains and that the marked strains had no growth (fitness) defect relative to their unmarked counterparts, as judged by quantifying colony-forming units (CFU) over the course of 8 hours of growth under shaking conditions or during 48 hours of growth under static conditions in LB medium. Furthermore, there was no significant difference in growth of the episode 1 strain (Ec72_E1U1) when it contained pACYC177 (KanR) versus pACYC184 (ChlorR).

We subsequently compared the fitness of Ec72_E1U1 with Ec72_E3U1 by co-inoculating a 1:1 mixture of the Ec72_E1U1/pACYC177 (KanR) and Ec72_E3U1/pACYC184 (ChlorR) strains into the bladders of female C3H/HeN mice (1 × 10⁷ to 3 × 10⁷ CFU of each strain; n = 5 mice). Mice were sacrificed 24 hours after inoculation, and the number of bladder CFU of each strain was defined by plating bladder homogenates on selective media. The marked episode 3 strain (Ec72_E3U1/pACYC184) was the only strain detectable in bladders 24 hours after inoculation, with the exception of one mouse where the episode 1 strain was present at a low level (10³ CFU; Fig. 5A). In follow-up single strain infections using unmarked strains without any antibiotic resistance plasmids, the episode 1 strain was undetectable in the bladder tissue of mice 24 hours after inoculation, whereas the episode 3 strain achieved a median colonization density of 6.4 × 10³ CFU per bladder (range, 4.4 × 10² to 1.12 × 10⁸) (Fig. 5A). Confocal microscopy of bladder whole mounts, prepared 6, 12, 16, and 24 hours after mice were monoinfected with the same strains containing pANT4 [encodes green fluorescent protein; (46)], revealed that the episode 3, but not the episode 1, strain was able to form small IBCs (fig. S3), consistent with other reports that intracellular infection contributes to fitness during UTI. Additionally, we analyzed urine samples as well as bladder and kidney homogenates prepared from mice 24 hours after transurethral inoculation of 10 times more CFU (10⁸) of one or the other of these strains (or the reference control UPEC strain UTI89). The results revealed barely detectable levels of the episode 1 strain, Ec72_E1U1, in bladder homogenates, although it was present in kidney and urine.
In contrast, the episode 3 strain (and the control UTI89 isolate) was present in all three sample types and at significantly higher levels than the episode 1 strain [P < 0.05, two-way analysis of variance (ANOVA) and Mann-Whitney U test; fig. S4].
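The two-tailed Mann-Whitney test used for the bladder CFU comparisons (Fig. 5A) is a rank test, so it is unaffected by the log scale or by plotting zeros as 1. With five mice per group, the exact null distribution can be enumerated over all C(10, 5) = 252 relabelings. A minimal sketch; the values are hypothetical log10 CFU, not data from this study, and statistical packages may instead use a normal approximation or treat ties differently.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic of sample x against sample y; ties contribute 1/2."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_two_sided_p(x, y):
    """Exact two-sided permutation p-value for U, enumerating every relabeling."""
    pooled = list(x) + list(y)
    expected = len(x) * len(y) / 2
    observed = abs(mann_whitney_u(x, y) - expected)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), len(x)):
        chosen_set = set(chosen)
        xs = [pooled[i] for i in chosen]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        if abs(mann_whitney_u(xs, ys) - expected) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical log10(CFU per bladder), five mice per strain; undetectable (0 CFU, plotted as 1) = 0.
episode1 = [0.0, 0.0, 0.0, 0.0, 3.0]
episode3 = [2.6, 3.4, 3.8, 5.1, 8.0]
print(mann_whitney_u(episode3, episode1), exact_two_sided_p(episode3, episode1))
```

Full enumeration is practical only for small groups like these; beyond a few dozen animals per group, approximations take over.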

Fig. 5 In vivo fitness of urine isolates from patient 72’s UTI episodes 1 and 3.

(A) Urine isolates obtained from UTI episodes 1 (isolate Ec72_E1U1) and 3 (Ec72_E3U1) were introduced separately or together into the bladders of female C3H/HeN mice. Data points represent CFU per bladder in individual mice, and horizontal bars represent the median of CFU per bladder of that strain in the experiments. Data on the y axis are presented in log scale: therefore, all 0s were plotted as 1 for visualization (*P < 0.05, two-tailed Mann-Whitney test). (B) The plasmid pACYC184 (ChlorR) was lost during a 2-week colonization of the intestines of gnotobiotic mice or conferred a fitness disadvantage. CFU on LB agar plates without antibiotics represent all Ec72_E1U1 in the fecal samples. CFU on LB agar plates with kanamycin or chloramphenicol represent Ec72_E1U1 carrying the corresponding antibiotic resistance plasmid from the same fecal sample (data are means ± SEM). (C) FitSeq determination of the relative fitness of strains from patient 72’s UTI episodes 1 and 3 in the gut of gnotobiotic mice (data are means ± SEM).

To test the stability of these plasmids during a much longer period of colonization in the gut, we introduced a 1:1 mixture of the episode 1 strain marked with pACYC177 (KanR) or pACYC184 (ChlorR) into adult germ-free male C57BL/6J mice using a single oral gavage (n = 5 mice). Fecal samples were collected daily during the first 4 days after gavage, and then every 2 days for 2 weeks. Samples were plated on LB agar with and without antibiotics. Total fecal levels of E. coli ranged from 0.9 × 10⁷ to 1.3 × 10⁷ CFU/mg (wet weight) throughout the experiment (nonselective medium). However, levels of the ChlorR-marked strain fell throughout the experiment, whereas levels of the KanR-marked strain remained constant (Fig. 5B), indicating that in contrast to short-term (24-hour) colonization of the bladder, pACYC184 (ChlorR) conferred a fitness disadvantage, or that this plasmid was being lost from the episode 1 strain during the 2-week period of monitoring gut colonization.

To circumvent the problem of having to mark strains with plasmids, we developed a method we named FitSeq that differentiates sequenced strains based on their SNP content and provides a digital output of their abundance (see Materials and Methods). To validate FitSeq, we began with an in silico simulation using reads from the whole-genome sequencing data sets obtained from strains Ec72_E1U1 and Ec72_E3U1. Mixtures of reads were created with the fractional representation of Ec72_E1U1 set at 0.4, 0.5, and 0.6, and the observed ratios of the two strains were calculated on the basis of SNP content over a five-orders-of-magnitude range of input reads (10 to 1,000,000 per strain). Figure S5A demonstrates that 100,000 reads are more than sufficient to determine the ratio of the two strains. Next, the two strains were each grown in monoculture, genomic DNA was extracted, and the two purified DNAs were mixed in a manner such that the fractional representation of strain Ec72_E1U1 was systematically varied from 0 to 1 in 0.05 increments. An Illumina sequencer was used to generate 36-nt reads from these defined mixtures. Using 500,000 reads per sample, in silico simulations of the type described above and direct analysis of the defined mixtures showed excellent correlation between expected and detected representation (r² = 0.999; fig. S5B).
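The full FitSeq procedure is described in Materials and Methods; the sketch below only illustrates the underlying idea under simplifying assumptions. Reads that overlap a position where the two strains differ are assigned to a strain by the base they carry, reads matching neither base are discarded as errors, and reads that do not overlap a diagnostic SNP are ignored. `simulate_mixture` is a hypothetical helper that mimics the in silico validation by drawing reads from a mixture of known composition; all names and positions are hypothetical.

```python
import random


def estimate_fraction(observations, diagnostic):
    """Estimate strain A's fraction from reads that overlap diagnostic SNPs.

    observations: list of (position, base) pairs, one per read
    diagnostic:   position -> (base in strain A, base in strain B)
    """
    count_a = count_b = 0
    for position, base in observations:
        if position not in diagnostic:
            continue  # read does not overlap a position that distinguishes the strains
        base_a, base_b = diagnostic[position]
        if base == base_a:
            count_a += 1
        elif base == base_b:
            count_b += 1
        # bases matching neither strain are treated as sequencing errors and skipped
    total = count_a + count_b
    return count_a / total if total else float("nan")


def simulate_mixture(diagnostic, fraction_a, n_reads, seed=0):
    """Draw reads at diagnostic positions from an in silico mixture of known composition."""
    rng = random.Random(seed)
    positions = list(diagnostic)
    observations = []
    for _ in range(n_reads):
        position = rng.choice(positions)
        base_a, base_b = diagnostic[position]
        observations.append((position, base_a if rng.random() < fraction_a else base_b))
    return observations


# Hypothetical diagnostic SNPs: every 1,000th position, with A in strain A and G in strain B.
diagnostic = {i * 1000: ("A", "G") for i in range(200)}
for expected in (0.4, 0.5, 0.6):
    observed = estimate_fraction(simulate_mixture(diagnostic, expected, 50_000), diagnostic)
    print(expected, round(observed, 3))
```

Under this model the estimate is a binomial proportion, so its standard error shrinks as one over the square root of the number of informative reads. That scaling is consistent with a read-depth threshold above which additional reads add little precision.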

With these results in hand, we gavaged germ-free, adult male and female C57BL/6J mice (n = 5 per group) with a 1:1 mixture of the two urine strains recovered from patient 72 during her UTI episodes 1 and 3. Fecal samples were collected as described above. FitSeq disclosed that the strain from episode 1 was rapidly outcompeted by the episode 3 strain in the guts of both male and female animals (Fig. 5C), similar to the temporal profile seen in the original human patient.
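One conventional way to summarize a competition time course such as Fig. 5C is the slope of ln(p/(1 − p)) against time, where p is the fraction of one strain. This is a standard population-genetic measure, not necessarily the analysis used here. A sketch with hypothetical, noise-free fractions:

```python
import math


def selection_coefficient(days, fractions):
    """Least-squares slope of ln(p / (1 - p)) versus time, per unit of `days`.

    `fractions` are the proportions p of one strain; a positive slope means that
    strain is gaining on its competitor. p must lie strictly between 0 and 1.
    """
    if len(days) != len(fractions) or len(days) < 2:
        raise ValueError("need at least two paired time points")
    logits = [math.log(p / (1 - p)) for p in fractions]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logits) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logits))
    sxx = sum((t - mean_t) ** 2 for t in days)
    return sxy / sxx


# Hypothetical, noise-free time course in which strain B gains at s = 0.8 per day.
days = [0, 1, 2, 3, 4, 6, 8]
fraction_b = [1 / (1 + math.exp(-0.8 * t)) for t in days]
print(round(selection_coefficient(days, fraction_b), 6))  # → 0.8
```

The logit scale turns exponential replacement into a straight line. Time points where one strain is undetectable (p = 0 or 1) fall off that scale and must be excluded or handled with a detection-limit floor.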

Mechanisms that could underlie the increased fitness of episode 3 strain from patient 72 in both the bladder and gut environments

Our assembly of the genomes of Ec72_E1U1 and Ec72_E3U1 indicated that they share 4714 OGUs and have 1432 and 1969 unique OGUs, respectively. To better understand the differences in fitness of the episode 1 and 3 strains, we undertook a more in-depth genomic and phenotypic analysis. We generated a more complete assembly of their genomes after resequencing [150 nt × 2 (paired-end) Illumina MiSeq reads; 39- to 42-fold coverage of each genome; N50 contig length, 108,524 bp (Ec72_E1U1) and 126,534 bp (Ec72_E3U1); table S4]. There was a high correlation between gene coverage with the initial short read assembly and gene coverage with the longer MiSeq reads (r2 = 0.99). In addition, BLAST searches of the new genome assemblies confirmed the presence or absence of OGUs (as defined from the earlier analysis) at both the nucleotide and predicted protein levels.
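The shared and strain-specific OGU counts above are set operations on per-strain presence calls, and the agreement in gene coverage between the two assemblies is a squared Pearson correlation. A minimal sketch with hypothetical OGU identifiers and toy coverage values:

```python
def compare_gene_sets(genes_a, genes_b):
    """Return (shared, unique to A, unique to B) counts for two presence sets."""
    a, b = set(genes_a), set(genes_b)
    return len(a & b), len(a - b), len(b - a)


def pearson_r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical OGU identifiers present in each strain.
print(compare_gene_sets({"ogu1", "ogu2", "ogu3"}, {"ogu2", "ogu3", "ogu4", "ogu5"}))  # → (2, 1, 2)
# Toy per-gene coverage values from two assemblies of the same genome.
print(pearson_r_squared([1, 2, 3], [1, 3, 2]))  # → 0.25
```

Each strain's total OGU count is the shared count plus its unique count, so the three numbers reported in the text fully describe the overlap.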

The episode 3 strain Ec72_E3U1 contains the complete fim operon encoding type 1 pili. Although the episode 1 strain Ec72_E1U1 has a full-length fimH gene, it is missing the other structural genes required for assembling a functional type 1 pilus. Indeed, under laboratory growth conditions, we were unable to induce expression of functional type 1 pili in Ec72_E1U1 as measured by hemagglutination of guinea pig red blood cells. Consequently, this strain was unable to form type 1 pilus–dependent biofilms after growth in LB broth in polyvinyl chloride wells. The episode 1 strain was also deficient in forming pellicle biofilms during growth in YESCA (yeast extract/casamino acids) broth [note that pellicle biofilm formation is not dependent on type 1 pili; (47)]. In contrast, the episode 3 strain was similar to the prototypic human cystitis isolate UTI89 in assays for type 1 pilus expression and function (fig. S6, A and B), but unlike UTI89, it was not capable of pellicle biofilm formation (table S7).

The episode 1 strain (Ec72_E1U1) was significantly depleted in genes involved in flagellar assembly function (P < 0.05, χ² test). Ten core flagellar assembly genes are present in the episode 3 strain and absent in the episode 1 strain: They include genes essential for formation of the MS ring (fliF), the C ring (fliG, fliM, and fliN), and the export apparatus (fliH, fliI, fliO, fliQ, fliP, and fliR) (fig. S7). The absence of these essential basal body and export components would be expected to severely impair flagellar assembly and rotation, and thus motility. Indeed, the episode 1 strain was nonmotile, whereas the episode 3 strain and UTI89 were motile as measured in swimming and swarming assays (fig. S6C and table S7). The other four strains recovered from the urine and feces of patient 72 that clustered together with Ec72_E1U1 in the MLST-, OGU-, and SNP-based trees (Ec72_E1F1, Ec72_E2U1, Ec72_E2F1, and Ec72_E2F2 in Figs. 1 to 3 and fig. S2) also lacked these 10 core flagellar assembly genes (table S8A).

The episode 3 strain had most of the canonical UPEC virulence–associated PAI elements (PAI-II, PAI-III, and PAI-IV), eight chaperone-usher pilus systems, plus several additional toxins and iron acquisition systems (α-hemolysin and four major siderophore systems). In contrast, the episode 1 strain was missing most of these PAIs, had only one intact chaperone-usher pilus system, and lacked all of the siderophore systems and toxins associated with UPEC (table S6 and fig. S8). In vitro assays confirmed the absence of α-hemolysin activity in this strain (see Ec72_E1U1 in table S7).

The episode 3 strain was also enriched in phosphotransferase systems (PTSs) relative to the episode 1 strain (P < 0.05, χ2 test). The genes encoding PTS-Sor-EIIA, PTS-Sor-EIIB, PTS-Sor-EIIC, and PTS-Sor-EIID, which together make up the L-sorbose–specific enzyme II (EII) of the phosphoenolpyruvate (PEP)–dependent PTS (fig. S9), were absent in the less-fit episode 1 strain and present in the episode 3 strain. The PTS requires three components: the two general PTS proteins, enzyme I (EI) and HPr, which transfer a phosphoryl group from PEP to the third component, the substrate-specific EII complex. EII carries out the first step of sorbose utilization: transport of L-sorbose into the cell and its phosphorylation to L-sorbose-1-phosphate. The absence of this sorbose-specific EII suggested that the episode 1 strain cannot use L-sorbose, which was confirmed in the phenotype microarray assay (Fig. 4B). L-Sorbose derived from dietary vegetables is present in both the human gut and the urinary tract (48). L-Sorbose utilization is a distinctive feature of virulent E. coli, including enterotoxigenic E. coli (ETEC), enteroinvasive E. coli (EIEC), Shiga toxin–producing E. coli (STEC), enteropathogenic E. coli (EPEC), and other UPEC strains (49). The lack of the L-sorbose PTS was also observed in (i) the genomes of the four other strains judged to be clonal with Ec72_E1U1 based on MLST-, OGU-, and SNP-based analyses (Ec72_E1F1, Ec72_E2U1, Ec72_E2F1, and Ec72_E2F2) and (ii) the 13 reference strains in phylogenetic group A that clustered with this clonal population in the tree shown in Figs. 2 and 3 (table S8B).

The differential representation of other genes in the genomes of these two strains suggests a genetic basis for the observed differences in their in vitro growth phenotypes (Fig. 4B) and fitness in the gut and bladder (Fig. 5, A and C). For example, genes involved in galactose utilization were more prominent in the episode 3 strain and correlated with its higher growth rates on substrates requiring galactose metabolism (Fig. 4B). Defects in galactose utilization are known to affect colonization of the intestine. E. coli uses multiple sugars for growth in the intestine, and multiple mutations affecting different sugar utilization pathways have an additive effect on the colonization levels of the enterohemorrhagic E. coli strain EDL933 in CD-1 mice (50). Peptides or amino acids are the primary carbon source for E. coli during UTI (51), and peptide transport, gluconeogenesis, and the tricarboxylic acid (TCA) cycle are required for UTI caused by the UPEC strain CFT073 (48, 51). The phenotype microarray analysis disclosed that the more-fit episode 3 strain had higher growth rates on all four dipeptides and 10 of 22 amino acids tested (Fig. 4B). Nine of these 10 amino acids are glucogenic in the TCA cycle. Genes involved in gluconeogenesis and TCA cycle were enriched in the variable component of the episode 3 isolate’s genome compared to the episode 1 isolate’s genome (table S9).


We analyzed the genomic features of E. coli strains isolated from the feces and urine of four women during recurrent bouts of UTI and assessed the relative fitness of representative strains in mouse models. We found two very different colonization patterns, each represented by two patients, with respect to the dominant E. coli population in the gut and bladder. In one pattern, exemplified by patient 13, stable and seemingly clonal E. coli populations persisted in the gut and urinary tract for several months over the course of multiple recurrent UTIs. The other pattern, illustrated by patient 72, was highly dynamic: a wholesale shift occurred in the major population colonizing both the intestinal and urinary tracts (over the 1-month period between her second and third episodes of UTI). Unsupervised hierarchical clustering of both genetic and phenotypic data (MLST, whole-genome gene and SNP content, and in vitro growth on 190 substrates) supported the clonal relationship of strains representing these dominant E. coli populations.

We tested the in vivo fitness of two isolates: (i) Ec72_E1U1, a representative of the strains present in patient 72’s gut and bladder during her first two episodes, and (ii) Ec72_E3U1, a representative of the gut and urine isolates from the last UTI episode of patient 72. The results revealed that the latter strain had higher fitness in both the mouse bladder and the gut, consistent with the population shift documented in both the gut and the bladder of patient 72. These fitness differences correlate with a number of genomic and metabolic features that provide insights about the requirements for survival in these body habitats. Nonpathogenic E. coli strains generally contain fewer chaperone-usher pilus systems than do pathogenic strains (52). We found that the less competitive episode 1 strain carries fewer pilus systems (only 1 of the 13 known CUP systems in E. coli) than does the episode 3 strain. Moreover, there were seven predicted amino acid differences between the FimH proteins encoded by the two isolates. FimH residues 70 and 78 define two major groups of FimH sequences (53, 54); the first UTI episode isolate (Ec72_E1U1) has 70N and 78S, which are associated with fecal and non-UPEC strains, whereas the third UTI isolate (Ec72_E3U1) has 70S and 78N, which are associated with UPEC strains.

Flagella are thought to be important for UTI pathogenesis. The flagellum consists of a basal body, hook, and filament. Flagellar synthesis is a highly ordered and regulated process involving three classes of genes. Class I genes include flhDC, which encode the FlhD/FlhC complex that functions as a transcriptional activator of flagellar class II operons. Class II genes encode the basal body and hook, as well as FliA and FlgM, which are the σ factor and anti–σ factor that regulate transcription of class III genes. Class III genes encode the hook-associated proteins and the filament of the flagellum (FliC), as well as proteins necessary for motility (for example, MotA and MotB) (55). Studies of isogenic wild-type and ΔfliC strains have shown that loss of the flagellar protein FliC results in reduced persistence in the urinary tract of mice, whereas IBC formation and dispersal are not affected (56). E. coli strains are classically grouped by serotyping on the basis of their lipopolysaccharide O antigens and flagellar H antigens. Nearly all E. coli have an H-typeable flagellar antigen, including nonmotile strains. H-typeable but nonmotile isolates include sorbitol-fermenting O157 strains isolated from patients with hemolytic uremic syndrome, in which there is a 12-bp deletion in flhC (57). The urine isolate and one fecal isolate from the first two UTI episodes in patient 72, which would be typed as H30 based on sequence identity to fliC from strain HW32 (58), have deletions that eliminate a subset of flagellar class II genes. All these strains, which include the episode 1 strain Ec72_E1U1, have an intact flhDC operon. Thus, deletion of genes encoding flagellar structural proteins represents an alternative route to loss of E. coli motility.

No common set of virulence determinants has been identified that is specific to UPEC strains and absent from E. coli strains that have a mutualistic relationship with their host (59, 60). The general lack of known virulence determinants in the episode 1 strain from patient 72 raises the question of how this strain was able to cause a symptomatic UTI. The episode 1 strain Ec72_E1U1 and the closely related episode 2 strain Ec72_E2U1 were isolated from the urine of patient 72. They were cultured from midstream urine samples using previously well-defined and validated protocols (36) that make fecal contamination highly unlikely. In a recent study, 80 and 40% of urine isolates collected from women with symptomatic UTI were capable of expressing functional type 1 and P pili, respectively, after in vitro growth (59). In addition, studies of isolates from women with asymptomatic bacteriuria have found that they are enriched for UPEC strains that have lost the ability to make functional pili (61). Thus, alternative mechanisms for E. coli colonization of the urinary tract appear to exist. The association of UTI symptoms with the episode 1 strain is even more puzzling given its limited number of chaperone-usher pili, its fimH sequence, its lack of PAIs, and its clustering with other nonpathogenic strains based on comparison of their sequenced genomes. Sexual intercourse has been shown to introduce bacteria into the female bladder (3). Patient 72 engaged in sexual intercourse just before bacteriuria and the development of symptoms (36). Therefore, it seems reasonable to propose that Ec72_E1U1, despite lacking functional P and type 1 pili, was inoculated into her bladder in sufficient quantities to persist long enough to cause symptoms. Indeed, we found that an inoculation of 10⁸ CFU of Ec72_E1U1 into the bladders of mice was sufficient to maintain bacteriuria and kidney colonization even in the absence of significant invasion of the bladder urothelium (fig. S4).
Furthermore, studies in mouse models have shown that certain UPEC strains that are defective in IBC formation can be complemented to form IBCs when there is co-infection with other UPEC (34); thus, in the context of a mixed infection, this strain may be able to persist within the bladder. In contrast, Ec72_E3U1 colonized and invaded the bladder. Our finding that the episode 3 strain Ec72_E3U1 has intact coding sequences for functional type 1 pili and flagella and a greater flexibility in utilizing carbon sources available in the gut and urinary tract emphasizes the multigenic underpinnings of virulence, provides a mechanistic explanation for its displacement of the less-fit Ec72_E1U1 between patient 72's UTI episodes 1 and 3, and underscores the need to examine the role of metabolic capabilities in determining the fitness of UPEC in two body habitats involved in disease pathogenesis. For example, we have shown that mutants with deletions in sdhB (required for conversion of succinate to fumarate in the TCA cycle) or mdh (catalyzes metabolism of malate to oxaloacetate; loss of mdh blocks the TCA cycle and glyoxylate shunt) are both attenuated in a mouse model of UTI, correlating with a decreased ability to form IBCs (62).

The observation of the same strain of E. coli in both urine and fecal isolates during a given UTI episode and across successive UTI episodes is somewhat surprising given the hypothesis that the urinary tract is an evolutionary dead end from which E. coli do not emerge to seed other habitats (31). Two possibilities that are not mutually exclusive could explain our results: (i) UPEC are very fit in both the gut and the urinary tract, and UTI results from gut to bladder inoculation, but the reverse never happens; (ii) UPEC transiently occupy, and in fact dominate, the gut E. coli population en route to the urinary tract, and because we have sampled strains only during UTI episodes, we observe the same strain in both the feces and urine. These two possibilities differ in that (i) assumes high UPEC fitness in the gut, whereas (ii) does not. Given our gut colonization data in gnotobiotic mice, (i) seems the more likely of these two possibilities. If the urinary tract is not an evolutionary dead end, then a third possibility needs to be considered: The urinary tract may be a source for gut colonization with UPEC. This third possibility is consistent with a scenario of dynamic flux between the two body habitats, in which a strain originating from the gut causes a UTI episode and UPEC from the bladder/urine subsequently reseed the gut; this "cycle" could lead to the "homogeneity" that we see in our study, where a single strain dominates in both habitats during a given UTI.

Overall, this is a complex issue because gut colonization by UPEC can be consistent with the urinary tract as a dead end; that is, if UTI always arises from infection with gut bacteria, then human-human transmission and epidemics of UTI could be caused merely by fecal-oral transmission of bacteria from gut to gut without requiring intervening occupancy of the urinary tract. Two observations argue against this notion. First, in the case of patient 72, we find that the relative fitness of gut and urinary isolates in mice follows their dynamics in the human host and that isolates with higher relative fitness coexist simultaneously in both host habitats. Second, in an analysis of heterosexual couples, colonization of the gut of both partners by the same strain of E. coli was associated with cunnilingus (63). One explanation for this latter result involves transmission from the urinary tract of the female to the gut of the male. Food-borne transmission of extraintestinal E. coli may also represent a possible route of dissemination (64, 65).

The concept in evolutionary theory that specialization to one environment is generally detrimental to fitness in another has been extensively explored in the context of microbial survival in the presence and absence of antibiotics. Adaptation to antibiotic pressure is often accompanied by a fitness cost; however, compensatory mutations that restore fitness, as well as no-cost adaptive mutations, have been identified in numerous systems (66). In Campylobacter jejuni, fitness trade-offs are very clear in the development of resistance to macrolide antibiotics, yet evolution of resistance to fluoroquinolones confers equal or higher overall fitness in the absence of antibiotic pressure in animal models of infection, demonstrating that fitness landscapes can be dynamic and complex (67, 68). Thus, although some UPEC strains may pay a fitness cost in the gut, our data from patient 72 indicate that a pathway to high fitness in both the urinary tract and the gut exists. This is interesting in light of the highly specialized intracellular infection pathway used by many UPEC in the bladder, the efficiency of which varies from strain to strain (21, 22, 34). IBC formation is multifactorial, and there is evidence that it is a selected process among UPEC strains (33). Type 1 pili mediate binding to uroplakins (16, 25) but also play an intracellular role in IBC formation (33, 69) and thus are key features of both UTI and IBC formation. Selection for this specialization toward uroepithelial cells would be expected to decrease fitness in other habitats such as the gut, especially in a source-sink model where the urinary tract is an evolutionary dead end. We have now demonstrated that fitness in the urinary tract and fitness in the gut are not necessarily inversely related: for example, the episode 3 strain from patient 72 forms IBCs, whereas the episode 1 strain does not, yet the episode 3 strain is also the more fit of the two in the gut.
It may be useful to consider how selection for the biofilm-like IBC in uroepithelial cells could enhance survival in the gastrointestinal tract, which is lined with a mucus layer containing polysaccharides that can serve as a nutrient repository as well as a place for attachment and establishment of syntrophic relationships with other members of the microbiota. That is, the IBC may be a focal point for pathoadaptive changes that also increase fitness of strains in the gut.

Our findings provide a rationale for additional studies of populations of women representing different ages, genotypes, and life-styles (including different diets and nutritional status) to address the question of the origins of recurrent UTI and the relative importance of dynamic and stable patterns of colonization. This is especially important because only 4 of 104 patients met our inclusion criteria (three recurrent UTIs with E. coli isolates available from both feces and urine sampled at the same time during each episode), and because in-depth genomic and phenotypic characterization of isolates was performed for just two of these patients. If UPEC are able to move freely between the gut and the urinary tract, this complicates our inference of the ultimate reservoir for recurrent UTI. Therefore, further studies of the type described in this report are needed to determine the migration directions of these bacteria between different sites within an individual and between individuals. We envision these studies as a part of a translational medicine pipeline directed at developing more informed concepts about the pathogenesis of recurrent UTI, as well as more effective therapies. For example, whole-genome sequencing of isolates obtained from time series studies of patients with recurrent UTI should help determine whether most of the E. coli isolates obtained from the gut and urinary tract of an individual at a given time point are clonally related, and whether there are barriers to homogenization across different body habitats (oral, fecal/perianal, vaginal, periurethral, and bladder). FitSeq and the type of animal models used here can then be used to compare the fitness of UPEC strains that sweep to dominance in the gut and urinary tract of a human host in the setting of UTI. 
The results may influence standards of care in the future, for example, whether long-term surveillance of the fecal microbiota coupled with sampling microbial communities from other body habitats can identify population shifts in patients with histories of UTI before the onset of UTI. This surveillance may also help define new approaches to chemoprophylaxis, involving either existing antibiotics or next-generation compounds that target UPEC through novel mechanisms, such as mannosides that impede or block FimH-mediated binding of UPEC strains to mannosylated epithelial surface receptors (70).

Materials and Methods


The TOP trial was conducted with protocols approved by the Human Studies Committee of the University of Washington. Exclusion criteria included known anatomic or functional abnormalities of the urinary tract, symptoms or signs of acute pyelonephritis, chronic illness requiring medical supervision, pregnancy, or planned pregnancy in the 3-month period after enrollment. Clinical information and strains were collected during the trial as described by Czaja et al. (36).

Amplicon sequencing

Genomic DNA was prepared from each E. coli isolate with the Promega Wizard Genomic DNA kit according to the manufacturer’s instructions. Near full-length amplicons from the 16S rRNA gene were generated by PCR and sequenced with the dideoxy chain termination method as described (71). Gene targets for MLST, including the fimH allele, were amplified with the primers listed in table S3 in the following reaction mixture: 1× PCR buffer (Invitrogen) supplemented with 2.5 mM MgCl2, 1.4 M betaine, 1.3% dimethyl sulfoxide, 200 μM deoxynucleotide triphosphates, 5 ng of template, 12.5 pmol of each primer, and 1 U of Taq polymerase (Invitrogen) in a total volume of 50 μl. Reactions were heated to 95°C in a thermocycler for 5 min and then cycled 35 times with the following conditions: 95°C for 1 min, 55°C for 1 min, and 72°C for 1 to 3 min (depending on the expected product size). Reactions were finished with a 10-min incubation at 72°C, cooled to room temperature, and subsequently purified with the Qiagen QIAQuick PCR purification kit according to the manufacturer’s instructions. Amplicons were sequenced with standard dye-terminator capillary sequencing. Base calling and assembly of multiple reads were performed with the programs Phred, Phrap, and Consed with default parameters.

Biolog phenotype microarrays

Phenotype microarrays were run according to the manufacturer’s standard protocols. Briefly, a strain was grown on solid LB agar with no antibiotic selection at 37°C overnight. Cells were scraped from this plate, resuspended in 10 ml of buffer IF 0a GN/GP Base IF (Biolog), normalized to a transmittance of 85% with the Biolog turbidimeter, diluted 100-fold in buffer IF 0a GN/GP Base IF with Biolog Dye mix, and then pipetted into Biolog PM1 or PM2 plates. Plates were subsequently incubated at 37°C in the Biolog machine for 48 hours with colorimetric measurements made every 30 min. Data were exported from the Biolog software as total AUC (area under the curve) for the 48-hour assay, giving 96 AUC values for each PM plate. These values were subjected to unsupervised hierarchical clustering to determine the relatedness of the phenotype microarray profiles for each strain.
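The Biolog software reports the AUC directly, and the distance metric and linkage rule used for clustering are not stated. The sketch below is a minimal illustration that assumes trapezoidal integration of readings taken every 30 min over 48 hours and Euclidean-distance average linkage (both are assumptions, not the documented pipeline). It shows how per-well AUC values could be turned into a merge order for a strain dendrogram.

```python
import math
from itertools import combinations


def trapezoid_auc(readings, interval_h=0.5):
    """Area under a kinetic curve sampled every `interval_h` hours (trapezoidal rule).
    A 48-hour run read every 30 min gives 97 readings (t = 0 to 48 h)."""
    return sum((y0 + y1) / 2 * interval_h for y0, y1 in zip(readings, readings[1:]))


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (UPGMA-style merging).

    profiles: dict mapping strain name -> list of per-well AUC values.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = {name: [name] for name in profiles}  # cluster label -> member strains

    def cluster_distance(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(euclidean(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        d = cluster_distance(a, b)
        # Replace the two merged clusters with one combined cluster.
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges
```

For example, `average_linkage({"s1": [0, 0], "s2": [0, 1], "s3": [10, 10]})` merges s1 and s2 first (distance 1.0) and then joins s3 to that pair. In practice, a library routine such as SciPy's hierarchical clustering would replace this quadratic loop for 96-value profiles across many strains.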

Whole-genome sequencing and assembly

Genomic DNA libraries for the Illumina GA-II sequencer were prepared according to the manufacturer’s protocol. Each strain was sequenced (36-nt reads) with two lanes of the eight-lane flow cell. The output files were converted to FASTA format, ignoring quality scores, and assembled with the Velvet short read assembler (version 0.7) with custom Perl scripts to optimize k-mer length and minimum coverage parameters for both N50 length and total assembly length. AMOScmp (version 2.0.5) was used to further improve the assembly with default parameters.
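The custom Perl scripts used to tune Velvet are not shown. As a hypothetical sketch of the scoring step only, the code below assumes each k-mer length/minimum-coverage combination has already been assembled and reduced to a list of contig lengths, and it ranks candidates first by N50 and then by total assembly length (the exact combined criterion is an assumption).

```python
def n50(contig_lengths):
    """Smallest contig length L such that contigs of length >= L cover
    at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def pick_best_assembly(assemblies):
    """assemblies: dict mapping (k, min_coverage) -> list of contig lengths.
    Returns the parameter pair with the highest N50, breaking ties by total length."""
    return max(
        assemblies,
        key=lambda params: (n50(assemblies[params]), sum(assemblies[params])),
    )
```

For example, contig lengths of 100, 80, 50, 30, and 20 bp give an N50 of 80 bp, because the two longest contigs (180 bp) already cover half of the 280-bp total.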

Paired-end libraries with 500-bp inserts were prepared for strain Ec72_E1U1 and Ec72_E3U1 as described by the manufacturer of the Illumina MiSeq instrument. Sample-specific, 8-nt Hamming barcodes (72) were incorporated into the sequencing adapter for multiplex sequencing [8 barcodes per strain; two strains sequenced in a single MiSeq flow cell; 16 barcodes per MiSeq flow cell to get a balanced base composition during the first four cycles of the sequencing run (critical for accurate cluster calling as well as good phase and prephasing values)]. The output FASTQ files were assigned to each strain with the barcodes (0.68 million reads for Ec72_E1U1 and 0.78 million reads for Ec72_E3U1) and then assembled with MIRA (version 3.4.0) using default parameters (73).
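Assigning reads to strains by barcode can be sketched as follows. This is a minimal illustration that assumes the 8-nt barcode has already been extracted from each read and that a read is accepted only when its closest barcodes, within one mismatch, all belong to the same strain (8-nt Hamming barcodes are designed so that a single substitution can be corrected). The barcode sequences in the test are invented placeholders, not those used in the study.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_read(read_barcode: str, barcode_to_sample: dict[str, str],
                max_mismatches: int = 1) -> Optional[str]:
    """Return the strain for the closest barcode(s) within `max_mismatches`, or None if
    no barcode is close enough or the closest barcodes belong to different strains."""
    distances = {bc: hamming(read_barcode, bc) for bc in barcode_to_sample}
    best = min(distances.values())
    if best > max_mismatches:
        return None
    # Several barcodes can map to one strain (e.g., 8 barcodes per strain).
    samples = {barcode_to_sample[bc] for bc, d in distances.items() if d == best}
    return samples.pop() if len(samples) == 1 else None


def demultiplex(read_barcodes, barcode_to_sample):
    """Count reads per strain; unassignable reads are tallied under None."""
    counts = {}
    for bc in read_barcodes:
        sample = assign_read(bc, barcode_to_sample)
        counts[sample] = counts.get(sample, 0) + 1
    return counts
```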

SNP analysis

Analysis of assembled genomes for SNPs was done as described (43). UTI89 was used as the reference finished UPEC genome. All genomes were aligned against UTI89 with BLASTN with default parameters. Only the top-scoring alignment reported by BLASTN was used for each assembled contig. The position of each SNP, based on the UTI89 genome sequence, was recorded. SNP rates were calculated for each pair of assembled genomes by counting the number of sequence differences only at positions in the UTI89 reference genome where both genomes had aligned contigs. The total number of sequence differences in overlapping regions of the genomes was divided by the total length of the overlapping regions, yielding the SNP rate per aligned base pair.
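The SNP-rate definition above reduces to a ratio over the shared aligned positions. The sketch below is a minimal illustration that assumes each genome's BLASTN alignments have already been projected onto UTI89 coordinates as a mapping from reference position to aligned base. It ignores gaps and indels, which is an assumption rather than a documented detail of the method.

```python
from itertools import combinations


def snp_rate(aligned_a: dict[int, str], aligned_b: dict[int, str]) -> float:
    """SNPs per aligned base pair between two genomes.

    aligned_a, aligned_b: mapping of UTI89 reference position -> base in that genome's
    aligned contig. Only positions covered in *both* genomes are compared."""
    shared = aligned_a.keys() & aligned_b.keys()
    if not shared:
        raise ValueError("the two genomes share no aligned reference positions")
    differences = sum(aligned_a[pos] != aligned_b[pos] for pos in shared)
    return differences / len(shared)


def pairwise_snp_rates(genomes: dict[str, dict[int, str]]) -> dict[tuple[str, str], float]:
    """SNP rate for every unordered pair of genomes, keyed by sorted strain names."""
    return {(a, b): snp_rate(genomes[a], genomes[b])
            for a, b in combinations(sorted(genomes), 2)}
```

For example, two genomes that overlap at three reference positions and differ at one of them have a SNP rate of 1/3 per aligned base pair. Positions covered by only one genome do not contribute to either the numerator or the denominator.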

Competitive tests of fitness

All experiments involving mice were performed with protocols approved by the Washington University Animal Studies Committee. In vivo fitness tests in the urinary tract were performed as previously described (44). Briefly, bacteria were grown under type 1 pili–inducing conditions (two passages at 37°C in LB broth without shaking for 16 to 20 hours, with a 1:1000 dilution between passages). These static cultures were briefly agitated to resuspend settled bacteria. Cells were collected by centrifugation (~3000g for 10 min at 4°C) and resuspended in phosphate-buffered saline (PBS) to an OD600 (optical density at 600 nm) of 1.0. An equal mixture of two strains was inoculated transurethrally into the bladders of 7- to 8-week-old female C3H/HeN mice. Twenty-four hours later, mice were sacrificed and their bladders were removed aseptically, placed in 1 ml of PBS, mechanically homogenized with a stainless steel electric tissue homogenizer (PRO Scientific) for 15 to 20 s, and plated on LB/agar plates with and without antibiotics [kanamycin (50 μg/ml), chloramphenicol (100 μg/ml), or kanamycin plus chloramphenicol]. CFU counts were determined after a 12- to 18-hour incubation at 37°C under aerobic conditions. Single infections were performed similarly, except that bacterial suspensions were mixed 1:1 with PBS to achieve a final OD600 of 0.5 before inoculation.
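The co-infection readout relies on strain-specific antibiotic markers, but the text does not say which strain carries which marker or which summary statistic was computed. A common choice is a competitive index (CI). The sketch below is a minimal illustration that assumes strain A is counted on kanamycin plates and strain B on chloramphenicol plates, and that the same plating scheme is applied to the inoculum.

```python
import math


def cfu_per_ml(colonies: int, dilution_factor: float, volume_plated_ml: float) -> float:
    """Back-calculate CFU/ml from the colony count on one plate."""
    return colonies * dilution_factor / volume_plated_ml


def competitive_index(out_a: float, out_b: float, in_a: float, in_b: float) -> float:
    """Output ratio of strain A to strain B (bladder homogenate) divided by their
    input ratio (inoculum). Values > 1 mean strain A outcompeted strain B."""
    if min(out_a, out_b, in_a, in_b) <= 0:
        raise ValueError("counts must be positive; substitute a detection limit for zeros")
    return (out_a / out_b) / (in_a / in_b)


def log_competitive_index(out_a: float, out_b: float, in_a: float, in_b: float) -> float:
    """log10 CI, which is symmetric around 0 and convenient for plotting per mouse."""
    return math.log10(competitive_index(out_a, out_b, in_a, in_b))
```

Working on a log scale makes an advantage for either strain look symmetric. For example, a 1:1 inoculum that yields 900 colonies of strain A and 100 of strain B gives a CI of 9.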


To assess their relative fitness in the gut, we grew E. coli strains Ec72_E1U1 and Ec72_E3U1 under type 1 pili–inducing conditions. The two strains were mixed in equal proportions (2 × 10⁸ to 3 × 10⁸ CFU/200 μl per strain) and inoculated by oral gavage into germ-free 8- to 10-week-old C57BL/6J mice. Mice were maintained in plastic flexible film gnotobiotic isolators and fed a standard autoclaved chow diet (B&K Universal; diet 7378000) ad libitum. Fecal samples were collected from each mouse 1, 2, 3, 4, 6, 8, 10, 12, and 14 days after gavage. Each fecal pellet was placed in 1 ml of PBS and homogenized by vortexing. A 10-μl aliquot of the homogenate was plated directly on LB agar (n = 4 plates per sample). Twenty-four hours later, colonies were collected by scraping into 1 ml of PBS. Genomic DNA was isolated with the phenol/chloroform method (71) and then fragmented by sonication to 300 to 500 bp (Bioruptor sonicator for ultrasonic liquid processing; 20 cycles of 30 s ON at high power/30 s OFF).

Illumina sequencing libraries were prepared as described by the manufacturer. Sample-specific, 8-nt Hamming barcodes (72) were incorporated into the sequencing adapter for multiplex sequencing (n = 96 samples per lane of an Illumina HiSeq 2000 flow cell; 80 to 100 million reads per lane; >500,000 42-nt reads per sample).

Raw reads were assigned to each sample with the barcodes and then mapped to the database containing the genomes of both strains with Eland (74). Only reads that could be uniquely assigned to one of the two genomes were used to score the relative representation of that strain in a given fecal sample. Counts for each strain were normalized by the informative genome size (IGS). The IGS of each strain was calculated by generating, in silico, a mock sample containing a 1:1 ratio of both strains and then mapping the sample reads back to each genome: The IGS for each strain was calculated from the number of reads that mapped uniquely to that strain’s genome. The ratio of the two strains was calculated as ρ(FitSeq) after normalization by IGS.
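The IGS normalization can be sketched in a few lines. This is a simplified illustration, not FitSeq itself: instead of mapping simulated reads with Eland, it tiles each genome into error-free reads of fixed length and treats a read as informative if the identical sequence never occurs in the other genome. Reverse complements and mismatches are ignored, which are assumptions.

```python
def tile_reads(genome: str, read_len: int, step: int = 1) -> list[str]:
    """Simulate error-free reads by sliding a fixed-length window along the genome."""
    return [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]


def informative_genome_sizes(genome_a: str, genome_b: str, read_len: int) -> tuple[int, int]:
    """In-silico 1:1 mock (equal per-base coverage of both genomes): count the reads
    that are unique to each genome. These counts play the role of IGS_A and IGS_B."""
    reads_a, reads_b = tile_reads(genome_a, read_len), tile_reads(genome_b, read_len)
    set_a, set_b = set(reads_a), set(reads_b)
    igs_a = sum(read not in set_b for read in reads_a)
    igs_b = sum(read not in set_a for read in reads_b)
    return igs_a, igs_b


def rho_fitseq(unique_a: int, unique_b: int, igs_a: int, igs_b: int) -> float:
    """Strain A : strain B ratio from uniquely assigned read counts, each normalized
    by that strain's informative genome size."""
    if unique_b == 0 or igs_a == 0 or igs_b == 0:
        raise ValueError("need reads from strain B and non-zero IGS for both strains")
    return (unique_a / igs_a) / (unique_b / igs_b)
```

The normalization matters because a strain with more strain-specific sequence yields more uniquely assignable reads per cell. Without it, that strain would appear more abundant than it is.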

Because bacterial cells had been harvested from LB agar plates after a 24-hour incubation, we needed to transform ρ(FitSeq) using the relative growth rate of the two strains in order to calculate their ratio (ρ) in fecal samples. To measure the relative growth rate, we grew a 1:1 mixture of strains Ec72_E1U1 and Ec72_E3U1 on LB agar for 24 hours at room temperature as above. Bacterial cells were collected, and the detected ratio [ρ(1:1)] was calculated with FitSeq. Because the original ratio of the two strains was 1, the relative growth rate ratio (v1/v2) of the two strains could be calculated as

$$\frac{v_1}{v_2} = \rho(1{:}1)$$

Five independent mixtures were prepared and measured, and the average result was used as the growth rate ratio of the two strains on LB agar. Thus, we were able to calculate the ratio of the two strains in fecal samples as

$$\rho = \frac{\rho(\mathrm{FitSeq})}{v_1/v_2} = \frac{\rho(\mathrm{FitSeq})}{\rho(1{:}1)}$$
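Applied to a time series of fecal samples, the growth-rate correction is a single division per sample. The sketch below is a minimal illustration that assumes the "average" of the five control mixtures is an arithmetic mean (a geometric mean is a common alternative for ratios). The values in the test are illustrative, not study data.

```python
from statistics import mean


def growth_rate_ratio(rho_one_to_one: list[float]) -> float:
    """v1/v2 estimated as the mean FitSeq ratio measured for 1:1 mixtures grown on LB agar."""
    if not rho_one_to_one:
        raise ValueError("need at least one 1:1 control mixture")
    return mean(rho_one_to_one)


def fecal_ratio(rho_fitseq: float, rho_one_to_one: list[float]) -> float:
    """rho = rho(FitSeq) / (v1/v2) = rho(FitSeq) / rho(1:1)."""
    return rho_fitseq / growth_rate_ratio(rho_one_to_one)


def correct_time_series(rho_by_day: dict[int, float],
                        rho_one_to_one: list[float]) -> dict[int, float]:
    """Apply the plate-growth correction to each day's FitSeq ratio, ordered by day."""
    return {day: fecal_ratio(r, rho_one_to_one) for day, r in sorted(rho_by_day.items())}
```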

Characterization of virulence-related phenotypes in vitro

Hemagglutination assays were performed on cells normalized to an OD600 of 1.0 as described previously (n = 3 biological replicates performed on different days; two technical replicates per biological replicate) (75). For biofilm assays, shaken cultures were grown overnight at 37°C in LB, subcultured at a 1:1000 dilution into LB, and grown statically in untreated polyvinylchloride 96-well plates at room temperature for 48 hours. Adherent biomass was stained with 0.05% crystal violet, rinsed, solubilized with 35% acetic acid, and quantified by measuring the absorbance of the solubilized crystal violet at 595 nm (assays performed in duplicate; five technical replicates per biological replicate). Pellicle biofilm assays were performed by subculturing a 1:1000 dilution of an overnight LB shaken culture into YESCA medium and incubating statically at 30°C for 72 hours (pellicle assays performed twice). For motility assays, cultures were incubated statically at 37°C for 24 hours in 10 ml of LB, subcultured at a 1:1000 dilution into 10 ml of fresh LB, and incubated again statically at 37°C for 24 hours. Swimming motility was measured in 0.25% LB agar, and swarming motility was measured in 0.6% LB agar supplemented with 0.5% glucose (swimming and swarming data shown are representative of results obtained from two separate experiments) (75). Hemolysin production was examined by plating on blood agar (assays performed in duplicate).

Supplementary Materials

Fig. S1. Maximum likelihood fimH gene tree incorporating the UPEC strains characterized in the present study.

Fig. S2. Hierarchical clustering of 404 E. coli strains including the 26 characterized in the present study.

Fig. S3. The episode 3 UTI strain Ec72_E3U1 forms IBCs.

Fig. S4. Assays of urinary tract colonization with episode 1 and 3 strains from patient 72.

Fig. S5. Parameter testing and validation of FitSeq.

Fig. S6. In vitro characterization of Ec72_E1U1 and Ec72_E3U1.

Fig. S7. Genes encoding flagellar proteins that are present/absent in the genomes of UTI episode 1 (Ec72_E1U1) and episode 3 (Ec72_E3U1) strains.

Fig. S8. UPEC virulence–associated elements present/absent in the genomes of UTI episode 1 (Ec72_E1U1) and episode 3 (Ec72_E3U1) strains recovered from patient 72 compared to UPEC strain UTI89.

Fig. S9. PTS pathway components involved in l-sorbose utilization.

Fig. S10. Comparison of growth phenotypes of Ec72_E1U1 and Ec72_E3U1 strains from UTI episodes 1 and 3.

Table S1. Clinical characteristics and treatment of the four patients with three episodes of recurrent UTI.

Table S2. Summary of 45 isolates analyzed for this study, including those subjected to whole-genome sequencing.

Table S3. Genes targeted for MLST and primers used for generating PCR amplicons for DNA sequencing.

Table S4. Genome sequencing and assembly metrics for urine and fecal isolates obtained from patients 13 and 72 during their three episodes of UTI.

Table S5. Reference genomes used for OGU- and SNP-based analyses.

Table S6. Representation of known virulence factors in the genomes of the 26 isolates from the present study and in 54 other E. coli genomes classified as complete in the NCBI RefSeq database.

Table S7. Comparison of YESCA pellicle, Congo Red staining, and swarming phenotypes of strains Ec72_E1U1, Ec72_E3U1, and UTI89.

Table S8. Representation of genes involved in flagellar assembly and PTS-sorbose systems in the 26 clinical isolates from the present study and in the 54 reference complete E. coli genomes.

Table S9. Representation of genes (OGUs) assigned to KEGG pathways in the shared and variable components of the Ec72_E1U1 and Ec72_E3U1 genomes.

References and Notes

  1. Acknowledgments: We thank J. Hoisington-Lopez for assistance with DNA sequencing; D. O’Donnell and M. Karlsson for their assistance with mouse husbandry; and A. Goodman, A. Kau, M. Gonzalez, and J. Faith for their invaluable comments and help during the course of this work. Funding: Supported by NIH grant AI048689, and a Specialized Centers of Research grant DK064540 from the NIH Office of Research on Women’s Health and National Institute of Diabetes and Digestive and Kidney Diseases. Data generated for fig. S2 used a computer cluster supported by the National Research Foundation Singapore under its National Research Foundation (NRF) Fellowship (NRF-RF2010-10) and the Genome Institute of Singapore (GIS)/Agency for Science, Technology and Research (A*STAR) (to S.L.C.). Author contributions: S.L.C., M.W., S.J.H., and J.I.G. designed the experiments; S.L.C. and M.W. sequenced and assembled isolate genomes, developed and implemented FitSeq, and performed all mouse experiments and phenotype microarray analyses; M.E.H. analyzed IBC formation and motility in vitro; J.P.H. assisted in the analysis of siderophore systems; T.M.H. oversaw collection of human biospecimens; M.W., S.L.C., S.J.H., and J.I.G. analyzed the data; and S.L.C., M.W., S.J.H., and J.I.G. wrote the paper. Competing interests: The authors declare that they have no competing interests. Data and materials availability: Genome sequences from UPEC strains have been deposited in GenBank with BioProject accession number PRJNA187034; new MLST sequences have been deposited in as ST2838-ST2846.