Perspective: Infectious Disease

Pathogen Microevolution in High Resolution

Science Translational Medicine  27 Jan 2010:
Vol. 2, Issue 16, pp. 16ps4
DOI: 10.1126/scitranslmed.3000713


Microbial genomics has revolutionized infectious diseases and epidemiology research and is facilitating the tracking and containment of emerging biological threats. Among the most serious contemporary infectious agents are multiple antibiotic–resistant strains of the human pathogen Staphylococcus aureus, which present a formidable public health challenge that is no longer limited to hospitalized patients. To address key hypotheses regarding microbial strain evolution or virulence, conventional genotyping methods do not offer enough power to resolve minor changes between closely related strains. The application of next-generation high-throughput genotyping technologies, as illustrated in a recent analysis of a highly resistant S. aureus strain, can provide new clues about the geographical origin and intrahospital spread of important microbial pathogens.

In October 2009, fewer than 15 years after the first bacterial genome was sequenced in its entirety (1), the National Center for Biotechnology Information announced the attainment of the 1000th complete microbial genome sequence; this wealth of information adds to the more than 2000 viral genomes that have been fully sequenced and deposited in public databases. With recent unparalleled advances in sequencing technologies (2), the scientific community can soon expect 10,000 microbial genomes to be solved, broadly covering taxa from all branches in the entire tree of life (3). Full-fledged efforts also have been initiated to sequence and publish numerous human genomes (4) and microbiomes (5), cataloging variations among individuals and resident microflora and their potential link to infectious or inflammatory disease pathologies. Here we discuss a study by Harris et al. in a recent issue of Science (6) that has examined the power and functionality of high-resolution genotyping applied to bacterial molecular epidemiology.

Microbial genomics is already being translated in meaningful ways to public health epidemiology and the clinical practice of infectious disease medicine (7). One success story is the etiological identification, tracking, and containment of the severe acute respiratory syndrome coronavirus in 2002–2003 (8), a near-pandemic that otherwise could have claimed thousands more lives. Current global surveillance of swine-origin H1N1 influenza provides another example (9). Contemporary genome-wide approaches are being applied in the development of rapid diagnostic tools against bioterror threats and emerging infectious disease agents (10), microbial forensics (11), and reverse vaccinology platforms targeting leading human bacterial pathogens (12).

Genomic fingerprinting provides a basis for understanding the relatedness of microbial strains. DNA comparison is the core methodology of molecular epidemiologists, who study the emergence and geographical spread of virulent or drug-resistant clones and seek to decipher the reservoirs and transmission modes of pathogens during regional disease outbreaks. In recent years, techniques such as multilocus sequence typing (MLST) (13) have begun to supplant pulsed-field gel electrophoresis and related DNA-banding subtyping methods, yielding greater insight into the population structure and evolutionary dynamics of leading human bacterial pathogens. However, MLST allows only a crude estimate of the total variation present in bacterial genomes, which typically exceed 2 Mb in size. MLST is further constrained by the preselection bias of the handful of genes to be analyzed, classically encoding housekeeping enzymes rather than virulence determinants central to the infectious disease process. Logic dictates that a snapshot of the entire genome of each strain isolate would provide the ultimate information trove for infectious disease detectives. The ever-increasing speed and cost efficiency of next-generation sequencing technologies are now making this a reality.
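The limited resolution of MLST can be illustrated with a minimal Python sketch. The seven locus names are the housekeeping genes used in the standard S. aureus MLST scheme; the allele numbers, the sequence type labels ("ST-A", "ST-B"), and the isolate names are hypothetical placeholders rather than entries from any real typing database.

```python
from typing import Dict

# Housekeeping loci of the standard S. aureus MLST scheme.
MLST_LOCI = ("arcC", "aroE", "glpF", "gmk", "pta", "tpi", "yqiL")

# Hypothetical lookup table: allelic profile -> sequence type label.
# The allele numbers and ST labels are placeholders for illustration only.
PROFILE_TO_ST = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (2, 1, 1, 1, 1, 1, 1): "ST-B",
}


def assign_sequence_type(alleles: Dict[str, int]) -> str:
    """Collapse an isolate to its seven-locus profile and look up its type."""
    profile = tuple(alleles[locus] for locus in MLST_LOCI)
    return PROFILE_TO_ST.get(profile, "unassigned")


# Two hypothetical isolates that share every MLST allele. Any differences
# elsewhere in their genomes are invisible to this scheme.
isolate_1 = dict(zip(MLST_LOCI, (1, 1, 1, 1, 1, 1, 1)))
isolate_2 = dict(zip(MLST_LOCI, (1, 1, 1, 1, 1, 1, 1)))
# A third isolate that differs at a single typed locus (arcC).
isolate_3 = dict(zip(MLST_LOCI, (2, 1, 1, 1, 1, 1, 1)))

for name, alleles in [("isolate_1", isolate_1),
                      ("isolate_2", isolate_2),
                      ("isolate_3", isolate_3)]:
    print(name, assign_sequence_type(alleles))
```

Because the type is determined entirely by seven gene fragments, isolate_1 and isolate_2 receive the same label no matter how many SNPs separate them across the rest of the genome.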

The recent Science paper by Harris et al. (6) focuses on high-resolution genotyping of methicillin-resistant Staphylococcus aureus (MRSA) as the showcase pathogen. S. aureus can produce clinical disease throughout the body and in all patient groups and has the proven capacity to develop resistance to all current classes of antimicrobial agents. MRSA presents a critical global challenge to the public health (14), with incidence rates ranging from less than 1% of S. aureus disease isolates in the Netherlands and Sweden (15) to more than 50% in the United States (16), Japan (17), and Portugal (15). The U.S. Centers for Disease Control estimated that more than 90,000 Americans developed invasive MRSA in 2005 and that more than 18,000 patients died during their hospital stay from these serious infections (18). In recent years, severe community-acquired MRSA infections, often in healthy individuals without medical care risk factors, have been identified with alarming frequency (19).

Harris et al. conducted high-resolution genotyping using Illumina index tagging paired to a genomic sequence analyzer (20) to dissect subtle genomic differences among MRSA isolates from a prominent clonal lineage (sequence type 239, or ST239, by MLST) that spread around the globe over a two-decade period. The authors examined a geographically and temporally dispersed sample (43 isolates) to understand the ST239 global population structure, while a second sample (20 isolates) from a single facility in Thailand was used to study the potential chain of transmission in a hospital-infection control setting. With this technique, genome-wide information is acquired, yet without the need for complete genome sequencing of each individual isolate; in effect, single-nucleotide polymorphisms (SNPs) or insertions and deletions (indels) are mapped back to a reference genome. The approach delivers a few new insights regarding intercontinental spread and local epidemiology, as well as the microevolution of a bacterial lineage in response to antibiotic selection pressures.
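The idea of mapping variants back to a reference genome can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the authors' pipeline: each isolate is assumed to be already aligned base for base to the reference, and the short sequences and isolate names are invented for the example.

```python
from typing import Dict, List, Tuple


def call_snps(reference: str, isolate: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, isolate base) per mismatch."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be pre-aligned to the same length")
    snps = []
    for pos, (ref_base, iso_base) in enumerate(zip(reference, isolate), start=1):
        # Gaps and ambiguous calls carry no SNP information, so skip them.
        if ref_base in "-N" or iso_base in "-N":
            continue
        if ref_base != iso_base:
            snps.append((pos, ref_base, iso_base))
    return snps


# Hypothetical reference fragment and pre-aligned isolate sequences.
REFERENCE = "ATGCGTACGTTAGC"
ISOLATES: Dict[str, str] = {
    "isolate_A": "ATGCGTACGTTAGC",  # identical to the reference
    "isolate_B": "ATGAGTACGTTAGC",  # one substitution
    "isolate_C": "ATGAGTACNTTGGC",  # two substitutions plus one ambiguous base
}

SNP_TABLE = {name: call_snps(REFERENCE, seq) for name, seq in ISOLATES.items()}

for name, snps in SNP_TABLE.items():
    print(name, snps)
```

A real workflow would start from short sequencing reads and would need read mapping, quality filtering, and indel handling, all of which this sketch leaves out.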

Geographical structuring of SNPs among the isolates in the MRSA ST239 strain collection suggested a model to represent the evolutionary history of the clone (Fig. 1). Isolates from countries including Turkey, Portugal, and Hungary showed the greatest diversity and were most often situated toward the root of the phylogenetic tree, pointing to a possible European origin for the clone. On the other hand, South American and Southeast Asian isolates formed more tightly clustered and uniform clades, indicating more recent expansion of a unique variant on each continent. Exceptions to the geographical structure suggested subsequent reintroduction of these new variants to the European continent. These included one pandemic clone of South American origin that was the most prevalent MRSA isolate in Portuguese hospitals in the late 1990s (21) and another clone of Southeast Asian origin that produced a sustained outbreak of vascular device–associated bacteremia in a London intensive care unit (22). Overall, the calculated mutation rate among the collected ST239 MRSA strains (3.3 × 10⁻⁶ per site per year) corroborated recent MLST-based analyses (23) and suggested a clonal origin of ST239 in the late 1960s, coincident with the first appearance of MRSA in European hospitals.
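To give a feel for how a per-site mutation rate translates into a date of origin, the following Python sketch works through the arithmetic of a simple strict molecular clock. Only the rate of 3.3 × 10⁻⁶ per site per year comes from the text. The number of core genome sites, the SNP count, and the sampling year are assumptions chosen for illustration, and a formal analysis would estimate rate and dates jointly on the phylogenetic tree rather than by this back-of-the-envelope division.

```python
# Rate reported in the text (substitutions per site per year).
MUTATION_RATE_PER_SITE_PER_YEAR = 3.3e-6

# ASSUMPTION: number of core genome sites compared across isolates.
ASSUMED_CORE_SITES = 2_500_000


def expected_snps_per_year(rate_per_site: float, n_sites: int) -> float:
    """Expected number of new SNPs per genome per year under a strict clock."""
    return rate_per_site * n_sites


def years_to_accumulate(n_snps: float, rate_per_site: float, n_sites: int) -> float:
    """Years needed for a lineage to accumulate n_snps substitutions."""
    return n_snps / expected_snps_per_year(rate_per_site, n_sites)


def estimated_origin_year(sampling_year: float, root_to_tip_snps: float,
                          rate_per_site: float, n_sites: int) -> float:
    """Sampling year minus the time implied by the root-to-tip SNP count."""
    return sampling_year - years_to_accumulate(root_to_tip_snps, rate_per_site, n_sites)


# HYPOTHETICAL isolate: sampled in 2007, 330 SNPs from the root of the tree.
per_year = expected_snps_per_year(MUTATION_RATE_PER_SITE_PER_YEAR, ASSUMED_CORE_SITES)
origin = estimated_origin_year(2007, 330, MUTATION_RATE_PER_SITE_PER_YEAR,
                               ASSUMED_CORE_SITES)
print(f"about {per_year:.2f} SNPs per genome per year")
print(f"estimated origin around {origin:.0f}")
```

Under these assumed inputs a lineage gains roughly eight SNPs per genome per year, which is why a few hundred SNPs of divergence correspond to a timescale of decades.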

Fig. 1. Phylogeography of methicillin-resistant S. aureus strain ST239.

A map showing the geographical spread of MRSA ST239 as deduced by Harris et al. (6).


Overall, less than 1% of the analyzed SNPs were described as homoplasic; that is, suggestive of convergent evolution, in which the same SNP arises independently in unrelated lineages. More than a quarter of these homoplasies occurred in genes known to play a role in resistance to currently used antibiotics such as trimethoprim, rifampin, mupirocin, fusidic acid, and quinolones. These findings appear to confirm that the clinical practice of antibiotic selection and use can be a major driver of pathogen microevolution. Recent studies have shown how the overuse of certain classes of broad-spectrum antibiotics can promote MRSA nasal carriage or influence hospital MRSA infection rates (24). The application of new-generation sequencing analysis to such investigations may provide further definitive bases for optimizing hospital formulary policies.
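One simple way to see what a homoplasy looks like in SNP data is the classical four-gamete test, sketched below in Python. This is a stand-in technique chosen for brevity and is not claimed to be the method used by Harris et al., who worked from a phylogenetic tree. The test flags pairs of biallelic sites that cannot both be explained by a single mutation each on one clonal tree, which implies that at least one of the two sites arose more than once (or was moved by recombination). The isolate names and allele patterns are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def four_gamete_conflicts(alleles_by_isolate: Dict[str, str]) -> List[Tuple[int, int]]:
    """Return pairs of site indices at which all four allele combinations occur.

    Each isolate is a string with one character per site:
    '0' for the reference allele and '1' for the variant allele.
    """
    rows = list(alleles_by_isolate.values())
    n_sites = len(rows[0])
    if any(len(row) != n_sites for row in rows):
        raise ValueError("all isolates must have the same number of sites")
    conflicts = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site combinations seen across isolates.
        gametes = {(row[i], row[j]) for row in rows}
        if len(gametes) == 4:
            conflicts.append((i, j))
    return conflicts


# Hypothetical SNP matrix: four isolates scored at three sites.
SNP_MATRIX = {
    "isolate_1": "000",
    "isolate_2": "100",
    "isolate_3": "010",
    "isolate_4": "111",
}

CONFLICTS = four_gamete_conflicts(SNP_MATRIX)
print(CONFLICTS)
```

In this toy matrix only the first two sites conflict with each other, so at least one of them is a candidate homoplasy. Deciding which one, and how many times it arose, requires placing the sites on a tree.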

When a collection of MRSA ST239 isolates from the Southeast Asian clade recovered in the same hospital during a 7-month period was analyzed, a surprising amount of SNP variation was identified using the next-generation sequencing platform; such absolute discrimination of clinical isolates in the infection control setting could not be achieved by prior methods. Five closely related isolates could be linked to adjacent wards of the hospital that were not associated with the larger, more divergent group, thus suggesting an epidemiological link of transmission. The discriminating power of the technology promises to provide more accurate clues regarding an emerging epidemic clone and could thereby facilitate targeted interventions. Ultimately, one can envision extending the analysis to probe the complete ecology of MRSA transmission dynamics. This would entail the evaluation of non–disease-associated staphylococcal isolates, such as those producing transient colonization of patients or health care workers or those contaminating fomites or hospital environmental surfaces.
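The logic of linking closely related isolates within a hospital can be sketched as pairwise SNP distances followed by single-linkage grouping. The Python example below is a minimal illustration under stated assumptions: the isolate names, the SNP profiles, and the threshold of one SNP are all hypothetical, and the text does not give a validated cutoff for inferring transmission.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Count differing positions, ignoring ambiguous 'N' calls."""
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def transmission_clusters(profiles: Dict[str, str], max_snps: int) -> List[List[str]]:
    """Group isolates connected by chains of pairs within max_snps (single linkage)."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= max_snps:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    # Largest clusters first, then alphabetical, for a stable report.
    return sorted((sorted(m) for m in clusters.values()), key=lambda m: (-len(m), m))


# Hypothetical isolates: alleles at eight variable core genome sites.
PROFILES = {
    "ward1_a": "AAAAAAAA",
    "ward1_b": "AAAAAAAT",
    "ward2_a": "AAAAAATT",
    "icu_a":   "TTTTAAAA",
    "icu_b":   "GGGGCCAA",
}

CLUSTERS = transmission_clusters(PROFILES, max_snps=1)
for members in CLUSTERS:
    print(members)
```

Here the three ward isolates fall into one group because each is within one SNP of a neighbor, while the two more divergent isolates remain unlinked. Genetic proximity of this kind only suggests a link; ward locations and sampling dates are still needed to interpret it.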

In their analysis, the authors made a key distinction between two components of the MRSA genomes (Fig. 2). The core genome represents genes shared by most strains. These are thought to be of ancestral S. aureus chromosomal origin and, because of their older age, are subject to greater variability. The variability of genes within the core genome is uneven and is dependent on whether individual SNPs or indels produce functional alterations in their encoded products and whether such changes are under positive, negative, or neutral selection pressure. The noncore or accessory genome contains mostly horizontally transferred genes, encoded on mobile genetic elements (MGEs) such as bacteriophages, transposons, pathogenicity islands (enriched in superantigen genes), plasmids, and the so-called staphylococcal cassette chromosome (SCC) elements (25, 26). Sequences of these genes tend to be less variable among strains in which they are present, because they were recently acquired. Among the most notorious of MGEs within MRSA accessory genomes is SCCmec, which encodes the altered penicillin-binding protein 2a (PBP2a’) with reduced affinity for methicillin and related β-lactam antibiotics, and the bacteriophage ϕSa2(MW2), which contains the genes that encode the subunits of the Panton-Valentine leukocidin (PVL). Epidemiologically, PVL expression is strongly associated with highly virulent community-acquired MRSA clones such as USA300 (27), although mechanistic linkages of the toxin to clinical outcomes or to pathogenesis in animal models remain controversial (28–30).
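The core versus accessory distinction can be made concrete with a small Python sketch that partitions genes by how many strains carry them. The gene names are real S. aureus genes (gyrA and rpoB as chromosomal housekeeping genes, mecA carried on SCCmec, lukS-PV and lukF-PV encoding the PVL subunits), but the strain names, the presence and absence pattern, and the 95% cutoff for "shared by most strains" are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def split_pangenome(gene_content: Dict[str, Set[str]],
                    core_fraction: float = 0.95) -> Tuple[Set[str], Set[str]]:
    """Split genes into core (present in >= core_fraction of strains) and accessory."""
    n_strains = len(gene_content)
    counts = Counter(gene for genes in gene_content.values() for gene in genes)
    core = {gene for gene, c in counts.items() if c / n_strains >= core_fraction}
    accessory = set(counts) - core
    return core, accessory


# Hypothetical gene inventories for four strains.
GENE_CONTENT = {
    "strain_1": {"gyrA", "rpoB", "mecA"},
    "strain_2": {"gyrA", "rpoB", "mecA"},
    "strain_3": {"gyrA", "rpoB", "mecA"},
    "strain_4": {"gyrA", "rpoB", "lukS-PV", "lukF-PV"},
}

CORE, ACCESSORY = split_pangenome(GENE_CONTENT)
print("core:", sorted(CORE))
print("accessory:", sorted(ACCESSORY))
```

In this toy data set the housekeeping genes land in the core, while the resistance and toxin genes carried on mobile elements land in the accessory set. Restricting a SNP-based phylogeny to the core set is the kind of filtering that keeps horizontally transferred genes from distorting the tree.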

Fig. 2. Bidimensional model of a microbial pathogen variome.

The variome of S. aureus and other pathogens can be conceptualized in a bidimensional architecture consisting of vertical and horizontal gene pools. The central cylinder represents the core genome and its relatively steady temporal evolution. Phylogenies and typing methods based on core genome elements (for example, restriction fragment length polymorphism and MLST) reflect only the vertical evolution of the species, whereas elements of the accessory genomes (green halo) can produce more rapid evolutionary leaps via horizontal gene transfer. Both genomic dimensions need to be taken into consideration to produce accurate, clinically relevant phylogenies. Arrows reflect continuous bidirectional genetic exchange between the core and accessory gene pools.


The whole set of genetic variations present in a species population undergoing microevolutionary change is often referred to as a variome. To meet their goals of elucidating patterns of intercontinental or intrahospital spread of MRSA ST239 clones, Harris et al. focused incisively on SNPs that reside in the core variome, because horizontal gene transfer events could confound the accuracy of the deduced phylogenetic tree. We are now appreciating that when major changes are observed in the epidemiology, clinical disease spectrum, or antibiotic resistance profile of an established pathogen, the responsible genetic changes are transpiring in the accessory genome through horizontal gene transfer.

A deeper understanding of the dual dimensionality of microbial genotypes, reflected in their core and accessory genomes, is required for microbiologists and infectious disease physicians to come to grips with a chronic problem of typing and classification. For over a century, researchers have documented a contribution of horizontal gene transfer to microbial genotypes and phenotypes. However, the impact of horizontal gene transfer was greatly underestimated until the flood of new microbial genome sequences (31) revealed the major effects of MGEs on microbial genomes’ architecture and characteristics. Two phylogenetically closely related strains can be dramatically different in their pathogenic potential, which is dependent on the recent acquisition or loss of a critical MGE. Prior strain classification schemes, including the most advanced ones such as MLST, have traditionally focused on the vertical lineage of a strain. New-generation sequencing technologies, while maintaining and enhancing analysis of the core genome for vertical phylogeny, can also provide a comprehensive content inventory of the accessory genome. This second dimension, which reflects the content of phage and other MGEs, can help predict whether the strain is associated with invasive disease potential and whether it will be resistant to certain antibiotics (even if its closest ancestor is not).

The impending application of next-generation sequencing to the molecular epidemiology of microbial pathogens will provide a veritable deluge of candidate SNPs in search of meaning. Facing potential information overload, the infectious disease research community will need to separate the proverbial wheat from the chaff. Beyond refining phylogenetic analyses as illustrated by the work of Harris et al., we note that the technology may be applied fruitfully to evolutionary processes that occur during infection of the individual patient. Most leading human pathogens, including S. aureus, commonly exist in healthy individuals as a commensal of epithelial surfaces (for example, the upper respiratory mucosa, skin, and gut). In the course of invasive infection, as it spreads to the deep tissues and bloodstream, the organism encounters intense natural selection from phagocytic cells and other components of the innate immune system. The variant that is optimized for systemic survival may differ from that best suited for colonization, in which competition against resident microflora for epithelial adherence and nutrients produces the strongest selective pressures. This is well illustrated by the globally disseminated M1T1 clone of Streptococcus pyogenes, where the application of next-generation sequencing technology (32) showed that invasive disease follows specific mutations in the covR/S regulatory locus, leading to hyperencapsulation and up-regulation of toxins and immune evasion proteins (33, 34). Genome-wide analysis of SNPs between paired strains isolated from the bloodstream or nasal mucosa of the same patient, or between input and output strains generated in a relevant animal model, could reveal similar genetic switch mechanisms critical for invasive disease by MRSA and other disease agents.

An argument can be made that for microbial pathogens, the truth is hidden in the genome. We are getting closer to the truth.


  • Citation: R. K. Aziz, V. Nizet, Pathogen microevolution in high resolution. Sci. Transl. Med. 2, 16ps4 (2010).

References and Notes

  1. Next-generation sequencing technologies are becoming more widely available and cost-efficient. Platforms from other vendors, including FLX(454) (454 Life Sciences, a Roche company), ABI SOLiD (Life Technologies), and Helicos HeliScope (Helicos BioSciences Corporation), could provide functionality similar to that of Agilent/Illumina for genome-wide microbial strain characterization.

  2. The authors declare no conflicts of interest.