Research ArticleGenomics

Natural Selection in a Bangladeshi Population from the Cholera-Endemic Ganges River Delta

See allHide authors and affiliations

Science Translational Medicine  03 Jul 2013:
Vol. 5, Issue 192, pp. 192ra86
DOI: 10.1126/scitranslmed.3006338


As an ancient disease with high fatality, cholera has likely exerted strong selective pressure on affected human populations. We performed a genome-wide study of natural selection in a population from the Ganges River Delta, the historic geographic epicenter of cholera. We identified 305 candidate selected regions using the composite of multiple signals (CMS) method. The regions were enriched for potassium channel genes involved in cyclic adenosine monophosphate–mediated chloride secretion and for components of the innate immune system involved in nuclear factor κB (NF-κB) signaling. We demonstrate that a number of these strongly selected genes are associated with cholera susceptibility in two separate cohorts. We further identify repeated examples of selection and association in an NF-κB/inflammasome–dependent pathway that is activated in vitro by Vibrio cholerae. Our findings shed light on the genetic basis of cholera resistance in a population from the Ganges River Delta and present a promising approach for identifying genetic factors influencing susceptibility to infectious diseases.


Several lines of evidence suggest that cholera exerted a potent selective pressure on the human population in Bangladesh. Cholera is an ancient disease in the Ganges River Delta (1) and remains highly prevalent in Bangladesh today, with up to 4 per 1000 affected annually in urban slums of Dhaka (2). By the age of 15 years, more than half of the population of Bangladesh has serological evidence of previous cholera infection (3, 4). Historically, the mortality rate of cholera exceeded 50% and remains as high as 5 to 10% in recent outbreaks (57). Cholera’s ancient origins, high prevalence, and high fatality rates suggest strong evolutionary pressure, and observational data support this. In particular, the Ganges River Delta has the world’s lowest prevalence of blood group O, which is associated with an increased risk of severe cholera (8, 9). In addition, first-degree relatives living in the household of cholera patients have nearly threefold higher risk of cholera than unrelated household contacts (4, 8, 9), suggesting that susceptibility to cholera has a heritable component.

A history of selective pressure should make it easier to identify host variants associated with cholera susceptibility and severity, an approach that can give insight into disease pathogenesis (1012). Natural selection drives beneficial mutations to rise in prevalence, generating common mutations of strong effect that are detectable with small samples. In addition, the process of selection leaves a distinctive signature in the genome that can add power to a genome-wide association study (GWAS) (1315).

The virulence factors of the human-specific pathogen Vibrio cholerae, which causes cholera, are well characterized (1620). Toxigenic strains of V. cholerae colonize the surface of the small intestine and express cholera toxin, an AB5 bacterial toxin that induces profound diarrhea mediated by secretion of chloride through apical chloride channels (21, 22). Less is known about host factors that contribute to cholera susceptibility and about the interaction between V. cholerae and the human intestine. Pathways related to mucosal immunity and intestinal homeostasis likely play an important role (20).

We sought to identify host genetic factors associated with susceptibility to cholera by combining selection, association, and functional studies. We previously developed the composite of multiple signals (CMS) method to pinpoint regions of positive selection in the genome with more power and specificity than any single test (23, 24). Here, we used CMS to identify targets of positive selection in a population from the Ganges River Delta and then searched for association with cholera resistance among the identified targets.


Genetics of the study population

We genotyped 42 mother-father-child trios (126 individuals) of Bengali ethnicity from Dhaka, Bangladesh (abbreviated “BEB” for Bengali of Bangladesh) on the Illumina 1M array, yielding 36 complete trios and 1,112,946 single-nucleotide polymorphisms (SNPs) after quality filtering. We compared this population with publicly available data for 16 populations: 8 from the International HapMap3 Project (HapMap3), 3 from the Singapore Genome Variation Project, and 5 from the Human Genome Diversity Project (table S1) (2527).

The BEB population is a distinct genetic grouping with no evidence of substantial genetic structure; principal components analysis reveals the population to be tightly clustered with little overlap with any other HapMap3 population (Fig. 1A and fig. S1). Among the eight HapMap3 populations, the closest relatives of the BEB are the Indian Gujarati (FST = 0.0047) (table S2). The BEB are closer than the Gujarati to the Japanese and Han Chinese, potentially reflecting admixture. We tested seven populations (21 pairs) as ancestors using the three-population test (28) and found that admixture between Indian and East Asian populations best explains the observed genetic architecture of the BEB (table S3). Whereas all pairs of East Asian and Indian populations fit well, of those included, the Singapore Indians and the Singapore Chinese most closely match the ancestors of the BEB population (Z = −48.4).

Fig. 1 The Bengali population from Dhaka, Bangladesh, has no evident structure and descends from ancient admixture between India and East Asia.

(A) The first two principal components: Bangladeshis (red) cluster closest to the Gujarati Indians (orange), which is consistent with geography (inset map), and are shifted slightly toward the East Asian populations (purple). (B) The decline of admixture LD in the Bangladeshis suggests a founder event 52 ± 2 generations ago (~500 AD). History was estimated using the pulse admixture model–based ROLLOFF method, which is robust to inaccurate parental populations and other modeling issues (31). (C) ADMIXTURE maximum likelihood estimation of ancestries (five clusters, coefficient of variation error = 0.375) estimates that the BEB are 89.7 ± 2.4% “Indian” ancestry (red) and 9.3 ± 2.6% “East Asian” ancestry (purple). The starred populations most closely fit the ancestral admixed populations.

We estimate that the admixture between Indian and East Asian populations occurred 52 ± 2 generations ago (generation = 29 years) (29), or around 500 AD, based on the exponential decline of linkage disequilibrium (LD) with distance analyzed using ROLLOFF (30, 31) (Fig. 1B). This remarkably close-fitted age estimate roughly corresponds to the collapse of the Indian Gupta Empire, the rise of the Chinese Tang dynasty, and the brief unification of Bengal under a single ruler (590 AD to 625 AD). Although alternative histories, such as continuous admixture or multiple admixture events, are possible, the single-event model shows excellent fit to our data, and we found no statistical support for very ancient flow (fig. S2). Using the maximum likelihood–based ancestry estimation software ADMIXTURE (32), we found 9.3 ± 2.6% East Asian ancestry in the BEB (Fig. 1C, fig. S3, and table S4).

Natural selection in the BEB population

The CMS test combines three indicators of positive selection—long haplotypes, high-frequency derived alleles, and highly differentiated alleles—to identify narrow candidate regions, in many cases pinpointing a single gene or genomic element as the probable target of selection (23). We measured natural selection in the autosomal genome of the BEB population with CMSGW, a genome-wide test (24), using for comparison three sequenced populations from the 1000 Genomes Project: north/west Europeans, Han Chinese/Japanese, and Yoruba (795,517 overlapping SNPs) (table S1) (33). We identified 305 regions with signals of natural selection [normalized CMSGW score >3; false-positive rate (FPR) <0.01%; Fig. 2 and table S5], with a median size of 149 kb and encompassing 2% of the autosomal genome. On average, the selected regions contain 2.3 ± 3.6 genes; 95 regions contain 1 gene, and 72 have no genes. Ninety-one regions contain long intergenic noncoding (LINC) RNAs, including 29 nongenic regions (table S6).

Fig. 2 Signals of natural selection in the BEB population are enriched for genes from three distinct gene sets.

CMS scan across the human autosomes identifies 305 selected regions (solid black bars with scores as red vertical bars; gray vertical bars show scores for all other regions). Genes in three enriched gene sets are shown with colored labels (see figure key). Candidate genes in 28 regions (black boxes) were tested for association with cholera susceptibility.

Enrichment of gene sets in selected genomic regions in the BEB population

Enrichment analysis of gene categories is used to identify common functions among candidate regions identified in GWAS (34). It can similarly be used with selection candidates, providing clues to the evolutionary forces shaping a population. However, given that selection acts on many distinct traits, we expect to see clear evidence of gene enrichment only in extreme cases, when multiple genes in a particular pathway are targeted by strong selection for a single trait (for example, skin pigmentation in Europeans).

We looked for evidence of enrichment in the 305 selected regions in the BEB population using INRICH, a permutation-based genome-wide analysis tool that is robust to confounding factors like marker density and gene size (35). INRICH reports the empirical significance of enrichment for each gene set (Pset) and a corrected experiment-wide empirical significance (Pexp). We considered Pexp < 0.05 to represent significant enrichment. We tested 851 sets of 10 or more genes from the Molecular Signatures Database (MSig-c4), which identifies sets of genes with shared expression patterns (36). We also tested 2430 manually curated sets of 10 or more genes from the Gene Ontology (GO) (37).

One gene set was significantly enriched in the BEB population—a module of genes in the expression neighborhood of IKBKG (MORF_IKBKG, Pset = 5.2 × 10−5, Pexp = 0.017) (Table 1). MORF_IKBKG was one of just two gene sets significantly enriched in any population (table S7). IKBKG encodes the protein NEMO (IKK-γ), a subunit of the IκB kinase that activates canonical signaling of nuclear factor κB (NF-κB). Sixteen of the 110 genes in the MORF_IKBKG gene set lie in 15 distinct genomic regions that show evidence of natural selection in the BEB population (Fig. 2). We observed no enrichment for this gene set in the East Asians, the Europeans, or the Yoruba (Pset = 0.56, 0.44, and 0.17, respectively).

Table 1 Gene sets significantly enriched in candidate regions of natural selection in the BEB population.

We used INRICH to calculate empirically the significance of the enrichment for each gene set (Pset) and the experiment-wide significance corrected for the number of gene sets tested (Pexp) (35). For the GO and potassium channel analyses (marked with *), no significant enrichments were found across all candidate selected regions, and the analysis was rerun just on regions with normalized CMSGW scores >5. No P value is shown for sets with 0 selected genes. See table S7 for all sets with Pset < 0.05.

View this table:

Many genes in the top selected regions in the BEB population, including 3 of the top 10 regions, were classified as potassium ion transport genes (table S5). Enrichment analysis of GO sets in strongly selected regions (CMSGW score >5) ranks potassium ion transport (GO:0006813) third (Pset = 3.8 × 10−3) (table S7). To further investigate, we tested 19 categories of potassium channel genes, defined by their physiological function in gastrointestinal epithelial cells (38), and found significant enrichment for voltage-gated potassium channel genes (5 of 47 genes, Pset = 1.6 × 10−3, Pexp = 0.008) and in particular for EAG-related genes (3 of 8 genes, Pset = 5.2 × 10−3, Pexp = 0.03) (Fig. 2, Table 1, and table S8).

We also tested for enrichment in 31 blood group systems (table S8) because of the association between blood group O and severe cholera. Genes related to the Kell blood group system were significantly enriched in BEB selected regions (three of eight genes, Pset = 4.4 × 10−3, Pexp = 0.007) (Table 1 and Fig. 2) and in East Asians (two of eight genes, Pset = 0.025, Pexp = 0.03) (table S7). We detected no selection in ABO blood group genes, possibly because CMSGW is designed to detect positive selection on novel variants, not the negative selection against long-standing genetic variation (that is, the O blood group) expected at the ABO locus.

Testing of top regions for association with cholera susceptibility

To test the hypothesis that resistance to cholera drove selection in the BEB population, we performed an association study in a separate, well-phenotyped cohort of Bangladeshis of Bengali origin from Dhaka composed of 105 cholera patients and 167 unaffected individuals exposed to a case of cholera within their household (39). This cohort included 38 parent-affected child trios. We genotyped 536 SNPs in 28 selected regions; these regions included the top 10 selected regions in the BEB genome and regions that contained genes in enriched gene sets or that were biologically plausible candidates for cholera resistance (table S9). Three hundred seventy polymorphic SNPs (minor allele frequency >0.01) passed quality filters, including 19 SNPs randomly chosen from regions without evidence of natural selection. We tested for association using the DFAM method in PLINK (40), which combines related and unrelated individuals into a single analysis. The region with strongest association to cholera is within the top signal of natural selection in the genome and encompasses five genes: SNRNP200, CIAO1, ITPRIPL1, NCAPH, and TMEM127 (Fig. 3, A and B, and table S10). The most associated SNP is between SNRNP200 and ITPRIPL1 (rs62153901; PDFAM = 0.0015, P = 0.042 after Bonferroni correction for 28 independent loci tested), 31 kb from the top SNP in the selection analysis (rs3171927; in SNRNP200).

Fig. 3 Top signal of selection in Bangladeshi population is associated with cholera susceptibility.

(A) Association testing of 536 SNPs genotyped in 104 cholera cases and 167 controls in 28 candidate selected regions (empty black circles); the strongest selection signal in the genome corresponds to the region with the strongest association (red filled circles), exceeding the maximum association at 19 randomly selected SNPs (dashed horizontal lines) by more than threefold. SNPs in genes LY6G6D and C5 also have P < 0.05 (genes shown in black vertical text). Stricter phenotype definition (severe cholera; hollow red circles) yields three additional associated regions with P < 0.05, including two containing potassium channel genes (KCNH7, RPS6KB2, and KCNH5). Replication results are shown as blue bars. (B) The association (red filled circles) in region 2 (5) overlaps the signal of selection (hollow black circles, right axis). The peak, bracketed by red dashed lines, encompasses five genes. The SNP with the highest CMS score in the genome (pink line) sits in an intron of the gene SNRNP200. The region under this peak is well conserved across placental mammals (blue bars).

This association is unlikely to reflect population stratification. The top associations persist (Spearman correlation of 0.76 for SNPs with P < 0.01) when we restrict our analysis to the 38 parent-affected child trios in our data set using the transmission disequilibrium test, a method that internally controls for population stratification (41). Allele sharing across the 19 random markers, measured as the identity-by-state (IBS) distance (40), is similar within cases (0.734 ± 0.068), within controls (0.735 ± 0.070), and between cases and controls (0.734 ± 0.69), consistent with the lack of substructure in the population (Fig. 1A).

Our association cohort contains individuals with a range of severity of cholera, including cases with severe dehydration requiring hospitalization. Restricting our analysis to these severe cases reduces our statistical power but may also increase our sensitivity through a more rigorous phenotype. We indeed found that analyzing the most severe cholera cases alone resulted in the identification of additional associations between cholera and SNPs in genes located in three selected regions: the potassium ion transport genes KCNH7 (P = 0.019) and KCNH5 (P = 0.036), and the ribosomal protein kinase gene RPS6KB2 (P = 0.049) (Fig. 3A and table S10).

We performed a replication study on the top 12 associated regions to confirm the association to cholera (44 SNPs; 204 SNPs after imputation). We genotyped 124 unrelated, severe cholera cases and performed a case-control analysis with 72 parents from the trio population cohort as unphenotyped controls, measuring association at 49 haplotype blocks of SNPs in strong LD (42). We found that the top region in the original association study was among the four regions most associated with cholera (P < 0.01) in this separate analysis (Fig. 3A and table S10). We also found a new association on chromosome 16, a gene-dense region that includes the gene PYCARD and ranked fifth in the selection study.

Biological association between genes under natural selection and cholera

Our selection scan and enrichment analysis indicate that recent, positive natural selection in the BEB population has repeatedly targeted genes related to IKBKG, a key component of NF-κB signaling. Our studies further indicate that certain genes in the top regions of selection are specifically associated with cholera. We and others have recently shown that V. cholerae lipopolysaccharide (LPS) incites proinflammatory innate immune responses through activation of NF-κB signaling pathways (43, 44), suggesting that cholera may represent a selective pressure on this pathway.

We also noted that the selected regions in the BEB population contain genes related to activation of the inflammasome, including PYCARD (also known as Asc) (table S5). To further investigate this, we evaluated the effect of cholera toxin on inflammasome activation in mouse macrophages in vitro and found that it induced robust interleukin-1β (IL-1β) secretion in LPS-primed mouse macrophages (Fig. 4). This effect was abrogated in macrophages from mice deficient in caspase-1 (Ice−/−) and PYCARD (Asc−/−), two key components of the inflammasome signaling pathway. Thus, cholera toxin leads to caspase-dependent induction of proinflammatory cytokines, consistent with V. cholerae infection resulting in activation of the inflammasome.

Fig. 4 Cholera toxin induces caspase-dependent IL-1β secretion consistent with inflammasome activation.

Cholera toxin (CT) induces robust IL-1β secretion in LPS primed mouse macrophages but not in macrophages deficient in caspase-1 (green) or PYCARD (purple), key components of inflammasome signaling (*P = 0.0001 and 0.0005, respectively, when compared to wild type using Student’s t test). The average (bars) and SEM (vertical black lines) are shown for three experimental replicates, each with three sampling replicates.


Using Ingenuity’s IPA software (45), we developed a model of the human innate immune signaling pathways that respond to V. cholerae infection and have been selected in the BEB population (Fig. 5). In this model, inflammasome activation and the NF-κB signaling pathway play an integrated role in Toll-like receptor 4 (TLR4)–mediated sensing of V. cholerae. This model is consistent with recent in vitro data regarding innate immune sensing of Gram-negative bacteria (46) and is supported by in vitro findings of NF-κB signaling (43) and inflammasome activation (47) by V. cholerae antigens.

Fig. 5 Human innate immune signaling pathways may be targeted by selective pressures, including cholera, in the BEB population.

We developed a model of the cholera-related immune pathways under selection in the BEB population by mapping all interactions catalogued by Ingenuity’s IPA software between strongly selected genes, including IKBKG, and genes shown to respond to cholera toxin. Upon exposure to V. cholerae, TLR4 and GM1 ganglioside interact with V. cholerae LPS and cholera toxin, resulting in coordinated activation of the NLRP3 inflammasome and the NF-κB regulatory complex, through multimerization of PYCARD and assembly of the NEMO complex, respectively. NEMO is encoded by IKBKG (red dashed box), the central gene in a gene set that is significantly enriched for selection in the BEB. TLR4 stimulates production of interferon-β (IFN-β), which cleaves procaspase-4 into mature caspase-4. The NLRP3/PYCARD complex cleaves procaspase-1 into mature caspase-1. Caspase-1 and caspase-4, central mediators of the inflammasome pathway, then facilitate cell death and release of proinflammatory cytokines, such as HMGB1, IL-18, and IL-1β. Genes in selected regions (red boxes), including three specifically associated with cholera susceptibility (red shaded boxes), interact with regulatory components of the proposed pathway. Genes that are significantly up-regulated during acute cholera are denoted with red arrows. Interactions are shown as arrows or dashed lines; those found using Ingenuity Knowledge Base are labeled with references in parentheses (50, 8188).

Our results suggest that natural selection in the BEB population has repeatedly acted on key regulators of the proposed V. cholerae response pathway. In particular, genes expressed in concert with IKBKG, which encodes the protein NEMO, are the most strongly enriched MSigDB gene set in the BEB population. NEMO plays an essential role in the canonical activation and regulation of NF-κB, the master regulator of inflammation, immunity, and cell survival (48). The MSigDB IKBKG module also includes RPS6KB2, a ribosomal protein S6 kinase gene that is both under selection in the BEB population and associated with severe cholera in our disease association study. Although the function of RPS6KB2 is poorly understood, other members of this family are known to regulate NF-κB and factors associated with gut inflammation and apoptosis (49). We found the most significant overlap of our selection and cholera association data near the novel gene SNRNP200, which encodes a previously uncharacterized RNA splicing, adenosine triphosphate–dependent helicase shown to bind caspase-4 (50). The murine homolog of caspase-4, caspase-11, is a key regulator of caspase-1 activation in response to Gram-negative bacteria, including V. cholerae, and subsequent inflammasome activation (47). PYCARD (Asc), located in the fifth ranked selected region, mediates the formation of the multiprotein inflammasome complex. Several downstream effectors of the proposed innate immune signaling pathways, including ITGAM and IL-1β, are also under selection in the BEB population or are highly expressed in the duodenum of cholera patients (51).

Cholera in Bangladesh is ancient, prevalent, and often fatal. There is compelling evidence that heritable factors influence susceptibility to the disease, yet little is known about the specific genetic factors that are involved. Our scan for natural selection in a historically cholera-affected population from Bangladesh, combined with disease association and biological data, elucidates new host factors involved in cholera. We note that, although our analysis found no evidence that the association reflects population stratification, this potential confounder cannot be completely excluded because of the small size and targeted design of the association study.

Our results suggest that cholera has exerted strong selective pressure on key pathways of innate immunity, including NF-κB and inflammasome signaling, in this population from Bangladesh. Our candidate regions commonly include regulatory components of these pathways rather than the central mediators. This suggests a biologically sensible hypothesis, namely, that selection acts to modulate these fundamental pathways instead of affecting the key components.

Functional studies increasingly support the role of innate immunity in the response to V. cholerae infection. In a recent screen of Drosophila insertion mutants, V. cholerae susceptibility was associated with mutations in innate immunity genes related to NF-κB signaling (52, 53). Animal models also support a central role for inflammasome activation in response to infection with V. cholerae and other Gram-negative organisms (46, 47). Because the innate immune system provides the first line of host defense against infection, it is possible that other microbial pathogens exerted selective pressure on the same pathways (5457). Association studies for other diseases historically prevalent in the Ganges River Delta, focused on the regions of natural selection that we have identified, would consequently be of interest.

The NF-κB signaling pathway we identified as under selection in the BEB population modulates gut epithelial integrity and the interaction between the mucosal immune system and gut microflora (58). Ablation of IKBKG in the intestinal epithelia of mice causes severe, chronic intestinal inflammation, much like human inflammatory bowel disease (IBD) (59). We noted a significant overlap between genes that are highly selected in the BEB population and loci that are strongly associated (PGWAS < 1 × 10−10) with an increased risk of ulcerative colitis (UC) (Pset = 0.017, Pexp = 0.049, 3 of 17 GWAS loci) (table S11) (60). Thus, pathways under selection in the BEB population, and potentially involved in susceptibility to mucosal infection with V. cholerae, may also be relevant to understanding a common autoimmune disease occurring at the mucosal surface. Notably, epidemiological studies suggest an increased risk for UC and IBD in South Asian populations (61, 62).

A role for ion channels, particularly the cystic fibrosis transmembrane conductance regulator (CFTR), in susceptibility to cholera has long been hypothesized (63, 64). We found no evidence of selection at CFTR (fig. S4). We did find significant enrichment in top selected regions for voltage-gated potassium channel genes; one of these potassium channel genes, KCND2, is about 2.5 Mb downstream of CFTR. We identified a specific association between two other voltage-gated potassium channel genes, KCNH7 and KCNH5, and severe V. cholerae infection. KCNH7 and KCNH5 are in a class of potassium channels that is expressed in excitable cells but is also found in gastrointestinal tract epithelia (65). These channels appear to control cyclic adenosine monophosphate–mediated chloride secretion, the mechanism by which cholera toxin causes secretory diarrhea (21, 22, 6670).

Beyond insights on cholera, our genome-wide scan for selection provides clues to other important traits in the BEB population. For example, we found the Kell blood group genes to be highly selected both in the BEB population and in East Asians. The expression of Kell blood group genes increases in arsenite-treated cells (71). High levels of carcinogenic arsenic groundwater contamination are found in Bangladesh and parts of China (72), suggesting a possible selective pressure for this finding.

Here, we leveraged the shared history between a human population and a powerfully selective pathogen to identify specific host genetic factors influencing disease susceptibility. By identifying areas of overlap between targets of natural selection and genes associated or biologically linked with cholera, we were able to propose a testable, biological hypothesis regarding the interaction between V. cholerae and the inflammasome. We were also able to identify an innate immune response pathway under selection by cholera; this pathway includes genes not found using other methods, such as animal models and human expression studies. Further studies of this pathway may shed light on immune responses to enteric pathogens or the mechanisms of intestinal homeostasis (73). Our approach is applicable to other historically prevalent infectious diseases, such as Lassa fever, tuberculosis, leishmaniasis, and malaria, and to complex, common diseases for which the associated genes may have been historically selected, such as IBD. With the number of publicly accessible genetic data sets for human populations growing rapidly, incorporating tests for natural selection into existing and future GWAS will add a new dimension to our understanding of human genetics and disease.

Materials and Methods

Selection study design

We designed the selection study using the same mother-father-child trio design as the Haplotype Map, which we have previously shown effective for CMS analysis (23, 24). We enrolled parent-child trios of Bengali ethnicity at a field site of the International Centre for Diarrhoeal Disease Research (icddr,b) in Dhaka, Bangladesh. We collected buccal swabs from study participants for extraction of genomic DNA (Qiagen). Informed consent was obtained from participants or adult guardians. The study was approved by the Ethical and Research Review Committees of icddr,b, Partners Healthcare Human Research Committee, and the institutional review board of Broad Institute/MIT. We genotyped 42 trios (126 individuals) on the Illumina 1M-Duo array (1,199,026 SNPs). We excluded male heterozygous genotypes on non-pseudoautosomal X and removed SNPs (n = 79,961) and individuals (n = 3) with call rates <90%, families with >1000 Mendelian errors (n = 3), and SNPs with Hardy-Weinberg equilibrium P < 0.001 (n = 597) or >1 Mendelian errors (n = 4497) (40). We phased and imputed the final data set of 36 trios and 1,112,946 SNPs (average call rate, 99.5%) with BEAGLE 3.2.0 (default parameters) (74) on genome build hg18. We identified regions of natural selection using CMSGW.

Association study design

We tested our top selected variants for association with the phenotype of cholera susceptibility in a targeted association study. We compared individuals with and without a diagnosis of cholera from an observational cohort of index cases (presenting at icddr,b’s Dhaka hospital between January 2004 and November 2005) and family members sharing household and cooking facilities (39). A diagnosis of cholera in the index cases required acute watery diarrhea and a stool culture positive for V. cholerae serogroup O1 and, in the family members, a positive rectal swab culture for V. cholerae O1 or symptoms of diarrhea during 21 days of follow-up. Hospitalized individuals were severe cases. We had 105 cases (69 severe) and 167 controls, including 38 parent-affected offspring trios (16 severe). The study was approved by the Ethical and Research Review Committees of the icddr,b and by the Partners Healthcare Human Research Committee.

We genotyped 517 SNPs tiled across 28 top selected regions, including 9 of the top 10 (excluding one containing only olfactory genes) (table S9), and 19 randomly chosen SNPs in the cases, controls, and the 42 trios from the BEB selection study using the Sequenom MassArray iPlex platform. We excluded SNPs and individuals with missing call rates >25%, yielding 496 SNPs (genotype rate, 98.2%), with 371 polymorphic (minor allele frequency >1%). We estimated allele sharing as IBS distance (number shared/total number of alleles). We tested for association using the TDT and DFAM tests implemented in PLINK (40).

We designed the replication study to validate the top associations by genotyping 57 SNPs in an independent cohort of 124 unrelated, severe cases compared to the 72 unrelated parents from the BEB trios (unphenotyped controls). We removed 2 SNPs with call rates <75%, leaving 44 SNPs from the top 12 candidate regions and 11 randomly selected SNPs (average genotyping rate, 95.9%). We used BEAGLE to impute calls at SNPs in LD with a genotyped SNP (r2 > 0.9), using the Illumina 1M and Sequenom data from the BEB trios as a reference, yielding a final data set of 204 SNPs (74). We tested for association to 49 haplotype blocks of SNPs in strong LD, defined using Haploview’s “Gabriel et al.” method (40, 42).

Population genetics and CMS analysis

Principal components analysis was done with GCTA (75) and maximum likelihood estimation of individual ancestries with ADMIXTURE 1.22 on a thinned marker set with reduced LD (PLINK; 50 SNP windows, r2 < 0.1) (32). We tested for admixture using the three-population test (f3) and estimated the date of admixture using ROLLOFF (28, 30, 31). CMS was run on the BEB and the phased SNP data from the 1000 Genomes Project (Supplementary Methods). We tested for signals of natural selection in the BEB, Yoruba, North/West Europeans, and East Asians by calculating the CMSGW score for SNPs genotyped in all populations (24) (Supplementary Methods). We defined selected regions as >100-kb regions where >30% SNPs have normalized CMS score >3 (0.1% FPR in simulations), merged overlapping regions, and identified genes in regions using the RefSeqGene catalog and the LINC RNA catalog (7678).

Gene set enrichment testing

We tested regions for enriched gene sets using INRICH, calculating significance empirically through 1,000,000 permutations matched for region size, SNP density, and gene number (35). Reference gene sets are detailed in the Supplementary Methods.

Activation of the inflammasome by cholera toxin

Immortalized bone marrow–derived macrophages from wild-type, caspase-1 knockout (Ice−/−), and Asc knockout (Asc−/−) mice were provided by D. Golenbock (University of Massachusetts Medical School). Cells were plated in 96-well flat-bottomed plates at 2 × 105 cells per well in Dulbecco’s modified Eagle’s medium supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin. Cells were primed with LPS (200 ng/ml) for 3 hours and stimulated with cholera toxin (1 μg/ml) for 22 hours. Supernatants were collected, and IL-1β concentration was determined by IL-1β sandwich enzyme-linked immunosorbent assay (ELISA) with the DuoSet ELISA Development Kit (R&D Systems).

Selection on innate immunity pathways related to V. cholerae infection

We developed the model of selection on the human innate immune signaling pathways that respond to V. cholerae infection using Ingenuity’s IPA software (45). We first compiled a core gene set from experimentally derived pathways linked to cholera (46, 47) and genes up- or down-regulated in duodenal biopsies obtained from patients with acute cholera (51). We used IPA to agnostically identify all interactions annotated as direct and experimentally validated between this gene set and two sets of candidate selected genes: (i) those in the top 10 regions of selection in the BEB population and (ii) IKBKG and the selected genes in the MORF_IKBKG gene set. We included one additional gene, CD86, because it is known to respond to cholera toxin B subunit (79) and is one of just two known interactions in the Ingenuity Knowledge Base for MARCH8, the only gene in the fourth highest region of selection in the BEB (CMSGW = 9.02) (80, 81).


For CMSGW, we considered a normalized score above 3 significant, corresponding to a 0.1% FPR in simulations (24). For gene set enrichment with INRICH (35) and trait association (TDT, DFAM, and χ2 tests in PLINK) (40), P < 0.05 (one-sided) was considered significant. See Materials and Methods for details.

Supplementary Materials


Fig. S1. Principal components analysis of Bangladesh, HapMap3, and Singapore populations.

Fig. S2. Comparing ROLLOFF fit of models with one or two admixture events.

Fig. S3. ADMIXTURE modeling with increasing numbers of ancestral populations.

Fig. S4. CMSGW found no selection at CFTR in the BEB.

Table S1. Whole-genome SNP data sets.

Table S2. FST between Bangladesh and HapMap3 populations.

Table S3. Three-population test.

Table S4. Ancestry analysis with ADMIXTURE.

Table S5. Candidate selected regions in BEB population.

Table S6. Gene content of CMSGW regions.

Table S7. INRICH gene set enrichment analysis.

Table S8. Custom gene sets tested with INRICH.

Table S9. Regions included in association study.

Table S10. Association study results.

Table S11. Overlap between IBD GWAS loci and BEB selected regions.

References and Notes

  1. Acknowledgments: We thank P. Moorjani, S. Grossman, P. Lee, D. Reich, S. Purcell, and C. Edwards for invaluable support and critical discussions throughout. Funding: Supported by a Physician-Scientist Early Career Award from the Howard Hughes Medical Institute (R.C.L.), a Claflin Distinguished Scholar Award from the Massachusetts General Hospital (R.C.L.), an International Research Scientist Development Award from the NIH K01TW007409 (J.B.H.), NIH grant AI058935 (S.B.C., E.T.R., and F.Q.), NIH RO1 AI079198 (C.E.B. and L.M.S.), a FIC-NIH training grant D43 TW005572 (A.S.), an American Cancer Society Postdoctoral Fellowship (E.K.K.), the Packard Foundation Fellowship in Science and Engineering (P.C.S. and E.K.K.), and NIH Innovator award 1DP2OD006514-01 (P.C.S. and E.K.K.). Author contributions: R.C.L., P.C.S., J.B.H., F.Q., and E.K.K. conceived the study and designed the experiments. R.C.L., F.Q., J.B.H., E.T.R., S.B.C., A.S., and F.C. collected, prepared, and characterized diagnoses for human samples. E.K.K., S.T., I.S., N.P., C.O., P.C.S., and S.F.S. designed analysis methods and statistical tests. E.K.K., R.C.L., A.R., C.E., S.T., I.S., N.P., and S.G. performed and interpreted population genetic, selection, enrichment, and association analyses. R.C.L., J.B.H., L.M.S., C.E.B., and O.S.S. designed and performed the in vitro experiments. E.K.K., R.C.L., P.C.S., J.B.H., and F.Q. wrote the paper with input from S.F.S. and other authors. Competing interests: The authors declare that they have no competing interests. Data and materials availability: The CMS code is available at
View Abstract

Navigate This Article