Quantifying prion disease penetrance using large population control cohorts


Science Translational Medicine  20 Jan 2016:
Vol. 8, Issue 322, pp. 322ra9
DOI: 10.1126/scitranslmed.aad5169

Share trumps rare

No longer just buzz words, “patient empowerment” and “data sharing” are enabling breakthrough research on rare genetic diseases. Although more than 100,000 genetic variants are believed to drive disease in humans, little is known about penetrance—the probability that a mutation will actually cause disease in the carrier. This conundrum persists because small sample sizes breed imperfect alliance estimates between mutations and disease risk. Now, a patient-turned-scientist joined with a large bioinformatics team to analyze vast amounts of shared data—from the Exome Aggregation Consortium and the 23andMe database—to provide insights into genetic-variant penetrance and possible treatment approaches for a rare, fatal genetic prion disease.


More than 100,000 genetic variants are reported to cause Mendelian disease in humans, but the penetrance—the probability that a carrier of the purported disease-causing genotype will indeed develop the disease—is generally unknown. We assess the impact of variants in the prion protein gene (PRNP) on the risk of prion disease by analyzing 16,025 prion disease cases, 60,706 population control exomes, and 531,575 individuals genotyped by 23andMe Inc. We show that missense variants in PRNP previously reported to be pathogenic are at least 30 times more common in the population than expected on the basis of genetic prion disease prevalence. Although some of this excess can be attributed to benign variants falsely assigned as pathogenic, other variants have genuine effects on disease susceptibility but confer lifetime risks ranging from <0.1 to ~100%. We also show that truncating variants in PRNP have position-dependent effects, with true loss-of-function alleles found in healthy older individuals, a finding that supports the safety of therapeutic suppression of prion protein expression.


The study of pedigrees with Mendelian disease has been tremendously successful in identifying variants that contribute to severe inherited disorders (1–3). Causal variant discovery is enabled by selective ascertainment of affected individuals and especially of multiplex families. Although efficient from a gene discovery perspective, the resulting ascertainment bias confounds efforts to accurately estimate the penetrance of disease-causing variants (the probability that a carrier of the purported disease-causing genotype will indeed develop the disease), with profound implications for genetic counseling (4–7). The development of large-scale genotyping and sequencing methods has recently made it tractable to perform unbiased assessments of penetrance in population controls. In several instances, such studies have suggested that previously reported Mendelian variants, as a class, are substantially less penetrant than had been believed (8–11). To date, however, all of these studies have been limited to relatively prevalent (>0.1%) diseases, and point estimates of the penetrance of individual variants have been limited to large copy number variations (8, 11).

Here, we demonstrate the use of large-scale population data to infer the penetrance of variants in rare, dominant, monogenic disease using the example of prion diseases. These invariably fatal neurodegenerative disorders are caused by misfolding of the prion protein [PrP, the product of the prion protein gene (PRNP)] (12) and have an annual incidence of 1 to 2 cases per 1 million population (13). A small, albeit infamous, minority of cases (<1% in recent years) (14, 15) are acquired through dietary or iatrogenic routes. Most of the cases (~85%) are defined as sporadic, occurring in individuals with two wild-type PRNP alleles and no known environmental exposures. Finally, ~15% of cases occur in individuals with rare, typically heterozygous, coding variants in PRNP, including missense variants, truncating variants, and octapeptide repeat insertions or deletions (table S1). Centralized ascertainment of cases by national surveillance centers (Materials and Methods) makes prion disease a good test case for using reference data sets to assess the penetrance of these variants.

PRNP was conclusively established as a dominant disease gene because a few variants exhibit clear Mendelian segregation with prion disease (16–18). Yet, ascertainment bias (19), low rates of predictive genetic testing (20), and frequent lack of family history (21, 22) confound attempts to estimate penetrance by survival analysis (19, 23–26). Meanwhile, the existence of nongenetic etiologies leaves doubt as to whether new variants are causal or coincidental.

A fully penetrant disease genotype should be no more common in the population than the disease that it causes. This observation allows us to leverage two large population control data sets to reevaluate the penetrance of reported disease variants in PRNP. The recently reported Exome Aggregation Consortium (ExAC) data set (27) contains variant calls on 60,706 unrelated individuals ascertained for case/control status for various common diseases, without any ascertainment on neurodegenerative disease. 23andMe’s database contains genotypes on 531,575 customers of its direct-to-consumer genotyping service who have opted to participate in the research, pruned to remove related individuals (first cousins or closer, Materials and Methods), preventing enrichment due to large families with prion disease.


Disease prevalence and variant frequency

We began by asking whether reportedly pathogenic variants are as rare as expected in these population control data sets. The proportion of people alive in the population today who harbor completely penetrant variants causal for prion disease can be approximated by the product of three numbers: the annual incidence of prion disease, the proportion of cases with such a genetic variant, and the life expectancy of individuals harboring these variants. On the basis of the upper bounds of these numbers (Fig. 1A), and assuming ascertainment is neutral with respect to neurodegenerative disease, we would expect no more than ~1.7 such individuals in the 60,706 exomes in the ExAC data set (27), and ~15 such individuals among the ~530,000 genotyped 23andMe customers who opted to participate in the research.
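The arithmetic behind these expected counts can be sketched as follows. This is a minimal illustration, assuming the upper bounds quoted in the Fig. 1 legend (two cases per million per year, 18% of sequenced cases carrying a rare variant, and 80 years as the bound on mean age of onset); the function and constant names are hypothetical and are not taken from the authors' analysis code.

```python
# Upper bounds quoted in the Fig. 1 legend; all names here are illustrative.
ANNUAL_INCIDENCE = 2e-6       # prion disease cases per person per year
GENETIC_FRACTION = 0.18       # share of sequenced cases with a rare PRNP variant
YEARS_AT_RISK = 80            # upper bound on mean age of onset, in years

# Proportion of living people expected to carry a fully penetrant variant.
CARRIER_FREQUENCY = ANNUAL_INCIDENCE * GENETIC_FRACTION * YEARS_AT_RISK


def expected_carriers(cohort_size: int) -> float:
    """Expected number of carriers in a cohort with neutral ascertainment."""
    return CARRIER_FREQUENCY * cohort_size


carriers_per_million = CARRIER_FREQUENCY * 1e6
expected_exac = expected_carriers(60_706)
expected_23andme = expected_carriers(531_575)

print(f"{carriers_per_million:.1f} carriers per million")
print(f"{expected_exac:.1f} expected in ExAC")
print(f"{expected_23andme:.1f} expected in 23andMe")
```

Under these assumptions the product comes to about 29 carriers per million, or roughly 1.7 individuals in ExAC and 15 in 23andMe, matching the figures in the text.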

Fig. 1. Frequency of reportedly pathogenic PRNP variants: >30 times higher in controls than expected on the basis of disease incidence.

(A and B) Reported prion disease incidence varies with the intensity of surveillance efforts (13), with an apparent upper bound of about two cases per million population per year (Materials and Methods). In our surveillance cohorts, 65% of cases underwent PRNP open reading frame sequencing, with 12% of all cases, or 18% of sequenced cases, having a rare variant (table S1), which is consistent with an oft-cited estimate that 15% of cases of Creutzfeldt-Jakob disease are familial (43). Genetic prion diseases typically strike in midlife, with mean age of onset for different variants ranging from 28 to 77 (table S10) (22, 91); we accepted 80, a typical human life expectancy, as an upper bound for mean age of onset, and to be additionally conservative, we assumed that all individuals in the ExAC and 23andMe data sets were below any age of onset, even though both contain elderly individuals (fig. S1) (92). Thus, no more than ~29 people per million in the general population should harbor high-penetrance prion disease–causing variants; at most ~1.7 people in ExAC (A) and ~15 people in 23andMe would be expected to harbor such variants. Reportedly pathogenic variants were observed in 52 ExAC individuals (B) and on 141 alleles in the 23andMe database (table S5).

Through reviews (28–30) and PubMed searches, we identified 63 rare genetic variants reported to cause prion disease (table S2). We reviewed ExAC read-level evidence for every rare (<0.1% allele frequency) variant call in PRNP (Materials and Methods; tables S3 and S4) and found that 52 individuals in ExAC harbor reportedly pathogenic missense variants (Fig. 1B), at least a 30-fold excess over expectation if all such variants were fully penetrant. Similarly, in the 23andMe database, we observed a total of 141 alleles of 16 reportedly pathogenic variants genotyped on their platform (table S5).

Individuals with reportedly pathogenic PRNP variants did not cluster within any cohort within ExAC (table S6), arguing against enrichment resulting from comorbidity with a common disease ascertained for in exome sequencing studies. ExAC does include populations such as South Asians, in which prion disease is not closely surveilled, and we thus cannot rule out a higher incidence than that reported for developed countries, yet the individuals with reportedly pathogenic variants in either ExAC or 23andMe were of diverse inferred ancestry (tables S7 to S9). These individuals’ ages were consistent with the overall ExAC age distribution (fig. S1) rather than being shifted toward younger ages, as would be expected if these individuals were depleted beyond middle age as a result of prion disease onset. We also examined ExAC genotypes at the M129V polymorphism, which affects risk, age of onset, disease duration, and phenotypic presentation in various types of prion disease (31–33). Codon 129 genotypes were consistent with population allele frequencies (table S7) rather than enriched for the lower-risk heterozygous genotype. Certain PRNP variants are associated with highly atypical phenotypes (34, 35), which are mistakable for other dementias and might not be well ascertained by current surveillance efforts. Most of the variants found in our population control cohorts, however, have been reported in individuals with a classic, sporadic Creutzfeldt-Jakob disease phenotype (22, 28, 30, 36–38), arguing that the discrepancy between observed and expected allele counts does not result primarily from an underappreciated prevalence of atypical prion disease.

Assessing penetrance

Having observed a large excess of reportedly pathogenic variants over expectation in two data sets and having excluded the most obvious confounders, we hypothesized that the unexpectedly high frequency of these variants in controls might arise from benign variants, low-risk variants, or both. We investigated which variants were responsible for the observed excess (Fig. 2). Variants with the strongest previous evidence of pathogenicity were absent from ExAC and cumulatively accounted for five or fewer alleles in 23andMe, consistent with the known rarity of genetic prion disease. Much of the excess allele frequency in population controls resulted instead from variants with very weak previous evidence of pathogenicity (Fig. 2 and Supplementary Discussion). For four variants observed in controls (V180I, R208H, V210I, and M232R), pathogenicity is controversial (39, 40) or reduced penetrance has been suggested (41, 42), but quantitative estimates of penetrance have never been produced, and the variants remain categorized as causes of genetic Creutzfeldt-Jakob disease (21, 22). Although we cannot prove that any of the variants we observe in population controls is completely neutral, the list of reported pathogenic variants likely includes false positives. Indeed, the observation that 0.4% (236 of 60,706) of ExAC individuals harbor a rare (<0.1%) missense variant (table S4) suggests that ~4 of every 1000 sporadic prion disease cases will, by chance, harbor such a variant, which, in many cases, will be interpreted and reported as causal, given the long-standing classification of PRNP as a Mendelian disease gene.

Fig. 2. Reportedly pathogenic PRNP variants: Mendelian, benign, and intermediate variants.

Previous evidence of pathogenicity is extremely strong for four missense variants (P102L, A117V, D178N, and E200K), each of which has been observed to segregate with disease in multiple multigenerational families (16–18, 93–97) and to cause spontaneous disease in mouse models (98–103). These account for >50% of genetic prion disease cases (table S1), yet are absent from ExAC (table S3) and collectively appear on five or fewer alleles in 23andMe’s cohort (table S5), indicating allele frequencies sufficiently low to be consistent with the prevalence of genetic prion disease (Fig. 1). Conversely, the variants most common in controls and rare in cases had categorically weak previous evidence for pathogenicity. R208C (eight alleles in 23andMe) and P39L were observed in patients presenting clinically with other dementias, with prion disease suggested as an alternative diagnosis solely on the basis of finding a novel PRNP variant (104, 105). E196A was originally reported in a single patient, with a sporadic Creutzfeldt-Jakob disease phenotype and no family history (36), and appeared in only 2 of 790 Chinese prion disease patients in a recent case series (106), consistent with the ~0.1% allele frequency among Chinese individuals in ExAC (tables S5 and S8). At least three variants (M232R, V180I, and V210I) occupy a space inconsistent either with neutrality or with complete penetrance (see main text and Fig. 3). R148H, T188R, V203I, R208H, and additional variants are discussed in Supplementary Discussion.

At least three variants (V180I, V210I, and M232R) failed to cluster with either the likely benign or likely Mendelian variants (Fig. 2). Because each of these three appeared primarily in one population (Japanese or Italian) in both cases and controls (tables S1, S5, S7, and S10), we compared allele frequencies in matched population groups. Each had an allele frequency in controls that was too high for a fully penetrant, dominant prion disease–causing variant and, yet, far lower than the corresponding allele frequency in prion disease cases (Fig. 3). Because we lacked genome-wide single-nucleotide polymorphism (SNP) data on cases, we were unable to directly correct for population stratification or substructure, whereby regional differences in allele frequency within Italy or Japan might affect our results. Geographical clusters of genetic prion disease have been recognized for decades (26, 43, 44). For example, nearly half of Italian prion disease cases with the V210I variant are concentrated within two regions of Italy (45), so any nonuniform geographic sampling in cases versus controls would add some uncertainty to our penetrance estimates.

Fig. 3. Variants that confer intermediate amounts of lifetime risk.

M232R, V180I, and V210I showed varying degrees of enrichment in cases over controls, indicating a weak to moderate increase in risk. Best estimates of lifetime risk in heterozygotes (Materials and Methods) range from ~0.08% for M232R to ~7.8% for V210I and correlate with the proportion of patients with a positive family history. Allele frequencies for P102L, A117V, D178N, and E200K were consistent with up to 100% penetrance, with CIs including all reported estimates of E200K penetrance based on survival analysis, which range from ~60% to ~90% (19, 23–26). Rates of family history of neurodegenerative disease in Japanese cases (table S10) and in European populations (21) are shown with Wilson binomial 95% CIs. *Based on allele counts rounded for privacy (Materials and Methods). Gerstmann-Sträussler-Scheinker (GSS) disease is associated with variants P102L, A117V, and G131V. Fatal familial insomnia (FFI) is associated with a D178N (cis-129M) haplotype.

Nonetheless, the magnitude of the enrichment of certain variants in cases over controls in our data sets makes population stratification an implausible explanation for the entire difference. For V210I to be neutral and, yet, appear with an allele frequency of 8.1% in Italian cases despite an apparent allele frequency of 0.02% in Italian controls, it would need to be fixed in a subpopulation that comprises 8% of Italy’s populace. Under this scenario, the subpopulation would need to be virtually unsampled in any of our control cohorts, and the collection of V210I prion disease cases would be expected to contain many homozygotes. In reality, no cases have been reported as homozygous for this variant. Conversely, if V210I were fully penetrant, then family history would be positive in most cases, and the variant’s appearance on 13 alleles in 23andMe (table S5) would indicate that this variant alone accounts for three times the known prevalence of genetic prion disease (Fig. 1A). Finally, if the low family history rate were caused by many de novo mutations, then V210I cases would be more uniformly distributed across populations (table S1). Similar arguments rule out V180I being either benign or Mendelian. M232R, though clearly not Mendelian, could still be benign because it exhibits only four- to sixfold enrichment in cases, an amount that might conceivably be explained by Japanese population substructure alone. However, because even common variants in PRNP affect prion disease risk with odds ratios of 3 or greater (46–48), it is plausible that M232R has a similar effect size; indeed, our data suggest that M232R having this effect on prion disease risk is a more likely scenario than M232R being neutral.

Satisfied that these three variants are likely neither benign nor Mendelian, we estimated lifetime risk in heterozygotes (Materials and Methods). The ~2 in 1 million annual incidence of prion disease translates into a baseline lifetime risk of ~1 in 5000 in the general population (Materials and Methods). Because prion diseases are so rare, even the massive enrichment of heterozygotes in cases (Fig. 3)—implying odds ratios on the order of 10 to 1000—corresponds to only low penetrance, with lifetime risks for M232R, V180I, and V210I estimated to be near 0.1, 1, and 10%, respectively. Although our estimates are imperfect because of population stratification, they accord well with family history rates (Fig. 3) and explain the unique space that these variants occupy in the plot of case versus control allele count (Fig. 2). These data indicate that PRNP missense variants occupy a risk continuum rather than a dichotomy of causal versus benign.

Protein-truncating variants

We asked whether the same was true of protein-truncating variants. PRNP has only one protein-coding exon, so premature stop codons are expected to result in truncated polypeptides rather than in nonsense-mediated decay. Prion diseases are known to arise from a gain of function, because neurodegeneration is not seen in mice, cows, or goats that lack PrP (49–52), and the rate of prion disease progression is tightly correlated with PrP expression level (53). Yet, heterozygous C-terminal (residue ≥145) truncating variants are known to cause prion disease, sometimes with peripheral amyloidosis (34). Some of these patients also experience sensorimotor neuropathy phenotypically similar to that present in homozygous, but not heterozygous, PrP knockout mice (54); this phenotype has been attributed to amyloid infiltration of peripheral nerves, rather than loss of PrP function (34).

We identified heterozygous N-terminal (residue ≤131) truncating variants in four ExAC individuals and were able to obtain Sanger validation (fig. S2) and limited phenotype data (table S11) for three. These individuals were free of overt neurological disease at ages 79, 73, and 52, and report no personal or family history of neurodegeneration or peripheral neuropathy. Therefore, the pathogenicity of protein-truncating variants appears to be dictated by position within PrP’s amino acid sequence (Fig. 4). Observing three PRNP nonsense variants in ExAC was consistent with the expected number (~3.9) based on mutation rates once we adjusted our model (55) to exclude codons ≥145, where truncations cause a dominant gain-of-function disease. Thus, we see no evidence that PRNP is constrained against truncation in its N terminus. This lack of any evidence of purifying selection against N-terminal truncating variants, combined with the lack of any obvious phenotype in individuals harboring these variants, suggests that heterozygous loss of PrP function is tolerated.
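To make the comparison of three observed nonsense variants with ~3.9 expected more concrete, the sketch below asks how often a count of three or fewer would arise if the true mean were 3.9. Treating the count as Poisson distributed is an assumption made here for illustration only; the text does not state which test, if any, was applied to the mutation-rate model (55).

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """Probability of observing k or fewer events under a Poisson(mean) model."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


EXPECTED_NONSENSE = 3.9   # expected count quoted in the text
OBSERVED_NONSENSE = 3     # nonsense variants observed in ExAC

# Chance of seeing this few (or fewer) variants if nothing removes them.
p_at_most_observed = poisson_cdf(OBSERVED_NONSENSE, EXPECTED_NONSENSE)
print(f"P(count <= {OBSERVED_NONSENSE}) = {p_at_most_observed:.2f}")
```

Under this simplified model the probability is close to one half, which is in line with the statement that there is no evidence of depletion of N-terminal truncating variants.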

Fig. 4. Position-dependent effects of truncating variants in the human prion protein.

Truncating variants reported in prion disease cases in the literature (table S2) and in our cohorts (table S1) cluster exclusively in the C-terminal region (residue ≥145), whereas truncating variants in ExAC are more N-terminal (residue ≤131). The ortholog of each residue from 23 to 94 is deleted in at least one prion-susceptible transgenic mouse line (107). C-terminal truncations abolish PrP’s glycosylphosphatidylinositol (GPI) anchor but leave most of the protein intact, a combination that mediates gain of function through mislocalization, which causes this normally cell surface–anchored protein to be secreted. Consistent with this model of pathogenicity, mice that express full-length secreted PrP develop fatal and transmissible prion disease (108, 109). By contrast, the N-terminal truncating variants that we observed retain only residues dispensable for prion propagation and are likely to cause a total loss of protein function.


More than 100,000 genetic variants have been reported to cause Mendelian disease in humans (56, 57). Many such reports do not meet current standards for assertions of pathogenicity (58, 59), and if all such reports were believed, the cumulative frequency of these variants in the population would imply that most people have a genetic disease (27). It is generally unclear how much of the excess burden of purported disease variants in the population results from benign variants falsely associated and how much results from variants with genuine association but incomplete penetrance.

Here, we leveraged newly available large genomic reference data sets to reevaluate reported disease associations in a dominant disease gene, PRNP. We identify some missense variants as likely benign and show that others span a spectrum from <0.1 to ~100% penetrance. Our analyses provide quantitative estimates of lifetime risk for hundreds of asymptomatic individuals who have inherited incompletely penetrant PRNP variants.

Available data sets are only now approaching the size and quality required for such analyses, resulting in limitations for our study. The confidence intervals (CIs) on our lifetime risk estimates span more than an order of magnitude, and our inability to perfectly control for population stratification injects additional uncertainty. We have been unable to reclassify those PRNP variants that are very rare both in cases and in controls (Supplementary Discussion). We have avoided analysis of large insertions that are poorly called with short sequencing reads, although we note that existing literature on these insertions is consistent with a spectrum of penetrance similar to the spectrum that we observe for missense variants (28, 32). Penetrance estimation in Mendelian disease will be improved by the collection of larger case series, particularly with genome-wide SNP data to allow more accurate population matching. This, coupled with continued large-scale population control sequencing and genotyping efforts, should reveal whether the dramatic variation in penetrance that we observe here is a more general feature of dominant disease genes.

Because PrP is required for prion pathogenesis and reduction in gene dosage slows disease progression (53, 60–62), several groups have sought to therapeutically reduce PrP expression using RNA interference (63–65), antisense oligonucleotides (66), or small molecules (67, 68). Our discovery of heterozygous loss-of-function variants in three healthy older humans provides the first human genetic data regarding the effects of a 50% reduction in gene dosage for PRNP. Both the number of individuals and the depth of available phenotype data are limited, and lifelong heterozygous inactivation of a gene is an imperfect model of the effects of pharmacological depletion of the gene product. With those limitations, our data provide preliminary evidence that a reduction in PRNP dosage, if achievable in patients, is likely to be tolerated. Increasingly large control sequencing data sets will soon enable researchers to test whether the same is true of other genes currently being targeted in substrate-reduction therapeutic approaches for other protein-folding disorders. Together, our findings highlight the value of large reference data sets of human genetic variation for informing both genetic counseling and therapeutic strategy.


Study design

We sought to estimate the penetrance of variants reported to cause genetic prion disease. We reasoned that fully penetrant variants should not be any more common in the general population than genetic prion disease is, and that by comparing allele frequencies in cases versus population controls, we could estimate penetrance for individual variants. This approach does not require controls that are certified to be free of prion disease, but instead only requires that controls not be enriched for prion disease. We carried out a retrospective analysis of existing data from three sources (prion surveillance centers, ExAC, and 23andMe research participants), which are described in detail below.

Prion disease case series

Prion disease is considered a notifiable diagnosis in most developed countries, with mandatory reporting of all suspect cases to a centralized surveillance center. Surveillance was carried out broadly according to established guidelines (69, 70), with specifics as described previously for Australia (71), France (72), Germany (73–75), Italy (76), Japan (22), and the Netherlands (77). Sanger sequencing of the PRNP open reading frame was performed as described (78). We included only prion disease cases classified as definite (autopsy-confirmed) or probable (according to published guidelines) (70). Criteria for genetic testing vary between countries and over the years of data collection, with testing offered only on indication of family history in some times and places, and testing of all suspect cases with tissue available in other instances. Summary statistics on the total number and proportion of cases sequenced are presented in table S1.

Exome sequencing and analysis

The ascertainment, sequencing, and joint calling of the ExAC data set have been described previously (27). We extracted all rare (<0.1%) coding variant calls in PRNP with genotype quality (GQ) ≥10, alternate allele depth (AD) ≥3, and alternate allele balance (AB) ≥20%. Read-level evidence was visualized using Integrative Genomics Viewer (IGV) (79) for manual review. Because most ExAC exomes were sequenced with 76–base pair (bp) reads and the PRNP octapeptide repeat region (codons 50 to 90 inclusive) is 123 bp long, it was impossible to determine whether genotype calls in this region were correct, and they were not considered further. After review of IGV screenshots, 87% of genotype calls were judged to be correct and were included in table S3. Of the genotype calls judged to be correct, 99% had GQ ≥95, 99% had AB between 30 and 70%, and 97% had AD ≥10. All participants provided informed consent for exome sequencing and analysis. The ExAC’s aggregation and release of exome data have been approved by the Partners Healthcare Institutional Research Board (2013P001339). ExAC data have been publicly released at, and IGV screenshots of the rare PRNP variants deemed to be genuine and included in this study are available at
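The automated part of this filtering can be sketched as below. The thresholds are those quoted above; the `GenotypeCall` record, its field names, and the definition of allele balance as alternate depth divided by total depth are assumptions made for illustration and do not describe the ExAC pipeline itself. The sketch also drops octapeptide repeat calls up front, whereas the text describes setting them aside at review, and it does not replace the manual IGV inspection.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; the record layout below is hypothetical.
MAX_ALLELE_FREQUENCY = 0.001    # "rare" means allele frequency < 0.1%
MIN_GENOTYPE_QUALITY = 10       # GQ >= 10
MIN_ALT_DEPTH = 3               # alternate allele depth (AD) >= 3
MIN_ALLELE_BALANCE = 0.20       # alternate allele balance (AB) >= 20%
REPEAT_CODONS = range(50, 91)   # octapeptide repeat region, codons 50 to 90


@dataclass
class GenotypeCall:
    codon: int
    allele_frequency: float
    genotype_quality: int
    alt_depth: int
    total_depth: int


def passes_filters(call: GenotypeCall) -> bool:
    """Return True if a call would be passed on to manual read-level review."""
    if call.codon in REPEAT_CODONS:
        return False  # not assessable with 76-bp reads
    if call.total_depth <= 0:
        return False  # allele balance is undefined without coverage
    allele_balance = call.alt_depth / call.total_depth  # assumed definition
    return (
        call.allele_frequency < MAX_ALLELE_FREQUENCY
        and call.genotype_quality >= MIN_GENOTYPE_QUALITY
        and call.alt_depth >= MIN_ALT_DEPTH
        and allele_balance >= MIN_ALLELE_BALANCE
    )


example = GenotypeCall(codon=200, allele_frequency=0.0001,
                       genotype_quality=99, alt_depth=12, total_depth=25)
print(passes_filters(example))
```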

23andMe research participants and genotyping

Participants were drawn from the customer base of 23andMe Inc., a personal genetics company (accessed 6 February 2015). All participants provided informed consent under a protocol approved by an external Association for the Accreditation of Human Research Protection Programs–accredited institutional review board, Ethical & Independent Review Services. DNA extraction and genotyping were performed on saliva samples by the National Genetics Institute, a Clinical Laboratory Improvement Amendments–licensed clinical laboratory and a subsidiary of Laboratory Corporation of America. Samples were genotyped on one of four Illumina platforms (V1 to V4) as described previously (80). Of the PRNP SNPs considered, 2 (P105L and E200K) were genotyped on all four platforms, whereas the other 14 were genotyped only on V3 and V4, resulting in differing numbers of total samples genotyped (table S5). Genotypes were called with Illumina GenomeStudio. A 98.5% call rate was required for all samples. As with all 23andMe research participants, individuals whose genotyping analyses failed to reach the desired call rate repeatedly were recontacted to provide additional samples. A maximal set of unrelated individuals was chosen on the basis of segmental identity-by-descent (IBD) estimation (81). Individuals were defined as related if they shared more than 700-centimorgan IBD (about the minimal expected sharing between first cousins). Allele counts between one and five were rounded up to five to protect individual privacy (table S5). Rounding down to one instead would raise our estimates of penetrance for V180I to 7.7% (95% CI, 1.2 to 50%) and for P102L, A117V, D178N, and E200K collectively to 100% (95% CI, 100 to 100%), but the CI would still overlap those based on ExAC allele frequencies, and the overall conclusions of our study would remain unchanged.
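Two of the rules in this paragraph are simple enough to state as code: the relatedness threshold and the privacy rounding of small allele counts. The sketch below is illustrative only; the function names are hypothetical, and selecting a maximal unrelated set from pairwise IBD estimates is a separate step that is not shown.

```python
IBD_RELATED_THRESHOLD_CM = 700   # sharing above this marks a pair as related
PRIVACY_ROUNDING_CEILING = 5     # counts from 1 to 5 are reported as 5


def is_related(shared_ibd_cm: float) -> bool:
    """Pairs sharing more than 700 cM IBD are treated as related."""
    return shared_ibd_cm > IBD_RELATED_THRESHOLD_CM


def round_for_privacy(allele_count: int) -> int:
    """Round allele counts between one and five up to five; leave others as is."""
    if 1 <= allele_count <= PRIVACY_ROUNDING_CEILING:
        return PRIVACY_ROUNDING_CEILING
    return allele_count


print([round_for_privacy(count) for count in (0, 1, 3, 5, 8, 13)])
```

Because rounding can only increase small control allele counts, it can only lower the resulting penetrance estimates, which is why the text reports the alternative of rounding down to one as a sensitivity check.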

23andMe ancestry composition

Ancestral origins of chromosomal segments were assigned on a continental level (European, Latino, African, and East Asian) and on a country level (Japanese) as described by Durand et al. (82). Briefly, after phasing genotypes using an out-of-sample implementation of the Beagle algorithm (83), a string kernel support vector machine classifier assigned tentative ancestry labels to local genomic regions. Then, an autoregressive pair hidden Markov model was used to simultaneously correct phasing errors and produce reconciled local ancestry estimates and confidence scores based on the initial assignment. Finally, isotonic regression models were used to recalibrate the confidence estimates.

Europeans and East Asians were defined as individuals with more than 97% of chromosomal segments predicted as being from the respective ancestries. Because African Americans and Latinos are highly admixed, no single threshold of genome-wide ancestry is sufficient to distinguish them. However, segment length distributions of European, African, and Native American ancestries are different between African Americans and Latinos, because of the distinct admixture timing in the two ethnic groups. Thus, a logistic classifier based on segment length of European, African, and Native American ancestries was used to distinguish between African Americans and Latinos.

At the country level, individuals were classified as Japanese based on the fraction of the respective local ancestry using a threshold of 90% for classifying Japanese ancestry. This threshold is based on the average fraction of local ancestry in the reference population (23andMe research participants with all four grandparents from the reference country): 94% (5% SD, n = 533) for Japanese. Using the same approach, we were unable to obtain a confident set of Italian individuals for analysis of V210I because of extensive admixture. 23andMe research participants with all four grandparents from Italy have only 66% (18% SD, n = 2090) Italian ancestry, and only ~60 participants have >90% Italian ancestry.

ExAC ancestry inference

We computed 10 principal components based on ~5800 common SNPs as described (27, 84). A centroid in eigenvalue-weighted principal component space was generated for each HapMap population based on 1000 Genomes individuals in ExAC. The remaining individuals in ExAC were assigned to the HapMap population with the nearest centroid according to eigenvalue-weighted Euclidean distance. Ancestries of all individuals, including those with reportedly pathogenic variants, are summarized in tables S7 and S8.
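The nearest-centroid assignment can be illustrated with a small sketch. It assumes one plausible reading of "eigenvalue-weighted": each principal component difference is multiplied by that component's eigenvalue before the Euclidean distance is taken. The population labels, coordinates, and eigenvalues are toy values, and the function names are hypothetical rather than taken from the ExAC code (27, 84).

```python
import math


def centroid(points):
    """Mean position of a list of equal-length coordinate lists."""
    return [sum(axis) / len(points) for axis in zip(*points)]


def weighted_distance(a, b, eigenvalues):
    """Euclidean distance after scaling each component by its eigenvalue (assumed)."""
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, eigenvalues)))


def assign_population(sample, centroids, eigenvalues):
    """Label a sample with the population whose centroid is nearest."""
    return min(centroids,
               key=lambda pop: weighted_distance(sample, centroids[pop], eigenvalues))


# Toy reference individuals in a two-component space (the study used 10 components).
reference = {
    "POP_A": [[0.0, 0.0], [0.2, 0.0]],
    "POP_B": [[1.0, 1.0], [1.2, 1.0]],
}
eigenvalues = [2.0, 1.0]
centroids = {pop: centroid(points) for pop, points in reference.items()}

assigned = assign_population([0.3, 0.2], centroids, eigenvalues)
print(assigned)
```

Weighting by eigenvalue gives the leading components, which capture most of the population structure, more influence on the assignment than the trailing ones.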

Prion disease incidence and baseline risk

The reported incidence of prion disease varies between countries and between years, with much of the variability explained by the intensity of surveillance, as measured by the number of cases referred to national surveillance centers (13). Rates of about one case per million population per year have been reported, for instance, in the United States (85) and in Japan (22); however, the countries with the most intense surveillance (greatest number of referrals per capita), such as France and Austria, observe incidence figures as high as two cases per million population per year (13). Only in small countries where the statistics are dominated by a particular genetic prion disease founder mutation, such as Israel and Slovakia (23, 26), has an incidence higher than two per million been consistently observed (86). We therefore accepted two cases per million as an upper bound for the true incidence of prion disease. Assuming an all-causes death rate of ~10 per 1000 annually (87), this incidence corresponds to prion disease accounting for ~0.02% of all deaths, which we accepted as the baseline disease risk in the general population.
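As a worked check of this arithmetic, the following minimal Python 3 sketch divides the assumed upper-bound incidence by the assumed all-causes death rate. The function and variable names are illustrative and are not taken from the study's scripts.

```python
def baseline_lifetime_risk(incidence_per_million, deaths_per_thousand):
    """Approximate share of all deaths attributable to the disease.

    Both inputs are annual per capita rates, so their ratio approximates
    the fraction of deaths due to the disease.
    """
    incidence = incidence_per_million / 1_000_000
    death_rate = deaths_per_thousand / 1_000
    return incidence / death_rate

# Two cases per million per year, ~10 deaths per 1000 per year.
baseline_risk = baseline_lifetime_risk(2, 10)
print(f"{baseline_risk:.2%}")  # prints 0.02%
```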

Lifetime risk estimation

By Bayes’ theorem, the probability of disease given a genotype [penetrance or lifetime risk, P(D|G)] is equal to the proportion of individuals with the disease who have the genotype [genotype frequency in cases, P(G|D)] times the prevalence of the disease [baseline lifetime risk in the general population, P(D)], divided by the frequency of the genotype in the general population [here, population control allele frequency, P(G)]. The use of this formula to estimate disease risk dates back at least to Cornfield’s estimation of the probability of lung cancer in smokers (88), with later contributions by Woolf (89) and a synthesis by Li (90) with application to genetics.
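Written out with the symbols defined in the preceding paragraph, the relation is:

```latex
P(D \mid G) = \frac{P(G \mid D)\, P(D)}{P(G)}
```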

We used an allelic rather than a genotypic model, such that lifetime risk in an individual with one allele is equal to case allele frequency (based on the number of prion disease cases that underwent PRNP sequencing) times baseline risk divided by population control allele frequency, P(D|A) = P(A|D) × P(D)/P(A). Note that we assumed that our population control data sets include individuals who will later die of prion disease, thus enabling direct use of the ExAC and 23andMe allele frequencies as the denominator P(A). Following Kirov et al. (11), we computed Wilson 95% CI on the binomial proportions P(A|D) and P(A), and calculated the upper bound of the 95% CI for penetrance using the upper bound on case allele frequency and the lower bound on population control allele frequency, and vice versa for the lower bound on penetrance.
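The calculation can be sketched in Python 3 as follows. This is an illustration under stated assumptions rather than the study's own code: the function names are hypothetical, the allele counts in the usage lines are invented for demonstration only, allele numbers are taken as twice the number of individuals, and capping the result at 100% (and returning 100% when the control frequency or its lower bound is zero) is a choice made here for the sketch.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal

def wilson_interval(successes, trials, z=Z95):
    """Wilson score interval (lower, upper) for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def lifetime_risk(case_ac, case_an, ctrl_ac, ctrl_an, baseline_risk):
    """Return (point, lower, upper) for P(D|A) = P(A|D) * P(D) / P(A).

    ac = allele count, an = allele number (total alleles assessed).
    """
    def risk(case_freq, ctrl_freq):
        # Assumption of this sketch: cap at 1.0, and report 1.0 when the
        # control frequency is zero and the ratio is undefined.
        if ctrl_freq <= 0.0:
            return 1.0
        return min(1.0, case_freq * baseline_risk / ctrl_freq)

    case_lo, case_hi = wilson_interval(case_ac, case_an)
    ctrl_lo, ctrl_hi = wilson_interval(ctrl_ac, ctrl_an)
    point = risk(case_ac / case_an, ctrl_ac / ctrl_an)
    lower = risk(case_lo, ctrl_hi)   # low case frequency, high control frequency
    upper = risk(case_hi, ctrl_lo)   # high case frequency, low control frequency
    return point, lower, upper

# Hypothetical counts, for illustration only: 10 alleles among 10,000
# sequenced cases and 2 alleles among 60,706 controls, baseline risk 0.02%.
point, lower, upper = lifetime_risk(10, 2 * 10_000, 2, 2 * 60_706, 0.0002)
```

Pairing the lower bound on case frequency with the upper bound on control frequency, and vice versa, gives a conservative interval that is wider than a 95% CI computed on the ratio directly.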

Statistical analysis and source code availability

Error bars in Fig. 3 are as described in the previous section. Data processing, analysis, and figure generation used custom scripts written in Python 2.7.6 and R 3.1.2. These scripts, along with vector graphics of all figures and tab-delimited text versions of all supplementary tables, are available online at and are sufficient to reproduce the figures and the analyses described in this paper.



Table S1. Allele counts of rare PRNP variants in 16,025 definite and probable prion disease cases in nine countries.

Table S2. Rare PRNP variants reported in peer-reviewed literature to cause prion disease.

Table S3. Allele counts of rare PRNP variants in 60,706 individuals in ExAC.

Table S4. Summary of rare PRNP variants by functional class in ExAC.

Table S5. Allele counts of 16 reportedly pathogenic PRNP variants in >500,000 23andMe research participants.

Table S6. Phenotypes investigated in studies in which ExAC individuals with reportedly pathogenic PRNP variants were ascertained.

Table S7. Inferred ancestry and codon 129 genotypes of ExAC individuals with reportedly pathogenic variants.

Table S8. Inferred ancestry of all ExAC individuals.

Table S9. Inferred ancestry of 23andMe research participants.

Table S10. Details of Japanese prion disease cases.

Table S11. Phenotypes of individuals with N-terminal PrP-truncating variants.

Fig. S1. Age of ExAC individuals with reportedly pathogenic PRNP variants versus all individuals in ExAC.

Fig. S2. Sanger sequencing results for individuals with N-terminal–truncating variants.

References (110–179)


Acknowledgments: We thank the customers of 23andMe, ExAC research participants, and prion disease patients and families who participated in this research. Funding: Research reported in this publication was partially supported by the National Institute of Diabetes and Digestive and Kidney Diseases and the National Institute of General Medical Sciences of the NIH (awards U54DK105566 and R01GM104371), by Broad Institute NextGen funds, and by Prion Alliance sundry funds. S.M.V. is supported by the National Science Foundation Graduate Research Fellowship Program (grant 2015214731). U.S. prion surveillance work was conducted under Centers for Disease Control and Prevention (contract UR8/CCU515004). Japanese prion surveillance work was supported by a grant-in-aid from the Research Committee of Prion Disease and Slow Virus Infection and the Research Committee of Surveillance and Infection Control of Prion Disease of the Ministry of Health, Labour and Welfare of Japan. The French prion surveillance network is supported by the Institut National de veille Sanitaire. The German prion surveillance work was supported by Robert Koch Institute/Federal Ministry of Health (grant 1369-341). The UK National Creutzfeldt Jakob Disease Research and Surveillance Unit is supported by the Department of Health and the Scottish Executive. The Australian National Creutzfeldt-Jakob Disease Registry is funded by the Commonwealth Department of Health. S.J.C. is supported by a National Health and Medical Research Council Practitioner Fellowship (identification number APP1005816).
Contributions at Erasmus Medical Center (MC) were supported by Netherlands Genomics Initiative/Netherlands Organisation for Scientific Research–sponsored Netherlands Consortium for Healthy Aging (project 050-060-810); by the Genetic Laboratory of the Department of Internal Medicine, Erasmus MC; by a Complementation Project of the Biobanking and Biomolecular Research Infrastructure Netherlands (project number CP2010-41); and by Erasmus MC and Erasmus University, Rotterdam, Netherlands Organisation for Health Research and Development (ZonMw Middelgroot #91111025), Netherlands Organization for the Health Research and Development, the Research Institute for Diseases in the Elderly, the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (Directorate-General for Science, Research and Development), and the Municipality of Rotterdam. Author contributions: E.V.M., S.M.V., and D.G.M. conceived and designed the study. E.V.M. analyzed the data, generated figures, and wrote the manuscript. S.M.V. and E.V.M. reviewed literature and IGV screenshots. K.E.S. performed constraint analyses. M. Lek, K.E., K.E.S., K.J.K., A.H.O.-L., M.J.D., and D.G.M. consulted on data analysis and interpretation. J.F.S., C.Y.M., J.Y.T., and L.P.C.Y. prepared and consulted on analysis of 23andMe data. P.G., J.B., S.Z., Y.C., W.C., M.Y., T.H., N.S., H.M., Y.N., T.K., S.J.C., A.B., R.G.W., R. Knight, C.P., I.Z., T.F.J.K., S.E., A.G., M.C., J.d.P.-C., S.H., J.-L.L., E.B.-A., J.-P.B., S.C., P.P., A.L., A.P., R. Kraaij, J.G.J.v.R., A.R., C.J., S.J.v.d.L., and C.M.v.D. prepared and consulted on analysis of prion surveillance data. E.V.M., J.L.M., M.B., M. Laakso, K.L.M., A.K., K.C., S.M., P.S., P.F.S., C.M.H., S.M.P., C.M.v.D., A.H., M.A.I., S.J.v.d.L., and A.G.U. prepared and consulted on analysis of data regarding protein-truncating variants. ExAC provided exome sequence data.
Competing interests: The authors declare that they have no competing interest.
Exome Aggregation Consortium collaborators: Monkol Lek, Konrad J. Karczewski, Eric V. Minikel, Kaitlin E Samocha, Eric Banks, Timothy Fennell, Anne H. O’Donnell-Luria, James S. Ware, Andrew J. Hill, Beryl B. Cummings, Taru Tukiainen, Daniel P. Birnbaum, Jack A. Kosmicki, Laramie Duncan, Karol Estrada, Fengmei Zhao, James Zou, Emma Pierce-Hoffman, Mark DePristo, Ron Do, Jason Flannick, Menachem Fromer, Laura Gauthier, Jackie Goldstein, Namrata Gupta, Daniel Howrigan, Adam Kiezun, Mitja I Kurki, Ami Levy Moonshine, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso, Ryan Poplin, Manuel A Rivas, Valentin Ruano-Rubio, Douglas M. Ruderfer, Khalid Shakir, Christine Stevens, Brett P. Thomas, Grace Tiao, Maria T. Tusie-Luna, Ben Weisburd, Hong-Hee Won, Dongmei Yu, Stacey Donnelly, Andrea Saltzman, David M. Altshuler, Diego Ardissino, Michael Boehnke, John Danesh, Roberto Elosua, Jose C. Florez, Stacey B Gabriel, Gad Getz, Christina M. Hultman, Sekar Kathiresan, Markku Laakso, Steven McCarroll, Mark I. McCarthy, Dermot McGovern, Ruth McPherson, Benjamin M. Neale, Aarno Palotie, Shaun M. Purcell, Danish Saleheen, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan, Jaakko Tuomilehto, Hugh C. Watkins, James G. Wilson, Mark J. Daly, Daniel G. MacArthur.