Research Article: Human Genetics

Aggregate penetrance of genomic variants for actionable disorders in European and African Americans


Science Translational Medicine  09 Nov 2016:
Vol. 8, Issue 364, pp. 364ra151
DOI: 10.1126/scitranslmed.aag2367

The problem of penetrance

It seems obvious that people who have mutations in genes known to cause disease in well-studied families would be more likely to also have the clinical features of disease if they were selected from the general population. But researchers have obtained mixed results on this point because of incomplete penetrance, i.e., not everyone who has a certain disease-causing mutation (a pathogenic variant) has the disease, raising questions about the value of genetic screening of people who are not sick. Natarajan and colleagues bring some clarity to this issue by examining two large groups of subjects—from the Framingham Heart Study and the African-American Jackson Heart Study—for the presence of mutations in 56 disease-related genes and for clinical features of their corresponding diseases. Even though the authors examined the genetic results of almost 5000 people, the number of these mutations was small. Nevertheless, these data clearly show that carrying a pathogenic variant markedly increases the chances of having the related disease.


In populations that have not been selected for family history of disease, it is unclear how commonly pathogenic variants (PVs) in disease-associated genes for rare Mendelian conditions are found and how often they are associated with clinical features of these conditions. We conducted independent, prospective analyses of participants in two community-based epidemiological studies to test the hypothesis that persons carrying PVs in any of 56 genes that lead to 24 dominantly inherited, actionable conditions are more likely to exhibit the clinical features of the corresponding diseases than those without PVs. Among 462 European American Framingham Heart Study (FHS) and 3223 African-American Jackson Heart Study (JHS) participants who were exome-sequenced, we identified and classified 642 and 4429 unique variants, respectively, in these 56 genes while blinded to clinical data. In the same participants, we ascertained related clinical features from the participants’ clinical history of cancer and most recent echocardiograms, electrocardiograms, and lipid measurements, without knowledge of variant classification. PVs were found in 5 FHS (1.1%) and 31 JHS (1.0%) participants. Carriers of PVs were more likely than expected, on the basis of incidence in noncarriers, to have related clinical features in both FHS (80.0% versus 12.4%) and JHS (26.9% versus 5.4%), yielding standardized incidence ratios of 6.4 [95% confidence interval (CI), 1.7 to 16.5; P = 7 × 10⁻⁴] in FHS and 4.7 (95% CI, 1.9 to 9.7; P = 3 × 10⁻⁴) in JHS. Individuals unselected for family history who carry PVs in 56 genes for actionable conditions have an increased aggregated risk of developing clinical features associated with the corresponding diseases.


Clinical exome and genome sequencing is increasingly applied in the practice of medicine, but many challenges remain (1–5). There has been extensive discussion of the merits of selection, ascertainment, and reporting of incidental or secondary findings that come to light during sequencing, especially when they may be of medical value to patients and their families (6–8). In 2013, the American College of Medical Genetics and Genomics (ACMG) recommended that laboratories providing clinical sequencing for any medical indication should search for and report pathogenic variants (PVs) in 56 genes (the ACMG56) that represent 24 rare Mendelian conditions for which there are recommended treatments (7, 9). The ACMG recommendations have generated debate (10, 11), in part because the risk associated with PVs in families with many affected relatives is not always the same for persons whose families are not enriched with affected relatives (12–15), and thus it has been unclear whether in the absence of a family history these variants truly represent an increase in risk.

In addition, large-scale biobanks are being sequenced for research purposes, and investigators are struggling with recent recommendations about whether and how to return genomic findings of potential medical importance to participants and their family members (8, 16). Although the genes and variants to be returned are not specified in these recommendations, the ACMG56 have become a convenient starting point for these discussions and for the generation of lists of genes that are actually being reported to the participants. For example, information about the ACMG56, with some modifications, is being returned to the participants by some sites within the eMERGE III (Electronic Medical Records and Genomics Phase III) network (17), as well as the Geisinger MyCode research project (18). These research initiatives presage the use of genome sequencing for population screening and raise the issue of whether this is appropriate (19–23).

Estimating the association between specific PVs in individual genes for rare Mendelian conditions and clinical phenotypes in an unselected population is challenging because PVs associated with Mendelian diseases are rare, variants are difficult to categorize with confidence, even among experts (24), and clinical phenotypes unrelated to known genetic changes are relatively common in the general population. Because most of what we know in genetics has been from patients presenting to specialized clinics, and there have been few population-based estimates of variant penetrance, it is surprisingly difficult to demonstrate the seemingly straightforward idea that unselected persons carrying PVs in a group of genes known to be associated with disease are actually at increased risk for those conditions or to estimate the effect size of this increased risk. To address this, we devised an unbiased method to prospectively examine the aggregate association between PVs in a set of genes and clinical features among research participants from the Framingham Heart Study (FHS) and, separately, in the Jackson Heart Study (JHS), all of whom had been exome-sequenced and systematically phenotyped. In these two independent populations, we tested the hypothesis that participants with PVs in any of the ACMG56 genes were more likely to exhibit related clinical features (RCFs) than would be expected in participants without a PV.


Participant description

The FHS participants were drawn from the FHS Offspring cohort [n = 290, 35.7% female; mean age at enrollment, 36.8 (SD, 9.3) years] and the FHS Generation 3 cohort [n = 172, 35.7% female; mean age at enrollment, 44.5 (SD, 8.9) years]. All FHS participants were European American, and comprehensive clinical phenotypes were gleaned from the most recent clinical examination. The participants in the FHS cohort were followed for an average of 20.4 (SD, 14.3) years. Among the 3223 JHS participants, all were African-American, 62.4% were female, and mean age at enrollment was 55.6 (SD, 12.8) years. The JHS participants are being followed longitudinally, but comprehensive clinical phenotypes were only available from the baseline examination.

Overview of study design and phenotype characterization

We designed a procedure for unbiased analysis of the association between exome sequencing and phenotype data from 462 participants in the FHS and 3223 participants in the JHS. Family history was not considered in selecting participants for enrollment in either cohort, nor was it considered in the selection of participants for sequencing. Without knowledge of the phenotypes, we classified all variants in the ACMG56 genes, using a previously described multistep algorithm (25, 26) and following recently revised ACMG variant classification recommendations (27). We tabulated RCFs (Table 1) corresponding to the 24 disease conditions associated with the ACMG56 from clinical records of the FHS and JHS participants while blinded to the results of sequencing.

Table 1. Prespecified clinical features among sequenced participants.

SW, septal width; HR, heart rate; bpm, beats per minute; LV, left ventricle; RV, right ventricle; RA, right atrium; ECG, electrocardiogram.


Variant classification

By analyzing exomes, we identified 642 unique variants within the ACMG56 genes in the 462 FHS participants and 4429 unique variants in the 3223 JHS participants, and then subsequently classified these while blinded to the phenotype information (see Materials and Methods). Among FHS participants, we identified five PVs in five individuals (1.1% of the FHS cohort) and two likely PVs (LPVs) in two individuals (0.4% of the FHS cohort). Among JHS participants, we identified 19 PVs in 31 individuals (1.0% of the JHS cohort) and 4 LPVs in 4 individuals (0.1% of the JHS cohort). A description of the variants classified as PVs and LPVs, along with the presence or absence of clinical features suggestive of the corresponding diseases, is shown in Table 2 for the FHS participants and Table 3 for the JHS participants. A listing of the specific transcripts that corresponded to the sequenced genes is shown in table S1, and the evidence from the literature that we used to classify variants into PVs and LPVs from FHS and JHS participants is described in table S2. Variants of uncertain significance (VUSs) in the ACMG56 were identified in 146 FHS participants (31.6%) and 917 JHS participants (28.5%). The ACMG guidelines do not recommend returning VUSs, so these were excluded from further analysis.

Table 2. Individuals with PVs and LPVs in the FHS.

LOF, loss of function; F, female; M, male; PW, posterior wall width; LVD, left ventricular diameter; FS, fractional shortening; BC, breast cancer; OC, ovarian cancer; HCM, hypertrophic cardiomyopathy; HCL, hypercholesterolemia; ARVD/C, arrhythmogenic right ventricular dysplasia/cardiomyopathy.

Table 3. Individuals with PVs and LPVs in the JHS.

CRC, colorectal carcinoma; MH, malignant hyperthermia; HPP, hypokalemic periodic paralysis; FDB, familial dysbetalipoproteinemia; HNPCC, hereditary nonpolyposis CRC; LQTS, long QT syndrome; IVS, interventricular septum; PWT, left ventricular posterior wall thickness; LFS, Li-Fraumeni syndrome; LVIDd, left ventricular internal diastolic diameter; NA, not analyzed, that is, individual was not included in the analysis because the expected phenotype was either unavailable (see Materials and Methods) or missing for the given individual.


Comparison of observed and expected proportions of phenotypes

To examine our primary hypothesis, we tested whether carriers of PVs in any of the ACMG56 genes were more likely to exhibit corresponding RCFs than would be expected in participants without a PV. We compared the observed number of RCFs in individuals with any PV to the expected number, assuming that the fraction of carriers of particular PVs who exhibited an RCF was equal to the fraction of noncarriers exhibiting those RCFs (see Materials and Methods). Of five FHS participants with PVs, four displayed an RCF, and this proportion (80%) was higher than expected (12.4%; one-sided binomial mixture test, P = 7 × 10⁻⁴). The standardized incidence ratio (SIR), which is the ratio of observed RCFs among those with PVs to the number expected on the basis of incidence rates among those without PVs, was 6.4 in the FHS [95% confidence interval (CI), 1.7 to 16.5]. Of 26 JHS participants with PVs, 7 displayed an RCF, and this proportion (26.9%) was also higher than expected (5.4%; P = 3 × 10⁻⁴), corresponding to an SIR of 4.7 (95% CI, 1.9 to 9.7). The addition of LPV carriers to this analysis to estimate SIR for PVs and LPVs together yielded similar results [SIR, 4.9; 95% CI, 1.3 to 12.6 in FHS (P = 0.004) and SIR, 4.3; 95% CI, 1.8 to 8.4 in JHS (P = 2 × 10⁻⁴)]. Preplanned secondary analyses of individuals with cancer and cardiovascular diseases revealed that the incidence of RCFs was also significantly higher than expected for carriers of PVs associated with cancer and cardiovascular diseases (Table 4).
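As a rough arithmetic check on the FHS figures in this paragraph, the Python sketch below recomputes the SIR from the observed count (4) and an expected count approximated as 5 carriers × 12.4%. The interval is an exact Poisson interval on the observed count, found by bisection. The text does not state which interval method was used, so that choice is an assumption, as are all function names. The rounded aggregate proportion also stands in for the per-class expected count described in Materials and Methods.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), by direct summation."""
    term = math.exp(-mu)
    total = term
    for j in range(1, k + 1):
        term *= mu / j
        total += term
    return total

def exact_poisson_limits(k, alpha=0.05):
    """Exact two-sided limits for a Poisson mean given an observed count k."""
    def solve(f, lo, hi):
        # Bisection for f increasing in mu with f(lo) < 0 < f(hi).
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    ceiling = 10 * k + 20  # generous upper search bound (assumption)
    lower = 0.0 if k == 0 else solve(
        lambda mu: (1 - poisson_cdf(k - 1, mu)) - alpha / 2, 0.0, ceiling)
    upper = solve(lambda mu: alpha / 2 - poisson_cdf(k, mu), 0.0, ceiling)
    return lower, upper

def sir_with_interval(observed, expected):
    """SIR = observed / expected, with limits scaled by the expected count."""
    lower, upper = exact_poisson_limits(observed)
    return observed / expected, lower / expected, upper / expected

# FHS: 4 of 5 PV carriers had an RCF; expected proportion 12.4% (from the text).
sir, lo, hi = sir_with_interval(4, 5 * 0.124)
print(f"SIR {sir:.2f}, interval {lo:.2f} to {hi:.2f}")
```

Under these assumptions the sketch gives an SIR of about 6.45 and an upper limit of about 16.5, close to the reported 6.4 and 16.5. The lower limit comes out near 1.76 against the reported 1.7, a gap that may reflect rounding of the 12.4% figure or a different interval method.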

Table 4. Observed proportion of participants with PV or LPV who had RCFs of the associated condition compared to the expected proportion.

Description of individuals carrying PVs

In the FHS, a participant with an LDLR nonsense variant (p.Cys143*) had an untreated low-density lipoprotein (LDL) cholesterol level of 195 mg/dl (optimal <130 mg/dl). Notably, this individual was selected for sequencing in a hypertension study and thus was not selected for sequencing on the basis of her lipid status. A participant with a pathogenic missense variant in MYBPC3 (p.Arg502Trp) had manifestations of dilated cardiomyopathy. Two participants with two different BRCA2 frameshift variants (p.Leu1466Phefs*2 and p.Thr1738Ilefs*2) had (respectively) grade 3 (poorly differentiated), Gleason score 5 prostate cancer diagnosed at age 78 and ductal carcinoma in situ breast cancer at age 55. Neither of the individuals carrying LPVs had RCFs. No PV or LPV carrier was a first-degree relative of another carrier.

In the JHS, there were three individuals who had PVs in cancer predisposition genes who reported a history of cancer, although the type of cancer was not recorded. An individual with a BRCA2 frameshift mutation (p.Val220Ilefs*2) was diagnosed with cancer at age 60. A carrier of MLH1 p.Arg687Trp was diagnosed with cancer at age 36. A carrier of TP53 p.Arg273His, who was enrolled at age 93, reported a diagnosis of cancer at age 89. A carrier of MYH7 p.Ala797Thr had left ventricular hypertrophy with an interventricular septal thickness of 13.2 mm and posterior wall thickness of 12.8 mm (normal, <11 mm). Carriers of KCNQ1 p.Arg518* and KCNQ1 p.Val205Met had corrected QT (QTc) intervals of 477 and 494 ms [normal, <440 ms in men and <460 ms in women]. Furthermore, a carrier of the LPV KCNQ1 p.Gly179Ser had a QTc interval of 506 ms. A participant with LDLR p.Pro685Leu had a markedly elevated untreated LDL cholesterol level (357.5 mg/dl; optimal, <130 mg/dl). One family of six, and two additional pairs of first-degree relatives each harbored the same PV, but none of these 10 individuals displayed an RCF, and thus familial presence of the same variant did not inflate the observed association.


In genetics, penetrance is the proportion of individuals harboring a particular variant who exhibit, or eventually exhibit, the associated disease (28). Estimating the penetrance of PVs in populations that are not enriched for family history is a challenge because specific PVs in any given gene are rare, and therefore an exceedingly large population would need to be systematically examined over many years to ascertain accurate phenotype information, which could emerge at any time in the lifetime of the individual. Our analyses do not address the penetrance of specific variants within individual genes. Instead, we tested whether pathogenic variants in a set of genes are collectively associated with RCFs for those conditions, and, if so, what is the effect size of this aggregated association. To answer this question, we conducted two separate, prospective, hypothesis-driven analyses of 462 European Americans and 3223 African-Americans for a group of 56 genes associated with disease conditions where early intervention could lead to prevention or better outcomes. We found that persons carrying PVs in a subset of these 56 genes demonstrate an increased aggregate risk of having clinical features associated with that gene in both the FHS (an entirely European American population) and JHS (an entirely African-American population). The difference in percentages of the cohort with the phenotypes of interest may be due to an enhanced healthy volunteer effect among the JHS cohort or to the fact that phenotypes were collected prospectively over several examinations spanning the course of decades in FHS but were based on a single examination thus far in JHS.

The frequencies we found for PVs and LPVs in the FHS and JHS populations are similar to recent assessments of PVs in medically actionable genes among large collections of individuals with exome sequences (29, 30) and to others who have reported variants in the ACMG56 among collections of exomes or genomes (31, 32). As in these reports, most of the PVs described here are predicted to encode null alleles and result in haploinsufficiency, a well-defined mechanism of pathogenicity for most genetic diseases. The range of frequencies for PVs in these other studies (from 0.8 to ~5%) could reflect differences in how the various populations were identified and recruited, but more likely reflect variability in variant classification. As we have demonstrated (24), even expert laboratories struggle to achieve complete concordance in variant classification. These issues reinforce the need to apply rigorous standards for variant classification and to share variant classification through mechanisms such as ClinGen (33), and also underscore the methodological importance of blinded variant classification in these analyses.

Understanding the association between PVs and RCFs in the general population is necessary for the informed use of genomics to evaluate patients for secondary findings (sometimes characterized as opportunistic screening) and for the use of sequencing in asymptomatic individuals (population screening) (19, 34, 35), but data to support or refute these practices are scarce. In a separate study of FHS participants, 21% of individuals with PVs in hypertrophic cardiomyopathy genes had clinical features suggestive of cardiomyopathy, a lower proportion than expected in multiplex families but a higher proportion than in persons without such variants (36). Specific founder mutations for long QT syndrome among the Finnish population are far from fully penetrant but are still highly associated with prolongation of QT interval in the relatively homogeneous Finnish population (37). However, analyses of medical records for evidence of cardiac arrhythmias did not demonstrate detectable penetrance of PVs in arrhythmia-related genes, perhaps because variant classification was suboptimal (38, 39). Screening for Lynch syndrome has been piloted among incident cases of colorectal cancer (40) but not among cancer-free individuals. A substantially increased risk for breast cancer associated with BRCA variants has recently been demonstrated (41), prompting a call for population-based screening of women around the age of 30 (42). For other genes and variants that are highly penetrant in multiplex families, an increased likelihood of clinical features among carriers cannot always be demonstrated in the general population: Individuals with well-established PVs for maturity-onset diabetes of the young in the FHS and JHS do not exhibit an increased likelihood of having type 2 diabetes or impaired fasting glucose (43). Thus, the literature contains mixed results as to whether PVs in some genes, even some of the genes included among the ACMG56, individually confer increased risk of disease in populations that are not selected for family history.

Estimations of gene-disease association are traditionally conceptualized as penetrance on a gene-by-gene and variant-by-variant basis, and predicting the likelihood of a phenotype from a particular variant in a particular gene is difficult when disease prevalence is low and carrier status prevalence is rare. However, there may be value in aggregating PVs across a number of genes to consider the prior probability as a compound hypothesis relating to numerous diseases. For example, among 951 individuals exome-sequenced as part of the ClinSeq cohort, 103 (10.8%) had putative loss-of-function variants in a large number of genes likely to cause a phenotype in heterozygotes (44). In ClinSeq, intensive targeted phenotyping of 79 of these individuals revealed 34 (43%) with personal or family histories that could be attributed to that gene. That analysis deliberately started with the PVs among a population recruited in part for cardiovascular risk and then searched for the corresponding phenotype in that participant or the participant’s family, often uncovering evidence of a previously unrecognized but non–life-threatening genetic condition. In our analyses, we approached both variant classification and the tabulation of RCFs independently and blinded to each other and examined their association in a subset of genes that have been linked to life-threatening conditions in which early intervention or surveillance could potentially mitigate risk.

Our study has several important limitations. These analyses only examined the aggregate association of PVs with RCFs from the corresponding conditions but did not address the penetrance of individual variants or PVs within a specific gene, because this would have required vastly larger sample sizes. Although the FHS and JHS participants were neither enrolled nor sequenced on the basis of family history, the selection of participants for exome sequencing in FHS was based on their involvement in other studies and may therefore not be representative of the entire FHS population. This was not the case in the JHS where all consenting participants with available DNA were exome-sequenced. Our variant classification strategy may have missed some disease-associated variants by dismissing novel missense variants of unknown function from consideration (45). The a priori definition of both observed and expected RCFs in our analysis included any cancer, thus the cancers associated with PVs and the cancers counted in the comparison populations were appropriately included; however, had cancer cases been considered RCFs only when they had an onset early in life, the differences between the observed and expected penetrance of this group of variants might have been different. It is possible that some of the identified PVs occurred in multiplex families, although none of the participants were selected for sequencing based on family history. In JHS, a family of six individuals carried PKP2 p.Arg413*, a PV expected to result in arrhythmogenic right ventricular dysplasia, but none of the family members displayed features of right ventricular abnormalities by echocardiography; therefore, the observed association was not inflated. The number of individuals with LPVs was too small to independently analyze this group, but adding PVs and LPVs together did not change the strength or significance of the association within each population.

These limitations are balanced by a number of strengths. The FHS and JHS cohorts are exceptionally well-studied populations where both sequence data and high-quality clinical data, including electrocardiograms, echocardiograms, and lipid levels, were available for all participants, not just those who had been recognized by the medical care system as patients. Aside from 25 FHS participants who were selected for sequencing on the basis of elevated LDL cholesterol, none of the participants were selected for sequencing on the basis of phenotypes examined in our analysis, and none of those identified in Table 2 with lipid abnormalities were from those 25 individuals. In addition, we prespecified our hypothesis and compared PVs and RCFs that were ascertained and classified independently of each other. Any misclassifications of variants, or censoring of phenotypes due to participant dropout or death, would be expected to bias the results toward the null. Performing these analyses in cohorts where all participants undergo phenotyping is advantageous, but even such systematic testing may incompletely capture some RCFs, such as right ventricular abnormalities on echocardiography for arrhythmogenic right ventricular dysplasia, limiting the ability to detect phenotypes and further biasing toward the null. The relatively small number of individuals with PVs in the ACMG56 is reflected in a wide CI for the analysis of each cohort; nevertheless, despite the small numbers and limited power, the associations range from a lower bound that is moderately strong to an upper bound that is extremely strong. Although aggregating the exposure improves power, the combined carrier rate is low in a sample of 3685 participants, thereby limiting the precision of the effect estimate. However, offsetting this issue is the fact that we independently demonstrated association in two ethnically distinct cohorts with similar relative effect estimates.

The ACMG recommendations for the return of secondary findings were expressly formulated for use in clinical sequencing (7). However, other groups have recommended the return of genomic variants that have medical actionability in research participants who request such information (8, 46), and the ACMG recommendations for clinical sequencing have been suggested as a basis for selecting the appropriate list of genes and category of variant (47). As large-scale, hospital-based, national biobanks begin to generate genomic data, and research initiatives like the Precision Medicine Initiative affirm the right of research participants to have access to their research results (48), guidance regarding the management of such findings is urgently needed. It is important to note that it has not been demonstrated that detecting such variants actually results in improved health outcomes, and to many, the absence of this evidence remains a compelling objection to both opportunistic and population screening. Our results should be replicated in other populations that are followed for clinical outcomes and should be interpreted with caution, but may help inform the emerging debate about whether and how to offer the return of individual genomic results to participants in research cohorts and biobanks, as well as in clinical sequencing.


Study design

We designed and carried out two independent analyses to estimate the association between PVs derived from exome sequencing in any of 56 genes and clinical features related to the actionable Mendelian conditions that have been linked to these genes. We examined all of the participants who had been sequenced at the time of the analysis in FHS and JHS, and used systematically collected phenotype information from each. Variants were classified as described below without knowledge of the clinical phenotypes, and phenotypes were assessed without knowledge of the variants. The association was estimated within each cohort independently, providing replication of the results.


The FHS is a multigenerational, longitudinal study of European Americans established in 1948 in Framingham, MA. Participants in this analysis were from FHS Offspring (children and spouses of the Original cohort) and Generation 3 (children of the Offspring) cohorts (49, 50). Offspring participants were examined every 4 to 8 years, for a total of eight exams. Generation 3 participants were examined twice. The JHS is a prospective, longitudinal study of African-Americans established in 1998 in Jackson, MS. The details of the cohort, including sampling, recruitment, and examinations, have been previously described (51–53).

For the FHS, as part of the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP), FHS Offspring and Generation 3 participants were selected for exome sequencing as follows: 41 cases and 135 controls for a study of myocardial infarction, 80 cases and 86 controls for a study of blood pressure, 13 cases and 12 controls for a study of LDL cholesterol, 30 cases with stroke, and 65 FHS participants who were randomly selected.

For the JHS, we analyzed genomic and phenotype data from participants who consented to DNA collection during the first examination (2000 to 2004). Exome sequencing was completed for all consenting JHS participants (3273 of the 5301 participants).

These studies were performed using protocols approved by ethics committees at FHS and JHS and by their institutional review boards, with informed consent from all participants.

Exome sequencing

Exome sequencing, variant detection, and quality control steps for the FHS samples have been previously described (54). Briefly, exome capture used either Agilent SureSelect Human All Exon v2 kit (55), or Roche/NimbleGen SeqCap EZ Human Exome Library v1.0 (~32 Mb; Roche NimbleGen EZ Cap v1) or EZ Cap v2 (~34 Mb). Enriched exome libraries were sequenced on an Illumina GAIIx or HiSeq 2000, aligned to human reference (GRCh37) using BWA (56), followed by duplicate removal, indel realignment, base quality score recalibration, and variant detection using Genome Analysis Toolkit (57).

Variant classification

Variants were adjudicated independently by two evaluators who made their classifications without any knowledge of the phenotype data, with any differences resolved by consultation with a third evaluator. Variant classification was completed using a multistep algorithm as described (25, 26, 45) and was consistent with both the ACMG recommendation for secondary findings (7) and the more recently developed ACMG recommendations for variant classification (27).

Transcripts for analysis were those previously selected by the Partners HealthCare Laboratory for Molecular Medicine, a CLIA-certified molecular diagnostic laboratory, and were typically the longest (see table S1). Copy number variants were not evaluated because of the diversity of capture methods and sequencing platforms used for this data set. For variant classification, Alamut (Interactive Biosoftware) (58) and Variant Effect Predictor (59) were used to aggregate variant annotations from multiple sources, including transcript information and evolutionary conservation from the University of California Santa Cruz genome browser (60), and minor allele frequency from the ESP [Exome Variant Server, NHLBI ESP, Seattle, WA; 15 December 2011] database and the 1000 Genomes Project and Exome Aggregation Consortium browsers. Previously published variants were identified by filtering against the Human Gene Mutation Database (HGMD) Professional (61), GeneInsight (62), and ClinVar (63) databases; the latter two databases were also used in variant classification to obtain additional unpublished data on HGMD-selected variants.

Only missense variants that had previously been reported in an index case, denoted as “disease mutations” in the HGMD nomenclature or classified as pathogenic by at least one clinical laboratory in ClinVar, as well as nonsense, frameshift, and splice variants, were considered. Variants previously reported only in the context of functional or in silico experiments, but not previously associated with a symptomatic individual, were not further considered.

Variants were classified as PV if, in addition to being absent or at a frequency in population databases not inconsistent with their disease penetrance: (i) They were protein-truncating variants (nonsense, frameshift, or ±1,2 splice) in a gene where loss of function is a well-established disease mechanism, and the variant was expected to result in nonsense-mediated decay; (ii) literature review identified significant segregation with disease (defined as ≥10 meioses); or (iii) literature review identified moderate segregation with disease (5 to 9 meioses), and the amino acid was conserved in at least mammals and birds, or the impact of the variant was supported by strong functional data. Variants were classified as an LPV if, in addition to being absent or at a frequency in population databases not inconsistent with their disease penetrance: (i) Literature review showed moderate segregation (5 to 9 meioses) with disease, the amino acid was conserved in all mammals and birds, but functional data were either limited or absent; (ii) literature review identified minimal familial segregation (<5 meioses), but the amino acid was both conserved in all mammals and supported by strong functional data; or (iii) they were protein-truncating variants (nonsense, frameshift, or ±1,2 splice) in a gene where loss-of-function variants have been observed but was not yet a well-established disease mechanism, and the variant was expected to result in nonsense-mediated decay. Variants were classified as benign if the frequency of the variant was above 0.3% for variants associated with dominantly inherited diseases. All other variants were classified as VUS.

For secondary analyses, we grouped the ACMG genes into 23 that are cancer-related (APC, BRCA1, BRCA2, MEN1, MLH1, MSH2, MSH6, MUTYH, NF2, PMS2, PTEN, RB1, RET, SDHD, SDHAF2, SDHB, SDHC, STK11, TP53, TSC1, TSC2, VHL, and WT1) and 31 that are cardiovascular-related (ACTC1, GLA, LMNA, MYBPC3, MYH7, MYL2, MYL3, PRKAG2, TNNT2, TNNI3, TPM1, DSC2, DSG2, DSP, PKP2, TMEM43, KCNH2, KCNQ1, SCN5A, RYR2, ACTA2, COL3A1, FBN1, MYLK, MYH11, SMAD3, TGFBR1, TGFBR2, APOB, LDLR, and PCSK9). Two genes conferring susceptibility to malignant hyperthermia (CACNA1S and RYR1) were not considered in the secondary analyses.
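As a minimal sketch of this grouping, the gene lists above can be held in Python sets and checked against the stated counts (23 + 31 + 2 = 56). The variable and function names are hypothetical; only the gene symbols and group sizes come from the text.

```python
# Gene symbols are copied from the text; everything else is illustrative.
CANCER_GENES = {
    "APC", "BRCA1", "BRCA2", "MEN1", "MLH1", "MSH2", "MSH6", "MUTYH",
    "NF2", "PMS2", "PTEN", "RB1", "RET", "SDHD", "SDHAF2", "SDHB",
    "SDHC", "STK11", "TP53", "TSC1", "TSC2", "VHL", "WT1",
}
CARDIOVASCULAR_GENES = {
    "ACTC1", "GLA", "LMNA", "MYBPC3", "MYH7", "MYL2", "MYL3", "PRKAG2",
    "TNNT2", "TNNI3", "TPM1", "DSC2", "DSG2", "DSP", "PKP2", "TMEM43",
    "KCNH2", "KCNQ1", "SCN5A", "RYR2", "ACTA2", "COL3A1", "FBN1", "MYLK",
    "MYH11", "SMAD3", "TGFBR1", "TGFBR2", "APOB", "LDLR", "PCSK9",
}
# Malignant hyperthermia genes, left out of the secondary analyses.
EXCLUDED_GENES = {"CACNA1S", "RYR1"}


def secondary_group(gene: str):
    """Return the secondary-analysis group for a gene, or None if excluded."""
    if gene in CANCER_GENES:
        return "cancer"
    if gene in CARDIOVASCULAR_GENES:
        return "cardiovascular"
    if gene in EXCLUDED_GENES:
        return None
    raise KeyError(f"{gene} is not one of the 56 genes")


total = len(CANCER_GENES) + len(CARDIOVASCULAR_GENES) + len(EXCLUDED_GENES)
print(len(CANCER_GENES), len(CARDIOVASCULAR_GENES), total)
```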

Phenotype data

FHS phenotypes were downloaded from the database of Genotypes and Phenotypes (dbGaP) and were available throughout the period of follow-up, whereas JHS phenotypes were only available from Exam 1 and were extracted from the JHS Vanguard Center package for Exam 1 (53). Sex, age, and date of examination for each subject were derived from data recorded during clinical examinations. RCFs for diseases corresponding with the ACMG genes were ascertained and tabulated without knowledge of the genetic data. For cancer, an aggregated FHS cancer database, with subject diagnoses confirmed from pathology reports and clinical notes, was queried (64, 65), whereas cancer diagnoses in JHS were extracted from Exam 1 participant surveys. For both data sets, any history of cancer was recorded regardless of the age of onset of the cancer. For cardiovascular diseases, the most recent lipid levels, echocardiography, and electrocardiogram data were recorded and categorized according to prespecified criteria (Table 1). In both FHS and JHS, phenotypic data sets were highly complete with less than 4% of participants having missing data for any phenotypic variable.

Statistical analyses

We calculated the expected number of RCFs among those with PVs as $\sum_i n_i \pi_i$, where $n_i$ is the number of individuals with a PV in class $i$ (cancer, hypertrophic and dilated cardiomyopathy, arrhythmogenic right ventricular dysplasia/cardiomyopathy, and dyslipidemia), and $\pi_i$ is the fraction of individuals without PVs exhibiting an RCF in class $i$. In the FHS cohort, we selected individuals with breast, ovarian, prostate, and gastrointestinal cancer, whereas in the JHS, cancer subtypes were not available, so we used any history of cancer. We estimated statistical significance through simulation: We sampled a binomial random variable with size $n_i$ and probability $\pi_i$ for each class $i$ and summed these five random variables (generating a mixture of binomials). We generated 100,000 replicates of this simulated RCF count and estimated the (one-sided) P value as the proportion of replicates where the simulated count was equal to, or exceeded, the observed count. For the secondary analyses, we repeated this procedure for cancer and cardiovascular PVs. We also calculated SIR as the ratio of the observed RCF count to the expected count (66). All statistical analyses were performed with R (version 3.0.2).
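The procedure can be illustrated with a short simulation. The sketch below is in Python rather than the R used by the authors, and it is not their code. All input numbers are hypothetical and chosen only to make the example run, and treating hypertrophic and dilated cardiomyopathy as two separate classes (to arrive at five) is an assumption.

```python
import random


def expected_rcf_count(carriers, background):
    """Expected RCF count among carriers: sum over classes of n_i * pi_i."""
    return sum(n * background[cls] for cls, n in carriers.items())


def simulate_p_value(carriers, background, observed, replicates=100_000, seed=0):
    """One-sided P value: share of simulated counts >= the observed count."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        # One Binomial(n_i, pi_i) draw per class (as a sum of Bernoulli
        # trials), then summed across classes.
        total = sum(
            sum(rng.random() < background[cls] for _ in range(n))
            for cls, n in carriers.items()
        )
        if total >= observed:
            hits += 1
    return hits / replicates


# Hypothetical inputs for illustration only; these are NOT the study's data.
carriers = {"cancer": 2, "hcm": 1, "dcm": 0, "arvc": 1, "dyslipidemia": 1}  # n_i
background = {"cancer": 0.10, "hcm": 0.05, "dcm": 0.05,
              "arvc": 0.02, "dyslipidemia": 0.20}                           # pi_i
observed = 3  # hypothetical observed RCF count among carriers

expected = expected_rcf_count(carriers, background)
sir = observed / expected  # standardized incidence ratio
p_value = simulate_p_value(carriers, background, observed, replicates=20_000)
print(f"expected={expected:.2f}  SIR={sir:.2f}  P={p_value:.4f}")
```

The example uses 20,000 replicates to keep the run short; the text reports 100,000. Fixing the seed makes the estimate reproducible, and the P value is resolved only to about one over the number of replicates.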


Table S1. ACMG incidental findings genes and transcripts analyzed.

Table S2. Classification evidence for PVs and LPVs from FHS and JHS participants.

References (67–71)


Acknowledgments: We thank A. Cupples, S. Gray, M. Lebo, and K. Rothman for helpful comments on earlier versions of the manuscript. Funding: This work was supported by NIH grants U01HG006500, U19HD077671, U41HG006834, T32GM007753, R01CA154517, and R01HG06615, and the Howard Hughes Medical Institute. The FHS was supported by contracts N01HC25195 and 6R01NS17950 from NHLBI. The JHS was supported by contracts HHSN268201300046C, HHSN268201300047C, HHSN268201300048C, HHSN268201300049C, and HHSN268201300050C from NHLBI and the National Institute on Minority Health and Health Disparities. Author contributions: All authors participated in the design or interpretation of the reported results, the acquisition of data, and the drafting or revising of the manuscript. Competing interests: R.C.G. has equity in Genome Medical, a company that provides clinical genomics consultation services, and receives compensation for speaking or advisory services to AIA, Helix, Illumina, Invitae, and Prudential. S.K. has been a paid consultant to Regeneron, Celera, Bayer, Catabasis, Merck, Genomics PLC, San Therapeutics, Novartis, Sanofi, Alnylam, Eli Lilly, Leerink Partners, Noble Insights, and AstraZeneca. The remaining authors declare that they have no competing interests. Data and materials availability: The dbGaP accession numbers for the sequences and cardiovascular phenotype data reported in this paper are NHLBI Framingham Cohort (phs000307.v3.p7) and NHLBI JHS (phs000286.v3.p1). All results of secondary data analysis used for this report are available from the authors.