Research Article | GENETIC DISEASE

Predicting functional effects of missense variants in voltage-gated sodium and calcium channels

Science Translational Medicine  12 Aug 2020:
Vol. 12, Issue 556, eaay6848
DOI: 10.1126/scitranslmed.aay6848

Predicting ion channel variant phenotypes

Ion channel variants have been associated with disease, predominantly neurological. Heyne et al. developed a tool to predict the functional effects of variants in disease-associated voltage-gated sodium and calcium ion channels using machine learning–based statistical models. Loss of function versus gain of function (LOF or GOF) was predicted separately from neutrality versus pathogenicity. Their model was trained to classify variant effects using protein sequences and structures containing missense variants with known or highly probable effects and validated against experimentally tested variants and in cohorts including individuals with epilepsy and autism. This work could have implications for ion channel and clinical genetics research.

Abstract

Malfunctions of voltage-gated sodium and calcium channels (encoded by SCNxA and CACNA1x family genes, respectively) have been associated with severe neurologic, psychiatric, cardiac, and other diseases. Altered channel activity is frequently grouped into gain or loss of ion channel function (GOF or LOF, respectively) that often corresponds not only to clinical disease manifestations but also to differences in drug response. Experimental studies of channel function are therefore important, but laborious and usually focus only on a few variants at a time. On the basis of known gene-disease mechanisms of 19 different diseases, we inferred LOF (n = 518) and GOF (n = 309) likely pathogenic variants from the disease phenotypes of variant carriers. By training a machine learning model on sequence- and structure-based features, we predicted LOF or GOF effects [area under the receiver operating characteristics curve (ROC) = 0.85] of likely pathogenic missense variants. Our LOF versus GOF prediction corresponded to molecular LOF versus GOF effects for 87 functionally tested variants in SCN1/2/8A and CACNA1I (ROC = 0.73) and was validated in exome-wide data from 21,703 cases and 128,957 controls. We showed respective regional clustering of inferred LOF and GOF nucleotide variants across the alignment of the entire gene family, suggesting shared pathomechanisms in the SCNxA/CACNA1x family genes.

INTRODUCTION

Voltage-gated sodium (Navs) and calcium channels (Cavs) play a critical role in initiating and propagating action potentials across a broad variety of excitable cells and physiological functions. Upon membrane depolarization, Navs and Cavs are activated and inactivated within milliseconds, leading to a transient influx of sodium or calcium ions into the cell (1). In humans, Navs’ channel proteins are encoded by 10 SCNxA genes (sodium channel protein type x subunit alpha) and Cavs’ channel proteins are encoded by 10 CACNA1x (voltage-dependent calcium channel subunit alpha-1) genes. Variants in genes encoding these channel proteins have been associated with multiple predominantly neurological and neurodevelopmental diseases. These diseases include developmental and epileptic encephalopathy (DEE) (SCN1A, SCN2A, SCN8A, CACNA1E, and CACNA1A), episodic ataxia (CACNA1A), migraine (CACNA1A and SCN1A), autism spectrum disorder (ASD) (SCN2A and CACNA1C), and pain disorders (SCN9A, SCN10A, and SCN11A). Disorders affecting cardiac muscle (SCN5A and CACNA1C), skeletal muscle (SCN4A and CACNA1S), or the retina (CACNA1F) have also been associated with variants in these gene families (Table 1). Pathogenic variants in these genes often contribute to severe early-onset disorders that are less frequently passed on to the next generation. This selective pressure is captured by the depletion of functional variants in those genes in the general population [median loss of function observed/expected upper bound fraction of 0.29 (2)]. Beyond rare diseases and high-penetrance variants, common variants at CACNA1x or SCNxA loci have also been associated with highly related common disease endpoints. For example, genome-wide association studies (GWAS) have identified genome-wide significant associations at loci including CACNA1C and CACNA1I for schizophrenia (3), SCN1A for epilepsy (4), and SCN10A and SCN5A for atrial fibrillation (5).

Table 1 Disease-associated CACNA1x/SCNxA genes with respective disease mechanisms and supporting literature references.

This table lists references for the associations of CACNA1x/SCNxA genes with diseases and GOF or LOF mechanisms. PTV, protein-truncating variant. ECG, electrocardiogram. NDD, neurodevelopmental disorder. APA, aldosterone-producing adenoma. HypoPP, hypokalemic periodic paralysis.

Phylogenetic analyses have found that Navs and Cavs share a common ancestral gene (6), and they have previously been defined as one gene family (7). Navs and Cavs accordingly share a similar structure composed of four highly similar domains I, II, III, and IV, each consisting of six membrane-spanning segments S1 to S6 (1, 8–11). These four domains come together, as a pseudo-heterotetramer, to form a functional channel. In the center of the structure is the pore domain that is composed of S5 and S6 segments, surrounded by four voltage sensor domains formed by S1 to S4 segments. The general architecture of Navs’ and Cavs’ voltage-sensing and pore modules is comparable (8), and their function and structure have been extensively studied. Protein domains are associated with specific functions and diseases across channels (8, 12, 13). It has therefore previously been suggested that mutations in similar structural domains entail similar functional outcomes in Navs (12–14), Cavs (15), and Navs and Cavs (16). It has also been suggested that pathogenic variants in Navs and Cavs occur preferentially at functionally equivalent amino acids across the gene family alignment of Navs and Cavs (12, 17). Although functional differences between channels exist, particularly among calcium channels (18), a sodium channel was experimentally turned calcium selective with only two amino acid changes (19), illustrating some potential functional homology between Navs and Cavs. In addition, disease-associated missense variants are enriched at amino acid sites that are conserved across paralogs (20), including in sodium and calcium channels (17, 21). This further supports the hypothesis that similar biophysical pathomechanisms are involved across Navs and Cavs and that jointly analyzing them should therefore increase statistical power to identify disease-associated protein features.

Disease-associated biological features such as protein structure and conservation metrics have been successfully used in predicting pathogenic versus neutral effects of amino acid changes (22, 23). In voltage-gated potassium channels, pathogenic variant prediction has been conducted for a single gene, KCNQ1 (24), and for the Kv gene family (25), with the aim of improving specificity of variant prediction in comparison to genome-wide scores. There have also been attempts to predict functional readouts using electrophysiology data in SCN5A, with limited success potentially because of sparse training data (26).

It has been a standard convention in genetics to differentiate between genetic variants that lead to reduced function and those that lead to altered or increased gene function (27, 28). The labels loss of function (LOF) and gain of function (GOF) are frequently used for this purpose (27, 29). Genetic variants that inactivate protein-coding genes by nonsense-mediated mRNA decay, such as stop-gain, essential splice, or frameshift variants, have by definition LOF effects. Missense variants, however, can have LOF or GOF effects. These functional alterations can be pathogenic, neutral (if effects are small or can be compensated), or rarely beneficial. Pathogenic missense variants in SCNxA/CACNA1x genes can lead to disease through various changes in channel properties. Such variants can, for example, affect the voltage dependence of steady-state activation or inactivation, the kinetics of the inactivation process or its recovery, ion selectivity, and other metrics that can be recorded in electrophysiological experiments (6). In a simplified disease context, these variants are usually classified molecularly as having either LOF or GOF effects, as in the genetic terminology, depending on whether the net ion flow is decreased or increased. However, a variant may change more than one of the properties described above, with potentially opposite functional effects, such as slowing down the inactivation process (causing GOF) and simultaneously lowering protein expression (causing LOF). In such cases, it may be estimated which of the functional alterations dominates to define it either as a net GOF or LOF variant, but this may also be difficult to determine. Clear LOF or GOF effects in different genes are associated with specific channelopathies. For example, in SCN5A, LOF variants can cause Brugada syndrome, whereas GOF variants can lead to long QT syndrome (30). All such known gene-function-disease associations are displayed in Table 1.
That a pathogenic variant has a LOF or GOF effect may therefore also be inferred from disease phenotypes. In multiple genes, however, including SCN2A (31) and SCN8A (32), phenotypic differences between LOF and GOF variants are not clear-cut or are not always present at the time of diagnosis. In addition, at scale, functional data are often sparse and this categorization therefore usually relies on genetic classifiers or variants’ downstream effects (29). Therefore, although they are mostly in agreement, at times the genetic LOF/GOF terminology can contrast with actual molecular mechanisms. For example, dominant-negative mutations are thought to lead to LOF effects on the phenotype level but are categorized as GOF mutations on the molecular level (27). A recent example in SCN1A is the variant T226M that results in a biophysical GOF in turn leading to depolarization block in interneurons, a quasi–dominant-negative downstream LOF effect, in agreement with a severe epilepsy phenotype (33).

Knowing an individual variant’s functional effect could potentially improve prognosis, enable precision therapy (34–38), and avoid incorrect treatment that could have aggravating consequences [such as treatment with sodium channel blockers in individuals with LOF variants in SCN2A (35) or SCN1A (39)]. However, current variant prediction usually only focuses on whether a variant has a disease-causing or neutral effect. We therefore introduce here machine learning–based statistical models that can classify variants in Navs and Cavs as LOF versus GOF or neutral versus pathogenic, thus providing a valuable resource for clinical genetics, gene discovery, and the experimental ion channel community.

RESULTS

Similar molecular mechanisms in different Navs/Cavs lead to LOF and GOF

Genetic variants in different Navs/Cavs lead to disease in diverse contexts. Comparing expression data (40) and associations of genes to phenotypes (41), we found that tissue-specific gene expression was correlated with tissue-associated phenotypes (fig. S1). For example, pathogenic variants in SCN5A contributed to heart diseases (Brugada syndrome and long QT syndrome), and the SCN5A-encoded protein Nav1.5 was predominantly expressed in heart tissue. Expression in different tissues and cell types could thus explain the clinically diverse disease spectrum of Navs/Cavs while allowing the possibility that similar alterations to protein structure cause heterogeneous diseases across different channels.

We therefore gathered variants in SCNxA/CACNA1x genes in individuals with disease from different sources (table S1) (35, 42–46). We filtered these to 1521 likely pathogenic variants using the variant interpretation guideline by the American College of Medical Genetics and Genomics (47) where possible. Most diseases associated with Navs/Cavs are caused by either GOF or LOF of those ion channels (13, 14, 18, 48). Thus, we inferred whether a likely pathogenic variant has a GOF or LOF effect from disease phenotypes on the basis of known gene-disease mechanisms. For example, in an individual with Brugada syndrome and a pathogenic variant in SCN5A, we assumed that the variant has a LOF effect, because it was previously described that most pathogenic SCN5A variants cause Brugada syndrome via a LOF mechanism (30, 49). We screened the literature for such known gene-disease mechanisms (Table 1). Although we expect agreement in most cases, a LOF/GOF categorization based on the disease outcome can sometimes be different from the variant’s molecular mechanism. An example is the variant p.K1422E in SCN2A (NaV1.2) carried by an individual whose disease phenotype suggests that the variant has a LOF effect. The variant was, however, previously described as GOF electrophysiologically (50), which we could experimentally replicate. We also found that the p.K1422E variant carried a lower current density compared to wild type (fig. S8), potentially explaining a net LOF effect. Applying this knowledge, of the 1521 likely pathogenic variants, we classified 518 variants as likely LOF and 309 variants as likely GOF in 12 different genes across 19 diseases. Eleven diseases had inferred LOF variants and 8 had inferred GOF variants.

We set out to ascertain whether variants with inferred LOF or GOF effects were clustered at corresponding amino acid sites in Navs/Cavs, because this would greatly boost our power to jointly analyze LOF and GOF variants of different Navs/Cavs. To compare variant location of different Navs and Cavs, we mapped variants on a combined gene family alignment of all 20 Nav/Cav sequences. We then correlated variant densities between all 19 diseases (Kendall correlation; Fig. 1A). When variant densities of two diseases are significantly correlated, their variants are clustered at corresponding amino acid sites. We obtained 40 different variant density correlations between diseases. Thirty-seven of the 40 significant (P < 6 × 10^−5) correlations involved GOF variants, of which 31 occurred between two diseases that were both inferred to be caused by GOF variants. This suggests that GOF variants are clustered at similar amino acid sites in different channels. We performed a principal component analysis to summarize all disease-disease correlations as measured by Kendall’s τ. The first principal component (PC 1) perfectly separated diseases with inferred LOF from those with inferred GOF variants (Fig. 1B). This indicates regional clustering of LOF and GOF variants and thus suggests that shared mechanisms lead to LOF or GOF in different ion channels. We hence combined LOF and GOF variants of Navs and Cavs in further analyses.
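The density correlation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the alignment length, variant positions, and function names (`window_density`, `kendall_tau_b`) are hypothetical, and the tie-corrected τ-b variant is an assumption, because the text only states "Kendall correlation". The last lines show one piece of arithmetic that lands near the stated cutoff: dividing an assumed family-wise level of 0.01 by the 171 possible pairs among 19 diseases gives about 5.8 × 10^−5. Whether the cutoff was derived this way is likewise an assumption.

```python
from itertools import combinations
from math import sqrt

def window_density(positions, length, window=3):
    """Count variants in a sliding window centred on each alignment column."""
    counts = [0] * length
    for p in positions:
        counts[p] += 1
    half = window // 2
    return [sum(counts[max(0, i - half): i + half + 1]) for i in range(length)]

def kendall_tau_b(x, y):
    """Tie-corrected Kendall's tau-b, O(n^2); adequate for a toy example."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue              # tied in both variables: ignored
        elif dx == 0:
            ties_x += 1           # tied only in x
        elif dy == 0:
            ties_y += 1           # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x) *
                 (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0

# Hypothetical variant positions (alignment columns) for two diseases.
disease_a = [10, 11, 11, 40]
disease_b = [10, 12, 41]
tau = kendall_tau_b(window_density(disease_a, 60), window_density(disease_b, 60))
print(f"Kendall tau-b between the two density tracks: {tau:.2f}")

# Assumed Bonferroni arithmetic: 19 diseases -> 171 pairwise correlations.
n_pairs = 19 * 18 // 2
bonferroni_cutoff = 0.01 / n_pairs
print(f"{n_pairs} pairs, cutoff {bonferroni_cutoff:.1e}")
```

A positive τ between two diseases' density tracks means that their variants pile up at the same alignment columns, which is the sense of "clustering" used in the text.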

Fig. 1 Clustering of inferred GOF or LOF variants in different genes.

Inferred GOF and LOF variants were mapped on the gene family alignment of all 10 SCNxA and 10 CACNA1x genes and amino acid sites with gaps in the alignment removed. Variants were counted in a sliding window of three amino acids. (A) Correlation of variant densities using Kendall’s τ. Positive values of τ are blue, and negative values are red. Correlations withstanding Bonferroni correction (P < 6 × 10^−5) are marked with **, and correlations with P < 0.01 are marked with *. Genes are sorted by the first principal component of the correlations in (B). (B) Principal component analysis of the correlations (τ). LOF variants are blue, and GOF variants are red. GOF variants in SCN9A are subdivided into the diseases erythromelalgia (“Ery”) and paroxysmal pain syndrome (“pain”). LOF variants in CACNA1A are subdivided into the diseases neurodevelopmental disorder (“NDD”) and episodic ataxia (“EA”).

Machine learning method predicts LOF versus GOF variant effects

We gathered 89 structure-based and sequence-based protein features putatively enriched for LOF versus GOF or neutral versus pathogenic effect variants. Structure-based annotations included protein secondary structure, protein’s accessible surface area, and structural (for example, cytoplasmic or nuclear) and functional (for example, channel pore or selectivity filter) protein domains. Sequence-based features included conservation metrics across the 20 genes, physicochemical amino acid properties, and deleteriousness of amino acid changes like “missense badness” (51). These also included our own conservation score for which we estimated selection pressure on amino acids conditional on ancestry to account for the shared evolutionary history of paralog genes. We tested all binary protein features for an enrichment of inferred LOF (n = 518), GOF (n = 309), pathogenic (n = 1517), or neutral (n = 2328) variants with Fisher’s exact tests (Fig. 2). Six of 9 structure-based features and 3 of 12 sequence-based features were enriched for functional entities such as the pore or selectivity filter for LOF and the S4-S5 linker helix or cytoplasm for GOF variants. In Fig. 3 and fig. S2, variants are mapped on the linear sequence of SCN2A; in fig. S3, they are mapped onto the three-dimensional protein structure of NaV1.2 (SCN2A) (9), and fig. S4 shows quantitative protein features of GOF, LOF, and neutral variants.

Fig. 2 GOF, LOF, and neutral variants are enriched in multiple protein features.

This figure shows which protein features contain significantly more GOF variants than LOF variants (first column) and significantly more pathogenic than neutral variants (second column), for six SCNxA and six CACNA1x genes combined. Associations significant after Bonferroni correction for 2 × 21 tests (P < 0.001) are in orange. We used Fisher’s exact tests to compare variant counts. Point estimates (log10 odds ratios) >0 indicate a protein feature’s enrichment for GOF variants (first column) or pathogenic variants (second column). Features labeled with * are only present in Navs (inactivation gate, DEKA motif of the selectivity filter) or Cavs (gating break). Horizontal bars show the 95% confidence intervals of the odds ratio point estimates that are log10-transformed and cut at −1.7 and 1.7 for clarity. Log10 odds ratios <−1.7 are shown as arrows. AA, amino acid.
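As an illustration of the enrichment test behind Fig. 2, the sketch below runs a two-sided Fisher's exact test on a single 2 × 2 table. The totals of 309 GOF and 518 LOF variants come from the text. The in-feature counts (30 and 150) are invented for illustration, and the function name `fisher_exact_two_sided` is hypothetical. The threshold 0.05 / (2 × 21) is one plausible reading of the stated Bonferroni correction, not a confirmed detail.

```python
from math import comb, log10

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, p value). The p value sums the probabilities
    of all tables with the same margins that are no more likely than the
    observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, min(p, 1.0)

# Rows: GOF, LOF. Columns: inside the feature, outside the feature.
gof_in, lof_in = 30, 150                  # hypothetical counts
gof_total, lof_total = 309, 518           # totals reported in the text
odds, p = fisher_exact_two_sided(gof_in, gof_total - gof_in,
                                 lof_in, lof_total - lof_in)
alpha = 0.05 / (2 * 21)                   # assumed Bonferroni threshold
print(f"log10 odds ratio = {log10(odds):.2f}, significant: {p < alpha}")
```

With these invented counts the log10 odds ratio is negative, which in the figure's convention would indicate enrichment for LOF rather than GOF variants.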

Fig. 3 Missense variants in SCNxA (Navs) and CACNA1x genes (Cavs) mapped onto the linear protein structure of SCN2A.

(A) Inferred GOF (red, n = 309) and LOF (blue, n = 518) missense variants. (B) Likely pathogenic variants (orange, n = 1517) and neutral variants (purple, n = 2328). In both panels, upper plots show individual variants and lower plots show variant densities in a sliding window of three amino acids. Navs and Cavs are composed of four similar domains (I, II, III, and IV) that associate to form a channel [linear structure from (31)]. In each domain, transmembrane segments S1 to S6 are labeled with 1 to 6. S5 and S6 form the channel pore and S4 contains the voltage sensor that is labeled with “+” to illustrate the positive gating charges. The * at site 1151 refers to a cluster of GOF variants in CACNA1C in individuals with long QT syndrome; ** at site 1882 refers to a cluster of GOF variants in SCN2/8A. Variants with minor allele frequency (MAF) > 10^−4 in gnomAD (individuals with neuropsychiatric diseases excluded) (2) were selected as neutral variants. Variants were inferred to be LOF or GOF from disease phenotypes (see Table 1).

We next sought to leverage these associations of protein features with variant effects to train a prediction tool that outputs the probability that a variant results in GOF or LOF. For this, we trained a machine learning model on all 89 protein features of all 518 LOF and 309 GOF variants (table S1). To assess the performance of our model, we set aside a test dataset of 82 randomly chosen variants before the modeling process. We measured performance with the following metrics: balanced accuracy (BA), Cohen’s κ (kappa), Matthews correlation coefficient (MCC), area under the precision-recall curve (prAUC), and receiver operating characteristics (ROC) curve. We aimed to maximize BA during model training. BA, kappa, MCC, ROC, and prAUC are performance metrics aiming to summarize a 2 × 2 contingency table of true-positive/true-negative and false-positive/false-negative predictions with a single number (52). The ROC curve was created by plotting the true-positive rate against the false-positive rate at various probabilities (Fig. 4, A to D). Predicting the test data of 81 variants, our model reached the following performance: BA 0.80, Cohen’s κ (kappa) 0.57, MCC 0.59, ROC 0.85, and prAUC 0.78 (for ROC curves, see Fig. 4, A to D; for performance during training, see fig. S5, B and C). Twenty-five of 37 GOF and 39 of 44 LOF variants were correctly predicted. These results indicate a predictive power comparable to other variant prediction methods (22–24). Using this model, we ranked the relative influence of the 89 features on the prediction of LOF versus GOF effects (Fig. 4E). The top two features were GOF variant density features. The 10 most important GOF features also included three different amino acid hydrophobicity scores, three different conservation features, and the Grantham score.
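To make the threshold-based metrics concrete, the following sketch computes BA, Cohen's κ, and MCC from a single 2 × 2 contingency table, using the test-set counts stated above (25 of 37 GOF and 39 of 44 LOF variants correctly predicted). Treating GOF as the positive class is an arbitrary choice made here, and the function name `binary_metrics` is hypothetical. The results land close to the reported values but need not match them exactly, for example if the reported figures were computed at a different probability cutoff or rounded differently.

```python
from math import sqrt

def binary_metrics(tp, fn, tn, fp):
    """Summarize a 2 x 2 contingency table with BA, Cohen's kappa, and MCC."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ba = (sensitivity + specificity) / 2
    # Cohen's kappa: observed agreement versus agreement expected by chance.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"BA": ba, "kappa": kappa, "MCC": mcc}

# GOF as the positive class: 25 of 37 GOF and 39 of 44 LOF predicted correctly.
metrics = binary_metrics(tp=25, fn=37 - 25, tn=39, fp=44 - 39)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

ROC and prAUC cannot be recovered from a single table, because they integrate over all probability cutoffs; they require the per-variant predicted probabilities.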

Fig. 4 Variant prediction of LOF and GOF effects in Navs and Cavs.

We trained our statistical model on 746 variants in 12 genes whose functional effects were inferred from disease phenotypes. Here, we show how the model predicts LOF/GOF variant effects in two datasets: 82 disease phenotypes, randomly picked from training data before model training (A and C), and 87 separate functionally tested variants (B and D). (A) Prediction of LOF disease phenotypes, sensitivity = 0.76, specificity = 0.83. (B) Prediction of LOF electrophysiology experiments, sensitivity = 0.74, specificity = 0.72. (C) Prediction of GOF disease phenotypes, sensitivity = 0.83, specificity = 0.76. (D) Prediction of GOF electrophysiology experiments, sensitivity = 0.72, specificity = 0.74. The area under the ROC curve was 0.85 for phenotype-based LOF/GOF prediction and 0.73 for electrophysiology-based LOF/GOF prediction. (E) Feature importance for prediction of GOF versus LOF. The relative influence of features on the prediction normalized to sum to 100 is computed as described in (89). Of 89 features that went into the prediction, only the 18 features that have a relative influence >0.05 on the prediction are shown.

We asked whether modeling variants in Navs + Cavs jointly improved variant prediction over modeling Navs and Cavs separately. When using only the 573 Nav variants during model training, prediction performance in Navs was comparable to model training with all Navs + Cavs variants (BA 0.79, ROC 0.80, and MCC 0.55 versus BA 0.80, ROC 0.85, and MCC 0.58). Predicting Cavs with a model only trained on 171 Cavs, however, gave worse results compared to predicting Cavs with a model trained on Navs + Cavs (BA 0.60, ROC 0.61, and MCC 0.20 versus BA 0.79, ROC 0.88, and MCC 0.58). Performance was better when predicting Cavs with a model just using Navs (BA 0.75, ROC 0.76, and MCC 0.49) compared to just Cavs. These results suggest that the increased power obtained by combining Navs and Cavs outweighs the differences between these channel types.

Machine learning method predicts pathogenic versus neutral variant effects

We set out to predict whether a variant has a “neutral” or a potentially disease-causing (“pathogenic”) effect using the same features, machine learning method, and variants as in the functional variant prediction. We used the 1517 likely pathogenic variants described above including the 825 variants with LOF/GOF annotations and 692 likely pathogenic variants for which we could not annotate with certainty whether they had LOF or GOF effects. We used 2328 variants in the Genome Aggregation Database (gnomAD) (2) in individuals who were ascertained to have no neuropsychiatric phenotypes as putative neutral effect variants (table S2). Before model training, we filtered neutral variants by frequency according to the level of genic constraint to remove rare potentially mildly deleterious variants from the neutral dataset. Similar to our functional variant prediction, we randomly split our dataset before modeling to retain 10% of variants for testing. Predicting the test data of 379 variants with our model, we obtained a BA 0.90, MCC 0.78, ROC 0.95, and prAUC 0.94 (fig. S6A). As further validation, we predicted 89% of additional 1466 variants in genes in gnomAD that were not part of the modeling process as neutral. Together, 1518 of 1711 neutral and 121 of 134 pathogenic variants were correctly predicted. We predicted 466 variants in SCNxA genes with BA 0.86, MCC 0.64, ROC 0.93, and prAUC 0.87 and 1379 variants in CACNA1x genes with BA 0.93, MCC 0.32, ROC 0.97, and prAUC 0.64. The top three features with the largest relative influence on the prediction were part of a paralog-specific conservation metric “parsel” that we developed for this project that estimates selection pressure while accounting for the shared evolutionary history of SCNxA/CACNA1x genes. A further four conservation features were present in the top 10 features (see fig. S6B) in contrast to the LOF versus GOF prediction dominated by variant density features (see Fig. 4E). 
To test the performance of our model, we compared it to other popular variant pathogenicity prediction methods. To do this, we combined the two test datasets to a total of 1824 variants and removed 21 variants used in the training of PolyPhen-2. Our method achieved a higher ROC (0.95) than the three popular variant prediction tools CADD (23) (ROC 0.79), PolyPhen-2 (22) (ROC 0.85), and MPC (51) (ROC 0.86; see fig. S6B).

LOF versus GOF predicted by funNCion corresponds to LOF versus GOF of functionally tested variants

To compare our disease phenotype–based prediction model to molecular variant effects, we curated 119 functionally tested variants (96 unique variants, some tested in multiple studies) in the genes SCN1A (53), SCN2A (54, 55), and SCN8A (table S3) and performed functional experiments of 50 variants in CACNA1I (table S4). In this and all subsequent validation analyses, we excluded functionally tested variants from the training data before modeling. In the published SCN1/2/8A data, of the 119 variants, 43 were GOF, 51 LOF, 13 mixed, 7 unclear, and 5 neutral. We removed 10 unique variants in individuals with benign familial infantile seizures because several of them had opposite effects in different studies (fig. S7). We then added 12 functionally tested variants in CACNA1I (described below) that were predicted as pathogenic by our method and restricted the data to outcomes of LOF or GOF. Our model predicted the resulting 87 electrophysiological experiments with BA 0.73, ROC 0.73, and MCC 0.45 (Fig. 4, B and D; permutation P < 1 × 10^−4).
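The text does not describe how the permutation P value was obtained. One common scheme, sketched here as an assumption, shuffles the true labels many times and counts how often the shuffled BA reaches the observed BA. The labels and predictions below are toy data, and the function names are hypothetical. With 9999 permutations the smallest attainable P value under this scheme is 1/10,000.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for 0/1 labels and predictions."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    sensitivity = sum(pos) / len(pos)
    specificity = (len(neg) - sum(neg)) / len(neg)
    return (sensitivity + specificity) / 2

def permutation_p(y_true, y_pred, n_perm=9999, seed=0):
    """One-sided permutation P value for the observed balanced accuracy."""
    rng = random.Random(seed)
    observed = balanced_accuracy(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)     # break the label/prediction pairing
        if balanced_accuracy(shuffled, y_pred) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: 1 = GOF, 0 = LOF; two of twenty predictions are wrong.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [0] * 9 + [1]
ba, p_value = permutation_p(y_true, y_pred)
print(f"BA = {ba:.2f}, permutation P = {p_value:.4f}")
```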

When subsetting to 57 variants in SCN1/2/8A that either fulfilled our phenotype/pathogenicity criteria of being included in our functional variant prediction training data or were associated with a severe phenotype, our model predicted the variants with BA 0.77, ROC 0.80, and MCC 0.54. All five variants with neutral effects were predicted to be neutral by our pathogenicity prediction method, significantly more than other functionally tested variants (Fisher’s exact test; P = 0.002; odds ratio = Inf; 95% confidence interval 2.4-Inf). Our functional validation data also included 11 SCN2A variants in individuals where age of seizure onset was unavailable or outside of the cutoffs that we used to infer GOF/LOF. We correctly predicted 9/11 of them, emphasizing the benefit of our functional variant prediction when using phenotype as a proxy for variant function is unreliable.

We also tested our prediction on electrophysiology experiments of 50 variants in CACNA1I present in 12,332 individuals with and without psychiatric disease (table S3) (56). The functionally tested variants were present at different population frequencies. However, common variants are unlikely to have strong pathogenic effects: despite considerable efforts in GWAS, exome chip, and exome sequencing, no common strong-acting variants have been identified, consistent with the fact that these would not be permitted by the strong selection against schizophrenia (57). Because our LOF-GOF prediction is trained and should hence only be used on likely pathogenic variants, we sought to first predict whether variants were likely pathogenic or neutral using our own method described above. Rarer variants were more likely to be predicted pathogenic despite variant frequency not being a component of the model [Spearman rank correlation between minor allele frequency (MAF) in the population cohort gnomAD (2) and pathogenic prediction, ρ = −0.60, P = 3.3 × 10^−6; Fig. 5A]. We found that whether a variant was predicted to be pathogenic correlated with whether a variant had a functional effect when considering variants that were present in only one individual (BA 0.75, ROC 0.77, and MCC 0.44). There was, however, no association of pathogenicity and functional effect in 19 variants that were present in >10 individuals in gnomAD (BA 0.46, ROC 0.36, and MCC −0.14). This is consistent with the abovementioned statement that common variants should have no strong pathogenic effects, suggesting that functional effects found at higher variant frequencies were likely milder or not disease causing. Given that a variant is pathogenic, we predicted its functional effect (LOF or GOF) with BA 0.83, ROC 0.78, and MCC 0.58 (Fig. 5B and table S4).
We then combined the z scores of four electrophysiology parameters to investigate how well we could predict variants with different magnitudes of functional effects. First, pathogenicity probability positively correlated with the combined experimental z score in variants with functional effects (Spearman correlation; P = 0.02; ρ = 0.43; Fig. 5C). In a logistic regression model, we also found that the strength of the functional effect (combined experimental z score) influenced whether funNCion correctly predicted LOF or GOF functional effects (coefficient, 0.29; P = 0.02; Fig. 5D). Accordingly, when only analyzing the 10 variants with a combined z score of four experimental parameters ≥16, we predicted the correct functional effect (LOF or GOF) with BA 0.94, ROC 0.89, and MCC 0.67. Together, these results suggest that our disease phenotype–based model predicting LOF versus GOF effects largely corresponds to LOF versus GOF in functionally tested variants, with an increased performance in variants with larger functional effects and variants that are more likely pathogenic.
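Both the MAF-versus-pathogenicity comparison and the z score–versus-pathogenicity comparison above rest on Spearman's rank correlation, which is the Pearson correlation of the ranked values. The sketch below implements it with average ranks for ties. The six data points are hypothetical and only mimic the direction of the reported relationship (rarer variants receive higher pathogenicity probabilities); the function names are likewise illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical variants: log10 MAF and predicted probability of pathogenicity.
log10_maf = [-5.4, -5.1, -4.8, -4.0, -3.2, -2.5]
p_pathogenic = [0.92, 0.81, 0.85, 0.40, 0.35, 0.10]
rho = spearman_rho(log10_maf, p_pathogenic)
print(f"Spearman rho = {rho:.2f}")
```

Because only ranks enter the calculation, the result is unchanged by the log transformation of MAF.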

Fig. 5 Functional and pathogenicity prediction of 50 experimentally tested variants in CACNA1I.

This figure shows that variants which our method labeled as pathogenic also showed functional effects in experiments and are at low MAF in the general population (gnomAD). Given that a variant is pathogenic, we predicted its functional effect (LOF or GOF) correctly with BA 0.83, ROC 0.78, and MCC 0.58. In (A to C), the y axis indicates a variant’s probability of being pathogenic. Variants with a probability of being pathogenic >0.5 are labeled as pathogenic. In (C) and (D), the sum of four electrophysiology (E.phys.) z scores indicates how much a variant’s electrophysiology differed from wild type (WT). In (A), (C), and (D), we show MAF (log10) in gnomAD on the x axis. In (A) and (B), variants are labeled according to classification in electrophysiology experiment (GOF, red; LOF, blue; neutral, black). In (C) and (D), variants are labeled according to the agreement of functional variant prediction (LOF or GOF) with electrophysiology experiments given that they are functional (correctly predicted, turquoise; wrongly predicted, yellow; neutral variants, black).

Validation of funNCion with large datasets of population controls and neuropsychiatric diseases

We first predicted functional and pathogenicity effects of missense variants in 114,704 individuals without severe pediatric and neurological disorders in gnomAD (2). We set out to test which factors predict a variant’s probability to be pathogenic using linear regression. The most significant predictor was −log10 MAF in gnomAD (P = 2 × 10^−74; coefficient, −0.07); in other words, pathogenic variants were at significantly lower frequencies in gnomAD. This is expected, because selection should not allow deleterious variants to rise to high population frequencies (57); see CACNA1I in the previous section. We also observed this in individual genes (Bonferroni-corrected P < 0.0025 for eight CACNA1x and five SCNxA genes, P < 0.05 for three genes, and P > 0.05 for SCN7A, SCN10A, SCN11A, and CACNA1F; see Fig. 6A). A positive predictor of variant pathogenicity was a gene’s LOEUF [loss of function observed/expected upper-bound fraction (2); P = 2 × 10^−67; coefficient, 0.21]. A low LOEUF value means that the respective gene has significantly fewer protein-truncating variants in gnomAD than expected (these are here labeled “loss of function” variants because they have, by definition, a LOF effect). The equivalent value for missense variants (here termed “MOEUF”) was also significant (P = 1 × 10^−4; coefficient, 0.06). It is again expected that genes that are most intolerant to functional variants would harbor mostly neutral rather than pathogenic missense variants in a cohort of primarily healthy individuals. LOEUF being more strongly associated with pathogenicity than MOEUF suggests that Navs/Cavs may be generally more intolerant to LOF (including LOF missense and truncating) variants than GOF variants. To test this, we ran a linear regression model with GOF probability as the response variable. Overall, pathogenicity probability was positively associated with GOF probability (P = 2 × 10^−55; coefficient, 0.37), and LOEUF was negatively associated with GOF probability (P = 5 × 10^−6; coefficient, −0.07), whereas MOEUF was slightly positively associated with GOF probability (P = 0.01; coefficient, 0.06). This is in line with the notion that most Navs/Cavs, but in particular those with a lower tolerance for protein-truncating variants, tolerate LOF missense variants less than GOF missense variants. In contrast, genes with particularly low tolerance for missense variants harbored fewer GOF than LOF variants in gnomAD (Fig. 6B). We found an association of pathogenicity and GOF probability in all individual genes (P < 0.0025 corrected for 20 tests) except SCN2A, SCN8A, CACNA1A, CACNA1B, CACNA1C, CACNA1D, and CACNA1E. Fittingly, all of these except CACNA1B are implicated in severe GOF disorders, and SCN8A, SCN2A, CACNA1C, and CACNA1E had the lowest MOEUF values of all Navs/Cavs. Overall, these biologically meaningful results validate our method.

Fig. 6 Predicting GOF, LOF, pathogenic, and neutral variant effects in cohorts of individuals with and without disease.

(A) Correlation of predicted pathogenic variants with MAFs in the gnomAD population cohort (2) without individuals with neuropsychiatric disease (n = 114,704). Variants predicted to be neutral are black, GOF blue, and LOF red. (B) 90% confidence interval (CI) of the observed-over-expected ratio (oe) (2) of missense and truncating variants of SCNxA/CACNA1x genes in gnomAD. oe values of 3000 random genes in gnomAD are plotted in gray. Genes are red if the pathogenicity probability was significantly (Bonferroni P value threshold of 0.0025 to correct for 20 tests) associated with GOF probability, potentially indicating that those genes tolerate LOF missense variants less than GOF missense variants. Genes without that association (P > 0.0025) are in blue. (C) Prediction of pathogenic/functional variant effects in individuals with disease. We compared variant effects of ultrarare missense variants in individuals with and without epilepsy (58), ASD (59), and ADHD (60) (Fisher’s exact test). ID, intellectual disability. GGE, genetic generalized epilepsy. NAFE, nonacquired focal epilepsy. Case cohort sizes are given in the figure. Control cohort sizes are n = 2179 for de novo variants in ASD, n = 5214 for all other ASD/ADHD cohorts, and n = 8436 for epilepsy [of which 6860 were also part of gnomAD (2)]. Nominally significant associations (P < 0.05) are colored in orange (Bonferroni P value threshold of 7 × 10^−5). Horizontal bars show 95% confidence intervals of the odds ratio point estimates that are log10-transformed and cut at −1.7 and 1.7 for clarity. Odds ratios >1.7 or <−1.7 are shown as arrows.

To investigate whether our functional variant prediction could replicate known disease associations and mechanisms, we tested the algorithm on large datasets of individuals with and without diseases. We compared numbers of ultrarare missense variants with Fisher’s exact tests between 9170 individuals with and 8436 individuals without epilepsy [of which 6860 were also part of gnomAD (2)] from the Epi25 Collaborative (58); between de novo variants in 4186 individuals with and 2179 without ASD from the Autism Sequencing Consortium (ASC) (59); and between 8347 individuals with ASD or attention deficit hyperactivity disorder (ADHD) and 5214 controls from the Danish bloodspot cohort (DBS)/iPSYCH consortium (60). We found an enrichment of pathogenic LOF, but not GOF, missense variants in genes in which protein-truncating variants are known to cause specific diseases. These included 29 LOF in SCN1A (61) in several nonlesional epilepsies (associated with DEE and febrile seizures with P values < Bonferroni threshold 7 × 10^−5) and 14 LOF in SCN2A in 5252 cases of autism with intellectual disability (31) (see Fig. 6C and table S5). CACNA1G, a recent candidate (58) for genetic generalized epilepsy (n = 3108), was also enriched for LOF missense but not GOF variants, and combining 3 LOF missense with 2 protein-truncating variants improved disease association to P = 1 × 10^−3. In contrast, only missense variants are associated with DEE in SCN8A and CACNA1E, which were accordingly only enriched for two GOF missense variants in DEE (P = 0.01 and 0.03, respectively). We can also nominate CACNA1B as a potential candidate gene for genetic generalized epilepsy. It was enriched for six missense LOF variants (P = 2 × 10^−3) with an overall missense signal of P = 7 × 10^−4. Further, biallelic protein-truncating variants in CACNA1B have recently been implicated in a severe epilepsy syndrome (62). It would therefore be plausible that heterozygous LOF in CACNA1B may lead to a milder epilepsy phenotype.
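The case-control comparisons above rest on Fisher’s exact test applied to a 2 × 2 table of carriers and noncarriers. The following is a minimal sketch of a two-sided test and the corresponding odds ratio, written in plain Python for illustration and not the code used in the study. The cohort sizes are those given in the text, whereas the carrier counts (12 and 2) are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 carriers fall into row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    return min(p_value, 1.0)

def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); infinite if b or c is zero."""
    return (a * d) / (b * c) if b and c else float("inf")

# Hypothetical carrier counts; cohort sizes (9170 cases, 8436 controls) from the text.
cases_with, cases_total = 12, 9170
controls_with, controls_total = 2, 8436
table = (cases_with, cases_total - cases_with,
         controls_with, controls_total - controls_with)

print("odds ratio:", odds_ratio(*table))
print("P value   :", fisher_exact_two_sided(*table))
```

Because each gene, variant class, and disease cohort is tested separately, the resulting P values would still need to be compared against a multiple-testing threshold such as the Bonferroni threshold described above.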

DISCUSSION

Tailoring treatment to individual patients’ genetic variants has made substantial progress in many fields of medicine in recent years (63). Studying ion channel variants’ functional effects with electrophysiology experiments has enabled development of precision therapies, often accelerated by repurposing existing drugs (48, 64, 65). These functional studies require considerable effort and expertise and therefore usually focus on few variants. In Navs and Cavs, multiple precision medicine approaches have been described (34, 35, 66–68); however, their success is dependent on the type of functional changes of pathogenic variants. Here, we present a method that predicts LOF versus GOF effects in likely pathogenic variants in SCNxA and CACNA1x genes, applicable across a range of diverse diseases that can be caused by pathogenic variants in NaVs and CaVs.

In our study, we inferred LOF and GOF effects of genetic variants from disease phenotypes without functionally testing them. This poses several challenges. Phenotypes are ascertainment biased, and there is often variable expressivity of the same variant in multiple individuals. We therefore carefully curated our data to include only clearly distinguishable LOF- or GOF-associated disease phenotypes. Our control dataset was ascertained to include no individuals with severe childhood onset or neuropsychiatric disease. Because these datasets are large, a few disease-associated variants may still erroneously be present. Although a few variants may still be miscategorized, in silico validation in multiple large population and disease cohorts and performance similar to popular variant prediction tools suggest that our approach is generally meaningful. We further showed that our LOF/GOF prediction largely corresponded to molecular LOF/GOF classes based on electrophysiological experiments. As mentioned previously, there exist a few cases where the molecular LOF/GOF categorization differs by definition from the disease-based LOF/GOF categorization. For example, dominant-negative mutations are molecularly GOF but the downstream effects result in a LOF phenotype (27, 33). Although only experiments can lead to functional insight, any experimental setup constitutes a model system. Hence, a variant may have a functional effect in a laboratory setting that may not always translate to a pathophysiological effect on the organismal level. A model that uses disease phenotypes as a quasi-functional readout considers these additional layers of complexity that in vitro systems are not able to reproduce. This is further illustrated by our data for CACNA1I, where pathogenicity prediction correlated with functional effects only for rare variants. This is expected, because natural selection should prevent deleterious variants from rising in population frequency.

As an example of contradictory phenotypic and functional interpretation, we highlight the selectivity filter domain of the channel proteins. In this region, 42 of 43 likely pathogenic variants were in individuals with LOF phenotypes, including the DEKA motif in Navs that conveys selectivity for Na+ ions (50). However, there are examples of GOF effects in electrophysiology experiments. The p.G1662S variant in SCN10A encoding NaV1.8 was implicated in small fiber neuropathy and showed GOF functionally (69). However, this variant was found at a frequency of 0.0014 in the gnomAD population database, including in four homozygous individuals, and was therefore rated benign by two independent laboratories in ClinVar. Thus, the variant’s functional changes are unlikely to contribute to disease. The second example is the variant p.K1422E in SCN2A carried by an individual with DEE who was 13 months old at the onset of seizures, thus corresponding to a LOF disease phenotype. In previous studies, the variant conferred much elevated permeability to divalent cations like Ba2+ and Ca2+ on the channel, whereas selectivity for Na+ was significantly reduced (50). We could experimentally replicate that the variant acted as a GOF electrophysiologically in terms of permeability to Ca2+. However, we also found that the Nav1.2 p.K1422E variant carried a lower overall current density compared to Nav1.2 wild type and the Nav1.2 p.T1420M isogenic-variant stable cell line. The current density reduction observed may reflect impaired forward trafficking, reduced single-channel conductance, increased permeability to outward Na+/Ca2+ current, or enhanced endocytosis/degradation. The apparent LOF effect in current density may override the GOF effects in Ca2+ influx, thus explaining the overall LOF disease phenotype. These effects would be difficult to properly evaluate in transient expression systems, illustrating the difficulty of experimentally modeling those complex proteins.

Our approach has several limitations. We acknowledge that the classification of variant effects into LOF and GOF oversimplifies complex electrophysiological mechanisms, even if frequently done in the literature. In SCN9A, for example, two different types of GOF mechanisms impairing channel activation and inactivation have been shown to lead to two different diseases: erythromelalgia and paroxysmal pain syndrome, respectively (70). As mentioned earlier, some variants also have mixed effects (53, 55) that sometimes correspond to clinical phenotypes with overlapping symptoms (30). We currently have too little power to include such variants in our model. With large-scale experimental electrophysiology data, it may be possible to expand or subdivide the GOF and LOF categories or introduce more quantitative GOF/LOF scoring systems for predictions in the future. Further, we analyzed more functional variant and experimental validation data for Navs than for Cavs. Therefore, predictions in Navs should be more reliable than in Cavs. Also, our model was trained on likely pathogenic variants in mostly severe diseases. It remains to be properly validated in individuals with milder diseases with potentially milder variant effects.

Our results provide important insights into which functional protein domains and motifs in Navs and Cavs are associated with inferred GOF or LOF effects in 825 curated likely pathogenic variants. This may provide a valuable resource for experimental follow-up studies to potentially identify mechanistically important sites and drug targets. We can also confirm associations of specific amino acid sites with GOF or LOF effects across diseases with high statistical power that have thus far been shown mechanistically (50, 71, 72) or descriptively (12–14), often with less rigorously curated disease variants.

As a positive control, we recapitulated known structure-function associations, such as that pathogenic variants are enriched in transmembrane segments and in functionally important domains like the channel pore or inactivation machinery (8, 12). As mentioned above, LOF variants were clearly associated with the ion conduction structural motifs, including the selectivity filter (50) of the pore domain (S5-S6 segments) as previously hypothesized (14). We confirmed that the structural motifs associated with the inactivation process (8, 71, 73) as well as the S4-S5 linker helix (72) were associated with GOF variants. The S4-S5 linker helix was previously implicated in GOF (14), for example, in pain disorders caused by variants in SCN9A (70) and in DEE caused by variants in CACNA1E (45) and SCN2A (31). We can now confirm a statistically significant association of GOF effects with the S4-S5 linker using data of nine different CACNA1x/SCNxA diseases. Worth noting is a slight extension of the GOF variant cluster beyond the linker helices toward the start of S5 consistently across the four transmembrane domains. Another notable takeaway is that GOF and LOF variants were not equally associated with transmembrane segments S1 to S6 at the four different transmembrane domains I to IV. This corroborates previous findings that different domains in Cavs and Navs have an overall structural similarity but a different contribution to the channel functioning (74–76). Last, we observed an accumulation of Nav and Cav GOF variants in the cytoplasmic part downstream of each transmembrane segment S6. Exploring this further may yield mechanistic insights.

We highlighted an accumulation of four likely pathogenic variants in CACNA1C encoding Cav1.2 in individuals with long QT syndrome (transcript: ENST00000347598; variants: p.P857L, p.P857R, p.R858H, and p.R860G). Two of these variants were previously functionally investigated. Peak calcium currents were larger in mutant channels than those of wild type for p.R858H (77) and p.P857R (78). One study also identified increased surface membrane expression of the channel compared to wild type (78). The authors found that those variants overlapped with the so-called PEST domain (proline, glutamic acid, serine, and threonine) that is involved in protein degradation signaling and that they led to increased numbers of Cav1.2 channels at the cell membrane. This domain as well as the cluster of GOF variants are not present in other Cavs or Navs, pointing to a distinct GOF mechanism in Cav1.2.

We reported a GOF variant cluster of nine likely pathogenic SCNxA variants (genes SCN2A, ~4A, and ~8A). When mapped onto SCN2A, they are located in the C-terminal domain at amino acid sites 1875 to 1887 that is in close proximity to a FGF (fibroblast growth factor)/FHF (FGF homologous factor) binding site of a calmodulin (CaM)–FGF complex also present in Nav1.4 and Nav1.5 (79). FHF1 to FHF4 interact with the C-terminal domain of Navs to modulate the channels’ fast and long-term inactivation (80). One of these variants, p.R1882Q in SCN2A, also showed a slower time course of inactivation (81). Further, de novo GOF variants in FHF1 have been associated with DEE (82) and variants in FHF2 with generalized epilepsy with febrile seizures plus (GEFS+). The C-terminal lobe of the CaM-FGF complex interacts with the conserved IQ motif of helix α-VI of the C terminus of all Nav channels (79), suggesting that it may serve as an anchor for the control of activation of the channels by CaM. In contrast to the FHF binding site, the I of the potential “IQ motif” overlaps with two LOF variants in SCN1A. These observations could yield starting points for hypotheses about this interaction.

We also identified secondary structural protein features associated with LOF and GOF variants. As expected (83), LOF variants are more likely to be buried in the protein where they can potentially disrupt protein stability, so the probability that an amino acid is buried becomes a predictive feature in the machine learning model. Features related to amino acid properties, particularly hydrophobicity and deleteriousness, also contributed to the LOF versus GOF prediction, in agreement with previous studies (13, 33, 84, 85).

There exist generally more LOF than GOF variants. For SCN2A, a recent study estimates the incidence of LOF cases to be about fivefold higher than GOF cases (31). The most important reason for this is likely that GOF can be achieved at fewer sites across the genes than LOF, although other factors like frequency of genetic testing, variant penetrance, and expressivity also play a role. The observation that GOF variants can be more easily identified by their location than LOF variants is also indicated by the fact that the two top predictors of LOF/GOF are GOF variant density features.

We introduced a potentially powerful approach to predict the directionality of likely pathogenic missense variants in SCNxA/CACNA1x genes. In a clinical setting like SCN2A- or SCN8A-related DEE, treatment decisions must often be made before functional studies of disease-causing variants can be done. In the future, our prediction method could be adapted and benchmarked for use in conjunction with best current clinical practices, for example, to predict which individuals with pathogenic variants may be likely to benefit from a particular treatment based on their variants’ LOF or GOF effects. Our method could potentially be refined with large-scale experimental data, for example, by introducing more types of predictions than mere LOF and GOF. Because most SCNxA/CACNA1x genes are depleted for functional variants in the general population, it is likely that more SCNxA/CACNA1x genes could contribute to disease for which disease associations or mechanisms have not yet been elucidated and to which our method could potentially be applied. Last, our study introduces disease phenotype–based functional variant prediction that can also be used in other genes or gene families.

MATERIALS AND METHODS

Study design

In this study, we developed a statistical model that predicts GOF versus LOF effects of genetic variants in NaVs and CaVs (corresponding to SCNxA and CACNA1x genes, respectively). We trained a machine learning model on sequence- and structure-based protein features with (likely) pathogenic missense variants in SCNxA/CACNA1x genes with probable LOF (n = 518) and GOF (n = 309) effects that were inferred from disease phenotypes of variant carriers based on known gene-disease mechanisms. We tested the model on 82 variants randomly chosen from the training data before model training. We then validated the GOF versus LOF prediction on 87 functionally tested variants in SCN1/2/8A and CACNA1I and in exome-wide data from the general population (gnomAD, n = 114,704) (2); individuals with epilepsy (n = 9170) (58), autism (n = 4186) (59), and ASD or ADHD (n = 8347) (60); and 15,829 control individuals of which 6860 overlapped with the 114,704 individuals from gnomAD (2). In all validation steps, testing data were always excluded from training data. Further method details are available in the Supplementary Materials.

Statistical analysis

Statistical analyses were done with the R and the C programming languages. We used the R package caret (86) for most machine learning–related functions and packages ggplot2 (87) and plotROC (88) for plotting. Methods for determining significance of association (Fisher’s exact test, Kendall and Spearman correlation, and logistic and linear regression) and multiple testing correction (Bonferroni) were used as indicated in the manuscript. We used the conventional P value threshold of 0.05 in all analyses with the exception of the analysis “Clustering of inferred GOF or LOF variants in different genes” (see Fig. 1) where we used a more stringent P value threshold of 0.01.

To compare variant location between all Navs and Cavs for clustering of inferred LOF and GOF variants, we mapped the amino acid sites on a combined gene family alignment of all 20 Nav/Cav sequences. We removed alignment gaps, obtaining 1268 amino acid sites mappable to all sodium and calcium channels [61% of their canonical isoform length of 2064 amino acids ±222 (means ± SD)]. Seven hundred twenty-six of all 825 LOF/GOF variants could be mapped onto the 1268 family-aligned amino acid sites. We then counted the LOF or GOF variant density on the 1268 family-aligned sites in sliding windows of three amino acids, thereby considering LOF or GOF effects of variants at directly neighboring amino acid sites.
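The sliding-window count described above can be illustrated with a short sketch. It assumes that each window is centered on a site and spans that site plus its two direct neighbors, and that windows are truncated at the ends of the alignment. Both choices are assumptions made for illustration, as are the function name and the toy per-site counts.

```python
def window_density(site_counts, half_width=1):
    """Sum per-site variant counts over a centered sliding window.

    With half_width=1 the window covers three sites: the site itself and
    its two direct neighbors. Windows at the alignment ends are truncated
    (an assumption; the text does not specify edge handling).
    """
    n_sites = len(site_counts)
    densities = []
    for i in range(n_sites):
        start = max(0, i - half_width)
        stop = min(n_sites, i + half_width + 1)
        densities.append(sum(site_counts[start:stop]))
    return densities

# Toy counts of variants per aligned amino acid site (hypothetical values).
gof_per_site = [0, 1, 0, 0, 2, 1, 0, 0]
lof_per_site = [1, 0, 0, 0, 0, 0, 3, 1]

print("GOF density:", window_density(gof_per_site))
print("LOF density:", window_density(lof_per_site))
```

Computing the two densities separately on the same aligned coordinates makes it possible to compare, site by site, where GOF and LOF variants from different genes accumulate.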

For machine learning–based prediction of GOF versus LOF and pathogenic versus neutral variant effects, we used a table of all 89 protein features by all 825 variants (table S1) to train a prediction tool that outputs the probability that a variant results in GOF or LOF. We used the R package caret’s train function to evaluate, using a 10-fold cross-validation resampling, the effect of model tuning parameters on performance and to choose the optimal parameters for the final model. Further details on statistical modeling and electrophysiology experiments are available in Supplementary Materials and Methods.
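The 10-fold cross-validation resampling mentioned above was performed with the R package caret. As a language-neutral illustration of the idea, the following Python sketch splits 825 variant indices into 10 folds so that each variant is held out exactly once. The function name, the fixed seed, and the shuffling scheme are assumptions and do not reproduce caret’s internals.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, held_out) index lists for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds
    whose sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out

# 825 variants, as in the feature table described above.
for fold_number, (training, held_out) in enumerate(k_fold_indices(825), start=1):
    # A model with a given set of tuning parameters would be fit on
    # `training` and scored on `held_out` here.
    print(fold_number, len(training), len(held_out))
```

Averaging the held-out performance over the 10 folds for each candidate parameter setting gives the estimate used to choose the final model parameters.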

SUPPLEMENTARY MATERIALS

stm.sciencemag.org/cgi/content/full/12/556/eaay6848/DC1

Materials and Methods

Fig. S1. SCNxA/CACNA1x gene expression corresponds to tissue-specific disease phenotypes.

Fig. S2. Inferred GOF and LOF missense variants in SCNxA genes (Navs) and CACNA1x genes (Cavs) mapped onto the linear protein structure of SCN2A.

Fig. S3. Pathogenic, neutral, GOF, and LOF variants in SCNxA and CACNA1x genes are mapped onto the protein structure of SCN2A.

Fig. S4. Quantitative protein features of GOF, LOF, and neutral variants.

Fig. S5. Performance of statistical modeling.

Fig. S6. Prediction of pathogenic versus neutral variants.

Fig. S7. Prediction of functionally tested variants.

Fig. S8. Current density comparison of wild-type and mutant NaV1.2.

Table S1. All likely pathogenic variants used in functional and pathogenic variant prediction.

Table S2. All inferred neutral variants used in pathogenic variant prediction.

Table S3. Functional and pathogenic variant prediction of functionally tested variants in SCN1A/ SCN2A/ SCN8A.

Table S4. Functional and pathogenic variant prediction of functionally tested variants in CACNA1I.

Table S5. Prediction of variants in diseases (autism, ADHD, and epilepsy).

Table S6. MAF cutoffs of different diseases.

References (115–133)

REFERENCES AND NOTES

Acknowledgments: We thank the members of the Analytic and Translational Genetics Unit and Broad Institute of MIT and Harvard, specifically K. Satterstrom, J. Kosmicki, Y.-C. A. Feng, B. Neale, T. Singh, F. Wagner, and H. Finucane for assistance and helpful comments. We thank U. Hedrich and Y. Liu for helpful discussion on relevant functionally characterized variants and for sharing functional data before publication. We thank J. Krause for critical reading of the manuscript. We gratefully thank principal investigators and members who participated in the DBS/iPSYCH, the Epi25 consortium and the ASC for making these global resources possible and available to us to validate our method. Full author names and funding sources of individual cohorts contributing to ASC or iPSYCH can be found on the ASC flagship paper (59) and ASD/ADHD paper (60). Funding: H.O.H. was supported by stipends from the German Research Foundation (DFG): HE7987/1-1 and HE7987/1-2. U.I.S. was supported by Stiftung Charité. H.L. has been supported by the German Research Foundation (DFG, Research Unit FOR-2715, grants LE1030/15-1 and LE1030/16-1). S.S. was supported by a grant from the Dietmar-Hopp-Stiftung (23011236). P.M. was supported by JPND-Courage-PD, FNR NCER-PD, and FNR FOR-2715 (INTER/DFG/17/11583046) grants. This work was supported by the BMBF Treat-ION grant (01GM1907 to P.M., H.L., and D.L.). The Epi25 project is part of the Centers for Common Disease Genomics (CCDG) program, funded by the National Human Genome Research Institute (NHGRI) and the National Heart, Lung, and Blood Institute (NHLBI). CCDG-funded Epi25 research activities at the Broad Institute, including genomic data generation in the Broad Genomics Platform, are supported by NHGRI grant UM1 HG008895 (M.J.D.). The Genome Sequencing Program efforts were also supported by NHGRI grant 5U01HG009088-02. 
Additional funding sources and acknowledgment of individual patient and control cohorts are listed in Supplemental Data of the Epi25 flagship paper (58). H.O.H., P.M., J.R.L., R.S.M., H.L., D.L., and M.J.D. are part of Epi25; other authors involved in Epi25 are listed in (58). The Stanley Center for Psychiatric Research at the Broad Institute supported sequencing and control sample aggregation in Epi25. Author contributions: H.O.H. and M.J.D. conceptualized the study. H.O.H., S.I., D.S.P., and M.J.D. developed methodology. H.O.H. curated and analyzed the data. H.O.H. and D.S.P. wrote software. D.B.-N. and H.-R.W. performed functional experiments. H.O.H. wrote the original draft. H.O.H., D.B.-N., D.S.P., S.I., A.B., E.P.-P., U.I.S., H.L., P.M., D.L., A.J.C., J.P., H.-R.W., and M.J.D. reviewed and edited the manuscript. D.B.-N., A.B., Epi25 Collaborative, K.M.J., S.L., J.R.L., R.S.M., E.P.-P., U.I.S., S.S., H.L., P.M., J.P., and H.-R.W. provided resources. S.I., D.S.P., D.L., A.J.C., J.P., H.-R.W., and M.J.D. supervised the study. Competing interests: M.J.D. is a founder of Maze Therapeutics. A.B. is a medical advisor to and has received consulting fees from Encoded Therapeutics. A.B. has received honoraria/speaking fees from the following companies: Encoded Therapeutics, Nutricia, GW Pharma, Zogenix, and Biocodex. S.L. has received travel support from Bial/Eisai and research support from Bial. During revision of the manuscript H.-R.W. transitioned to a position at Neoland Biosciences Co. Ltd., Weihai City, Shandong Province, P.R. China. After provisional acceptance of the manuscript D.S.P. transitioned to a position at Genomics PLC, Cambridge, UK. Data and materials availability: All data associated with this study are present in the paper or the Supplementary Materials. The R code used to perform functional variant prediction, associated data tables, homology models used to compute protein features, PDB files for fig. S3, and functional (LOF versus GOF) and pathogenic variant prediction of all possible single-nucleotide genomic changes that can lead to amino acid changes in Navs and Cavs can be found at https://doi.org/10.5281/zenodo.3529672 and https://github.com/heyhen/funNCion. The C code used to perform ancestry conditional site-specific selection and associated data tables can be found at https://github.com/astheeggeggs/parsel. Functional (LOF versus GOF) and pathogenic variant prediction of all possible amino acid changes in Navs and Cavs are also available at http://funNCion.broadinstitute.org. The ClinVar public repository (42) provides further information for ClinVar (www.ncbi.nlm.nih.gov/clinvar) variant identifiers referenced in table S1. All materials are available for academic use upon request.