Research Article: Epigenetics

Personalized Epigenomic Signatures That Are Stable Over Time and Covary with Body Mass Index

Science Translational Medicine  15 Sep 2010:
Vol. 2, Issue 49, pp. 49ra67
DOI: 10.1126/scitranslmed.3001262


The epigenome consists of non–sequence-based modifications, such as DNA methylation, that are heritable during cell division and that may affect normal phenotypes and predisposition to disease. Here, we have performed an unbiased genome-scale analysis of ~4 million CpG sites in 74 individuals with comprehensive array-based relative methylation (CHARM) analysis. We found 227 regions that showed extreme interindividual variability [variably methylated regions (VMRs)] across the genome, which are enriched for developmental genes based on Gene Ontology analysis. Furthermore, half of these VMRs were stable within individuals over an average of 11 years, and these VMRs defined a personalized epigenomic signature. Four of these VMRs showed covariation with body mass index consistently at two study visits and were located in or near genes previously implicated in regulating body weight or diabetes. This work suggests an epigenetic strategy for identifying patients at risk of common disease.


A role for epigenetics in common disease has long been suspected (1), and a strong relationship with cancer has been shown (2–4). We have argued that common disease involves both genetic and epigenetic factors and that epigenetic modification could mark environmental effects and mediate genetic effects (1). In addition to particular exposure-epigenetic relationships (5, 6), epigenetic changes with aging support the notion that there is an environmental component to epigenetic variation (7–10). Studies of identical twins show greater differences in global DNA methylation in older than in younger twins, consistent with an age-dependent progression of epigenetic change (8, 10). Global methylation changes over an 11-year span in participants of an Icelandic cohort, and age- and tissue-related alterations in some CpG islands from an array of 1413 arbitrarily chosen CpG sites near gene promoters, further corroborate the evidence for dynamic methylation patterns over time (9, 11). Other work, however, has suggested that epigenetic marks, or their maintenance, are themselves controlled by genes and are thus heritable in the traditional sense and associated with particular DNA variants (12). This would predict that methylation marks are stable rather than varying as controlled by changing environments.

Here we describe a genome-scale, gene-specific analysis of DNA methylation in the same individuals over time to identify a personalized epigenomic signature that may correlate with common genetic disease.


We report comprehensive high-throughput array-based relative methylation (CHARM) analyses of samples from the Age, Gene/Environment Susceptibility (AGES) study, assessing 4.5 million CpG sites genome-wide. CHARM has been shown to identify differential DNA methylation without assumptions regarding where such changes would be. It uses arrays tiled through regions selected for their relative CpG content, including all CpG islands as well as CpG island “shores,” which have been shown to be enriched in differential methylation (13).

In brief, the AGES study constitutes visit 7 (from 2002 to 2005) of the Reykjavik Study, which began with 18,000 residents of Reykjavik recruited in 1967. The AGES study recruited 5758 of the surviving members, between 69 and 96 years of age in 2002. Of these, 638 gave a DNA sample in 1991 as part of the Reykjavik Study visit 6 and therefore had DNA from two time points, about 11 years apart, available for methylation analysis. We present data for 74 samples, a random set of those who had ample DNA remaining for both study visits. Descriptive statistics for these samples are given in Table 1.

Table 1

Descriptive information (mean ± SE) for samples used in CHARM analyses at each time point.

CHARM analysis of samples obtained from visit 7 identified 227 regions that met our criteria for polymorphic methylation patterns across individuals [variably methylated regions (VMRs)]. These represented regions of extreme variability across individuals defined by 10 or more consecutive probes with an average SD of >0.125 (table S1). These VMRs showed enrichment for development and morphogenesis categories (Table 2), including genes from all four HOX clusters. The appearance of developmental genes is predicted by our model that epigenetic variation would involve developmental genes, and this variability itself increases evolutionary fitness in an environmentally changing world (14).

Table 2

Gene Ontology analysis of the 227 VMRs revealed enrichment for the development and morphogenesis categories (P < 0.01).

Next, to determine whether methylation at these regions changed within individuals over time, we analyzed the distribution of the absolute value of average within-person change in methylation over time per VMR and found two underlying distributions (Fig. 1). These data fit a two-component mixture model, with 41 VMRs easily classified into the higher intraindividual difference group (probability of membership in orange distribution, >0.99; Fig. 1), defined as dynamic VMRs; 119 VMRs easily classified into the lower distribution (probability of green distribution, >0.99), defined as stable VMRs; and 67 residing in the overlapping region labeled ambiguous, with respect to intraindividual change over time. Thus, about half of the regions that are variably methylated across individuals appear to be stable over time within individuals.

Fig. 1

Distribution of intraindividual change in methylation over time at VMRs. Mixture distribution analysis shows that Dk, the average absolute value of intraindividual differences in methylation over time for VMR k, fits two underlying curves. Green, stable showing little change; orange, dynamic showing larger changes. Ambiguous is intermediate in Dk.

Clustering of the 227 VMR methylation profiles (Fig. 2A) revealed mixing of methylation profiles among the individuals, whereas use of only stable VMRs in the clustering algorithm uniquely identified each individual (Fig. 2B). These stable VMRs may represent polymorphic methylated regions that are not particularly susceptible to exposure modifications or that do not naturally change with age.

Fig. 2

Similarity between individuals based on VMR methylation. (A) Dendrogram based on clustering applied to methylation profiles at all 227 VMRs. (B) Dendrogram based on clustering applied to methylation profiles using only the 119 stable VMRs. Numbers represent individual identification numbers.

To explore how methylation of particular VMRs may play a role in disease risk, we determined the relationship between methylation and body mass index (BMI), an accessible and treatable phenotype that is known to have many disease correlates (15, 16). We identified 13 VMRs that met a false discovery rate (FDR) criterion of <25% in cross-sectional analyses of visit 7 (Table 3). Of these, four had a P value of <0.10 and the same strength and direction of correlation with BMI at the earlier visit 6, whereas three of the 13 showed conflicting directions of association between visits. The four confirmed VMRs are in or near the genes PM20D1, MMP9, PRKG1, and RFC5. The methylation curves among obese (BMI ≥30) and normal (BMI <25) subjects for the VMR at PM20D1 illustrate an ~20% increase in methylation that persists over time between the two visits (Fig. 3). Scatter plots for the relationship between methylation and BMI for all four VMRs exhibited significant correlations at both visits (Fig. 4).

Table 3

Stable VMRs associated with BMI. Bold values indicate confirmation in visit 6 analysis (P < 0.1 and consistent regression parameter estimates).

Fig. 3

Methylation curves at the gene PM20D1. Methylation curves for visit 7 and visit 6 data. Dashed lines are individual methylation curves. Solid lines are average curves by obese (blue) and normal (red) groups. The green line indicates the boundaries of the VMR. CpG density is shown in the third panel, with CpG islands marked in orange. Gene location is shown in the bottom panel.

Fig. 4

Correlations between methylation and BMI at six BMI-related VMRs. Points are individual IDs. Blue indicates visit 7; red indicates visit 6.


We previously showed that global DNA methylation changes within individuals over time (11), and we have now confirmed and identified the locations of site-specific changes at dynamic VMRs with a genome-wide approach. In addition, we identified a separate set of stable VMRs that can be used to uniquely identify individuals in an epigenetic signature akin to genetic fingerprinting. This signature may be correlated with disease status, implying that an epigenetic signature can mark disease risk or disease states. In particular, we show stable VMRs that correlate with BMI at two separate visits a decade apart.

Some have argued that DNA methylation changes over time and is an important biological mediator of environmental effects on human disease (17), whereas others support the concept of inherited DNA methylation patterns, implying that they are potentially variable across individuals but less likely to be dynamic over time (18). This has been a conundrum because these appear to be opposing ideas. However, we show that both ideas have merit. It is important to identify these regions in the context of disease consequences because those that are particularly labile may be the sites relevant when considering epigenetic marks as mediators of environmental effects, whereas those that are stable may be relevant as mediators or moderators of genetic effects. Further, those regions that do not change over time can be used as an epigenetic signature for an individual, similar to genotype. These regions can then be considered as candidates for assessment of methylation associations with disease or health-related phenotypes under specific risk models.

Our results help to focus the integration of methylation measurement into epidemiologic studies of disease risk by providing specific genomic sites for inquiry. Our exploration of possible correlations between methylation at these VMRs and an easily measured disease-related phenotype, BMI, identified 13 genes, 4 of which were consistently correlated with BMI across two separate study visits. Many of these 13 genes have been previously implicated in obesity or diabetes. MMP9, as well as another member of this family, MMP3, encodes a metallopeptidase that is up-regulated in obese individuals (19). Several matrix metalloproteinases (MMPs), including MMP9, are up-regulated in human adipocytes (20). Matrix metallopeptidases have also been associated with obesity in rodent models (21, 22). PM20D1 is also a metalloproteinase and, although not yet well-characterized, may have similar implications for obesity. PRKG1, a guanosine 3′,5′-monophosphate (cGMP)–dependent protein kinase, plays an important role in foraging behavior, food acquisition, and energy balance (23). RFC5 is an intriguing gene because it encodes a metabolism-linked DNA replication complex loading protein, dysfunction of which leads to DNA repair defects. It might thus play a role in well-known but poorly understood DNA damage–related complications of diabetes.

In a mouse model of obesity, SORCS1 has been located at a type 2 diabetes quantitative trait locus (QTL) (24), and this finding has been confirmed in humans, where SORCS1 single-nucleotide polymorphisms (SNPs) and haplotypes were associated with fasting insulin secretion (25). IL1RAPL2 is located at a region on chromosome X that is associated with Prader-Willi–like syndrome, whereas DACH2 is also an X-linked gene associated with Wilson-Turner syndrome, both of which are Mendelian disorders with obesity features. TTC13 is part of a family containing another tetratricopeptide repeat gene, TTC8, which has been directly linked to Bardet-Biedl syndrome and includes obesity as a primary feature. APCDD1 is a positional candidate gene associated with a QTL that affects fat deposition in pigs (26) and is located at a region on chromosome 18 that is linked to percentage of body fat in men (27).

Our identification of VMRs is limited by the number of individuals contributing to this genome-wide CHARM analysis. It is likely that increased sample sizes will improve detection of additional VMRs. Further, the dynamic VMRs defined here are based on an 11-year window among elderly participants. It is important to also identify methylomic regions that show intraindividual changes during early segments of the life span and to connect these changes to particular environmental exposures. One potential caveat from these analyses is that the methylation patterns were obtained from DNA derived from blood and thus contain a mixture of cell types that could confound our results. However, in our previous study of global DNA methylation (that is, non–site-specific) in these samples, we found no relationship between lymphocyte count and methylation (11). A recent paper also showed that cellular heterogeneity was not associated with DNA methylation amounts for most sites they studied (18). Our use of blood as a DNA source may also limit the interpretations of these results, given the tissue specificity of DNA methylation. However, there is a growing precedent for lymphoid tissues serving as a good surrogate tissue for changes in other target tissues. For example, loss of imprinting of insulin-like growth factor 2 (IGF-2), one of the best-studied disease-related epigenetic mutations, is found in both lymphocytes and colon, and changes in either are associated with increased colorectal cancer risk (28). Finally, our exploration of the correlation between BMI and methylation was based on availability of quantitative data and relevance to human disease. We were unable to assess the relationship of VMRs to categorical outcomes in this sample. 
Although more comprehensive than previous genome-wide, site-specific methylation studies (7, 9), our analysis was limited by the sample number to the relationship between methylation and quantitative phenotype rather than categorical outcomes. This study supports further examination of other measures of obesity and disease outcomes such as diabetes and cardiovascular disease with respect to the particular VMRs identified here.

The implications of these results are wide-ranging. An individual epigenetic signature that is stable over time has not previously been described. Such a signature could be driven by underlying sequence variation, early environmental exposure (for example, prenatally), or both. These stable VMRs would likely complement genotype because they would also reflect early exposure. In addition, we have proposed that some genetic variants would drive increasing site-specific stochastic epigenetic variation, and thus, the variance of methylation in a population could be predicted by genotype but the methylation level in an individual would not be predictable from genotype and would require direct measurement (14).

Whether in part or completely genetically driven, this epigenotype may be more proximate to the ultimate phenotype, in this case BMI, and thus have considerable value for disease risk assessment. Although the sample size is larger than previous genome-scale, gene-specific methylation studies, it is still relatively small compared to classical sequence-driven approaches such as genome-wide association studies (GWAS). Even so, the data suggest that this epigenomic approach to disease phenotype will be an important complement to such studies. Even with the restraint of relatively small numbers of samples, we could identify four genes with VMRs related to BMI. In addition, the identification of stable VMRs may have long-term consequences for developing personalized epigenomics in medicine, with the hope of forging a connection between personal genomes and early (for example, in utero) environmental influences.

Materials and Methods


Nonimmortalized lymphocyte samples were taken from participants of the AGES Reykjavik Study (29). Seventy-four samples contributed to these analyses. These samples meet our high-quality array data criteria and were from a randomly chosen set of 100 samples from the 638 AGES participants that had ample DNA from two visits. CHARM data were only considered in analyses if they passed our internal quality assessment. For cross-sectional analyses of the most recent collection (visit 7), 64 samples contributed data, whereas 48 samples contributed to cross-sectional analyses of the earlier visit 6 data. For identification of dynamic VMRs, a subset of 38 samples had quality CHARM data at both time points. For the analyses with BMI presented here, BMI was calculated as the body weight in kilograms divided by the height in meters squared.

Genome-wide methylation assay

We used CHARM analysis, which is a microarray-based method agnostic to preconceptions about methylation, including location relative to genes and CpG content (30, 31). The resulting quantitative measurements of methylation, denoted with M, are log ratios of intensities from total (Cy3) and McrBC-fractionated (Cy5) DNA: Positive and negative M values are quantitatively associated with methylated and unmethylated sites, respectively. For each sample, we analyzed ~4.5 million CpG sites across the genome with a custom-designed NimbleGen HD2 microarray, including all of the classically defined CpG islands and nonrepetitive progressively lower CpG density genomic regions of the genome until the array is saturated. We include 4500 control probes to standardize these M values so that unmethylated regions are associated, on average, with values of 0. CHARM is 100% specific at 90% sensitivity for known methylation marks identified by other methods (for example, in promoters) while including more than half of the genomes not identified by conventional region before selection. The CHARM results have also been extensively corroborated by quantitative bisulfite pyrosequencing analysis (30).

Identification of VMRs

We first screened the methylome for regions where methylation varied substantially across individuals. We termed these VMRs to distinguish them from regions identified for their discrimination of groups, such as tissue types or cases versus controls, which we and others have previously called differentially methylated regions (DMRs). Our use of the term VMR can be considered a specific type of metastable epiallele introduced by Rakyan et al. (32) to denote variable expression of imprinted loci or variable methylation of an agouti methylation variant.

To identify VMRs from our data, we first processed the raw CHARM data with the statistical procedure described (33). This statistical procedure produced quality metrics (between 0 and 100%) for each sample and, for those that passed our quality test (>80%), a vector of methylation percentage estimates for each feature on the array. These were then smoothed with the standard CHARM approach to reduce measurement error (30, 31). We denote the resulting methylation percentages for subject i at microarray feature j for time t as Mijt.

We used cross-sectional analysis of visit 7 data to identify polymorphic VMRs on the basis of extreme interindividual variance across consecutive probes. Specifically, we estimated between-subject variability with the median absolute deviation, a robust estimate of the SD (14). We computed the median of |Mijt − mjt| across subjects, where mjt is the median Mijt across subjects i, and referred to it as sjt (14). To avoid false positives in subsequent analysis of correlations with covariates, we required a stringent definition for designating a polymorphic VMR: a region of 10 or more consecutive probes attaining values of sjt above the 99th percentile of all the sjt and an average sjt of >0.125. We chose these cutoff values with permutation tests. Specifically, we randomized the genomic order of the CHARM probes and applied the above algorithm to find VMRs (including the smoothing step) for each permuted data set. Using our criteria, we obtained 0 false positives. Lowering either the number of consecutive probes or the average sjt thresholds produced false positives.
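The VMR screen described above can be illustrated with a minimal sketch. It assumes that the smoothed methylation estimates for one visit are held in a list of per-subject lists, that list order stands in for genomic adjacency (chromosome boundaries and probe gaps are ignored), and that the percentile is taken by nearest rank. The function names mad_per_probe, percentile_cutoff, and find_vmrs are hypothetical and are not part of the CHARM software; the smoothing and permutation steps are omitted.

```python
import math
from statistics import median


def mad_per_probe(M):
    """Return s[j], the median over subjects of |M[i][j] - m[j]|.

    M[i][j] is the smoothed methylation estimate for subject i at probe j
    for a single visit; m[j] is the across-subject median at probe j.
    No scaling constant is applied (assumption: follows the text literally).
    """
    s = []
    for j in range(len(M[0])):
        column = [row[j] for row in M]
        m_j = median(column)
        s.append(median([abs(x - m_j) for x in column]))
    return s


def percentile_cutoff(values, pct):
    """Nearest-rank percentile (hypothetical helper; other definitions exist)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[min(rank, len(ordered)) - 1]


def find_vmrs(s, min_probes=10, pct=99.0, min_mean=0.125):
    """Return (first, last) probe indices of runs meeting the VMR criteria."""
    cutoff = percentile_cutoff(s, pct)
    regions, start = [], None
    # The trailing sentinel closes a run that reaches the last probe.
    for j, value in enumerate(list(s) + [float("-inf")]):
        if value > cutoff:
            if start is None:
                start = j
        elif start is not None:
            run = s[start:j]
            if len(run) >= min_probes and sum(run) / len(run) > min_mean:
                regions.append((start, j - 1))
            start = None
    return regions
```

A permutation check in the spirit of the text would shuffle the probe order, rerun find_vmrs, and count the regions returned; under the stated thresholds that count is expected to be zero.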

These VMRs were then annotated for genomic location and gene proximity. Genes within 3 kb of VMRs were considered in a Gene Ontology (GO) analysis of biological process categories. For each GO category, we performed a hypergeometric test (34), with corresponding nominal P value, to determine enrichment of genes near VMRs. We also calculated the FDR for each category statistic to account for the multiple comparisons.
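The enrichment test for a single GO category can be sketched as an upper-tail hypergeometric probability. The function name and argument names below are hypothetical, and the choice of gene universe (for example, all genes within 3 kb of any region on the array) is an assumption that the text does not specify.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_category, n_selected, n_overlap):
    """Upper-tail hypergeometric P value, P(X >= n_overlap).

    n_universe:    genes eligible for selection (assumed background set)
    n_in_category: genes in the universe annotated to the GO category
    n_selected:    genes near VMRs
    n_overlap:     genes near VMRs that are also in the category
    """
    upper = min(n_in_category, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_category, x) * comb(n_universe - n_in_category, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total
```

Each category yields one nominal P value; these would then be adjusted across categories to obtain the FDR.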

Identification of stable versus dynamic VMRs

Using the average Mijt within the range of each VMR, we generated methylation profiles for each sample. This included a vector of k VMR values for each subject i and time point t. We calculated Dk, the median absolute within-person difference between methylation profiles from visit 6 to visit 7 for each VMR k. We then fit a two-component Gaussian mixture model (35) to these values and used the resulting estimated posterior distributions to classify VMRs into three groups: “stable” (those with posterior probability of membership in the lower distribution >0.99, reflecting little intraindividual change over time); “dynamic” (those with posterior probability of membership in the higher distribution >0.99, reflecting those with high intraindividual change over time); and “ambiguous” (those not meeting either criterion, and thus in the overlap between the two distributions). (Among the stable VMRs, there is some change over time observed in both directions, and when one takes the absolute value of this difference, the result is a small positive number, and thus the central tendency of Dk for stable VMRs is not 0.)
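The classification step can be illustrated with a small expectation-maximization (EM) fit. This is a sketch under stated assumptions, not the fitting procedure of reference 35: the initialization from the lower and upper halves of the sorted values, the fixed iteration count, and the floor on the SD are choices made here for illustration, and all function names are hypothetical. The per-VMR summary uses the median, following the Methods text.

```python
import math
from statistics import median, pstdev


def within_person_change(visit6, visit7):
    """D_k: median over individuals of |M_ik(visit 7) - M_ik(visit 6)|.

    visit6[i][k] and visit7[i][k] hold the VMR-level methylation of
    individual i at VMR k.
    """
    n_vmrs = len(visit6[0])
    return [median([abs(b[k] - a[k]) for a, b in zip(visit6, visit7)])
            for k in range(n_vmrs)]


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posteriors(x, w, mu, sd):
    """Posterior membership probabilities of x in components 0 and 1."""
    p = [w[c] * normal_pdf(x, mu[c], sd[c]) for c in (0, 1)]
    total = p[0] + p[1]
    return [p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5]


def fit_two_gaussians(d, n_iter=200, min_sd=1e-6):
    """Plain EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, sds) with component 0 the lower-mean one.
    Initial values come from the lower and upper halves of the sorted data
    (an illustrative choice).
    """
    ordered = sorted(d)
    half = len(ordered) // 2
    halves = [ordered[:half], ordered[half:]]
    w = [0.5, 0.5]
    mu = [sum(h) / len(h) for h in halves]
    sd = [max(pstdev(h), min_sd) for h in halves]
    for _ in range(n_iter):
        resp = [posteriors(x, w, mu, sd) for x in d]  # E-step
        for c in (0, 1):                              # M-step
            n_c = sum(r[c] for r in resp)
            if n_c < 1e-12:
                continue
            w[c] = n_c / len(d)
            mu[c] = sum(r[c] * x for r, x in zip(resp, d)) / n_c
            var = sum(r[c] * (x - mu[c]) ** 2 for r, x in zip(resp, d)) / n_c
            sd[c] = max(math.sqrt(var), min_sd)
    if mu[0] > mu[1]:
        w, mu, sd = w[::-1], mu[::-1], sd[::-1]
    return w, mu, sd


def classify_vmrs(d, threshold=0.99):
    """Label each D_k as 'stable', 'dynamic', or 'ambiguous'."""
    w, mu, sd = fit_two_gaussians(d)
    labels = []
    for x in d:
        low, high = posteriors(x, w, mu, sd)
        labels.append("stable" if low > threshold
                      else "dynamic" if high > threshold else "ambiguous")
    return labels
```

In practice the values passed to classify_vmrs would be the output of within_person_change for the subset of samples with quality data at both visits.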

To evaluate discrimination of individuals based on methylation patterns, we applied hierarchical clustering to the vectors of methylation values for the VMRs and graphed individuals into a dendrogram based on similarity of VMRs. We then selected only those VMRs designated as stable in the analysis above and repeated the hierarchical clustering and dendrogram graphic.
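A minimal sketch of the clustering step follows. The text does not state the distance measure or the linkage rule, so Euclidean distance and average linkage are assumptions here, as is the idea that each individual contributes one profile per visit. The function names and sample labels are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles maps a sample label to its vector of VMR methylation values.
    Returns the merges in order as (members_a, members_b, distance); this
    merge list is the information a dendrogram would display.
    """
    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean pairwise distance between the members of the two clusters.
            dist = sum(euclidean(profiles[p], profiles[q])
                       for p in a for q in b) / (len(a) * len(b))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        dist, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a + b)
        merges.append((a, b, dist))
    return merges
```

If stable VMRs identify individuals, the two visits of the same person would be expected to merge with each other before merging with any other sample.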

Identification of BMI-related methylated regions

We performed cross-sectional analyses for data at each visit separately. For each stable VMR, we fit a linear regression model to summarize the relationship between BMI and methylation. Specifically, for each VMR k, we fit the following model:

Yi = ak + bkMik + eik

where Yi is the BMI for individual i, Mik is the methylation level for individual i in the kth VMR, and e represents unexplained variability. Here, bk represents the parameter of interest that summarizes the correlation between BMI and methylation. This produced one Wald statistic for each VMR. We fit this model to the data from visit 7, and to account for the multiple comparisons due to multiple VMRs, we reported a list of regions with an FDR of 0.30. To confirm these results, we independently applied the same regression approach to visit 6 and obtained estimates of b along with P values.
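The per-VMR regression and the multiple-comparison adjustment can be sketched as follows. Two assumptions are made that the text does not confirm: the P value is taken from a standard normal reference for the Wald statistic, and the FDR is computed with the Benjamini-Hochberg procedure. The function names are hypothetical.

```python
import math


def wald_regression(bmi, methylation):
    """Least-squares fit of BMI on methylation for one VMR.

    Returns (a_k, b_k, wald, p), where wald = b_k / SE(b_k) and p is a
    two-sided P value from a standard normal reference (an assumption;
    a t reference with n - 2 degrees of freedom is a common alternative).
    """
    n = len(bmi)
    mean_y = sum(bmi) / n
    mean_m = sum(methylation) / n
    sxx = sum((m - mean_m) ** 2 for m in methylation)
    sxy = sum((m - mean_m) * (y - mean_y) for m, y in zip(methylation, bmi))
    b = sxy / sxx
    a = mean_y - b * mean_m
    rss = sum((y - a - b * m) ** 2 for m, y in zip(methylation, bmi))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    wald = b / se_b
    p = math.erfc(abs(wald) / math.sqrt(2))
    return a, b, wald, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Applied to visit 7, one would call wald_regression once per stable VMR, pass the resulting P values to benjamini_hochberg, and keep the VMRs whose adjusted value falls below the chosen FDR threshold; the same regression would then be rerun on visit 6 for confirmation.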


  • * These authors contributed equally to this work.

  • Present address: INSERM, Paris 75014, France.

  • Citation: A. P. Feinberg, R. A. Irizarry, D. Fradin, M. J. Aryee, P. Murakami, T. Aspelund, G. Eiriksdottir, T. B. Harris, L. Launer, V. Gudnason, M. D. Fallin, Personalized epigenomic signatures that are stable over time and covary with body mass index. Sci. Transl. Med. 2, 49ra67 (2010).

Supplementary Material

Table S1. Variably methylated regions across individuals.

References and Notes

  1. Acknowledgments: We thank E. Briem for CHARM hybridization. Funding: Supported by NIH grants 1R01ES015211 (D.F.) and 2P50HG003323 (A.P.F.). The AGES Reykjavik Study is funded by NIH contract N01-AG-12100, Hjartavernd (the Icelandic Heart Association), and Althingi (the Icelandic Parliament). This research was funded in part by the Intramural Research Program of the NIH, National Institute on Aging. Author contributions: M.D.F., V.G., and A.P.F. designed the study; D.F. performed CHARM assays; R.A.I., M.J.A., P.M., and T.A. performed statistical analyses; V.G., G.E., T.B.H., and L.L. designed the AGES study and recruited participants; A.P.F., R.A.I., and M.D.F. interpreted the data and wrote the paper. Competing interests: Johns Hopkins School of Medicine is filing a provisional patent based on these data. Accession numbers: The CHARM data can be found in GEO GSE23858.