Research Article: Cancer

Breast Cancer Methylomes Establish an Epigenomic Foundation for Metastasis


Science Translational Medicine, 23 Mar 2011:
Vol. 3, Issue 75, pp. 75ra25
DOI: 10.1126/scitranslmed.3001875


Cancer-specific alterations in DNA methylation are hallmarks of human malignancies; however, the nature of the breast cancer epigenome and its effects on metastatic behavior remain obscure. To address this issue, we used genome-wide analysis to characterize the methylomes of breast cancers with diverse metastatic behavior. Groups of breast tumors were characterized by the presence or absence of coordinate hypermethylation at a large number of genes, demonstrating a breast CpG island methylator phenotype (B-CIMP). The B-CIMP provided a distinct epigenomic profile and was a strong determinant of metastatic potential. Specifically, the presence of the B-CIMP in tumors was associated with low metastatic risk and improved survival, and the absence of the B-CIMP was associated with high metastatic risk and death. B-CIMP loci were highly enriched for genes that make up the metastasis transcriptome. Methylation at B-CIMP genes accounted for much of the transcriptomal diversity between breast cancers of varying prognosis, indicating a fundamental epigenomic contribution to metastasis. Comparison of the loci affected by the B-CIMP with those affected by the hypermethylator phenotype in glioma and colon cancer revealed that the CIMP signature was shared by multiple human malignancies. Our data provide a unifying epigenomic framework linking breast cancers with varying outcome and transcriptomic changes underlying metastasis. These findings significantly enhance our understanding of breast cancer oncogenesis and may aid the development of new prognostic biomarkers for this common malignancy.


Breast cancer is one of the most prevalent human malignancies and is a major cause of cancer-related morbidity and mortality. Invasive ductal carcinoma (IDC) of the breast is a phenotypically diverse disease, consisting of tumors with varying pathologic and molecular characteristics (15). The primary biological subtypes of IDC include estrogen receptor (ER)– and progesterone receptor (PR)–positive tumors (luminal A and B), tumors that are human epidermal growth factor receptor 2 (HER2)–enriched, and tumors that are ER/PR-negative (basal-like). These molecular determinants have significant effects on metastatic behavior and clinical outcome. For example, ER/PR+ tumors are generally associated with better clinical prognosis, whereas basal-like (ER/PR− and HER2−, triple-negative) tumors are associated with higher rates of metastasis and death (6–9). The genomic alterations, including both genetic and epigenetic aberrations, that underlie these differing metastatic potentials are ill-defined.

Significant effort has been undertaken to more accurately define the molecular alterations underlying breast cancer. For example, it has been shown that hormone receptor (HR) status is prognostic for clinical outcome. Mutations in genes such as BRCA1, PTEN, and PIK3CA help promote breast cancer oncogenesis and are enriched in specific subgroups of IDC (10–12). Genome-wide sequencing surveys have been performed to identify the scope of mutations in breast cancers (13–15). These data demonstrate that substantial biological heterogeneity exists between and within the ER/PR+ and ER/PR− subgroups, and its molecular foundations remain obscure (13). In addition, gene expression classifiers have been developed to help predict metastatic risk (16–18). Despite their increasing use in the clinic, the genomic root causes of the transcriptome differences that underlie metastatic potential are unclear.

It is well established that widespread changes in DNA methylation patterns occur during oncogenesis and tumor progression (19, 20). Cancer-specific changes in DNA methylation can alter genetic stability, genomic structure, and gene expression (21, 22). Promoter CpG island methylation can result in transcriptional silencing and plays an important role in the oncogenic process (19). A CpG island methylator phenotype (CIMP), which is associated with a strong tendency to hypermethylate specific loci, has been described in a subset of colorectal cancers and, more recently, in a subgroup of gliomas (2, 23). Aberrations in DNA methylation have been reported in human breast cancer, but the impact of the methylome on metastasis and the presence of a B-CIMP have remained elusive (24–30). To resolve these questions, we conducted a systematic, genome-wide characterization of the methylomes of breast cancers with diverse metastatic behavior.


Identification and validation of distinct DNA methylome subgroups of breast cancer

The Illumina Infinium HumanMethylation27 platform, which consists of bead arrays that query 27,578 CpG sites covering 14,495 genes, provides efficient genome-wide interrogation of CpG islands and is both well validated and highly reproducible (mean correlation coefficient = 0.987) (31, 32). Analyses of replicate breast cancer samples on this platform generated highly concordant data (fig. S1). We therefore used this platform to analyze a discovery set of IDCs with differing metastatic behavior, including samples with varying ER/PR and HER2 status, from patients with complete clinical follow-up (n = 39, table S1).
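Because every downstream analysis here operates on β values, a short sketch of how a β value is typically derived from the two Infinium signal intensities may help. It assumes the conventional definition β = M/(M + U + 100), where M and U are the methylated and unmethylated intensities and 100 is an offset commonly used by Illumina's software; the logit step mirrors the transformation mentioned in Materials and Methods, with base 2 assumed. The function names are illustrative, not the authors' code.

```python
import math

def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Fraction methylated for one CpG probe.

    meth/unmeth: methylated and unmethylated signal intensities.
    offset: assumed stabilizing constant (100) that keeps the ratio
    well behaved when both signals are low. Negative intensities are
    clipped to zero.
    """
    m = max(meth, 0.0)
    u = max(unmeth, 0.0)
    return m / (m + u + offset)

def logit_beta(beta: float, eps: float = 1e-6) -> float:
    """log2(beta / (1 - beta)), often called the M-value.

    beta is clamped to [eps, 1 - eps] so fully methylated or fully
    unmethylated probes do not produce infinite values.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

# A probe with a strong methylated signal
b = beta_value(9000.0, 900.0)
print(round(b, 3), round(logit_beta(b), 2))  # prints 0.9 3.17
```

The logit spreads out values near 0 and 1, where β is compressed, which is one common reason to transform before variance-based probe selection or ANOVA.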

To identify breast cancer subgroups, we selected the most variant probes and performed consensus clustering (Fig. 1A, fig. S2, A to D, and table S1) and unsupervised hierarchical clustering (Fig. 1B). We identified two robust DNA methylation clusters: one encompassing a portion of the HR+ tumors (defined as ER/PR+; cluster 2) and one encompassing tumors that were ER/PR+ or ER/PR− (cluster 1). Cluster 2 tumors had a highly characteristic DNA methylation profile with high coordinate cancer-specific hypermethylation at a subset of loci (Figs. 1 and 2A and fig. S2), similar to the CIMP phenotype seen in colorectal cancer (for details of the hypermethylation definition, see Materials and Methods) (2, 23). Thus, we defined this group (cluster 2) as having a breast CpG island methylator phenotype (B-CIMP).
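The probe-selection step feeding both clustering approaches (top 5% most variant probes by SD, per Materials and Methods) can be sketched as follows. The matrix layout (one row per probe, one column per tumor) and the helper names are assumptions for illustration; if all 27,578 probes were retained, 5% would correspond to about 1,379 probes.

```python
import statistics

def mean_center(matrix):
    """Subtract each probe's mean across tumors.

    `matrix` holds one row per probe and one column per tumor
    (values assumed to be logit-transformed beta values).
    """
    centered = []
    for row in matrix:
        mu = statistics.fmean(row)
        centered.append([x - mu for x in row])
    return centered

def most_variant_probes(matrix, fraction=0.05):
    """Row indices of the top `fraction` of probes ranked by SD across tumors."""
    n_keep = max(1, round(len(matrix) * fraction))
    sds = [statistics.stdev(row) for row in matrix]
    ranked = sorted(range(len(matrix)), key=lambda i: sds[i], reverse=True)
    return sorted(ranked[:n_keep])

# Toy data: 19 nearly constant probes and one probe that separates tumors
probes = [[0.1, 0.1, 0.2, 0.1] for _ in range(19)] + [[0.05, 0.9, 0.1, 0.85]]
keep = most_variant_probes(probes)
centered = mean_center(probes)
selected = [centered[i] for i in keep]
```

Mean-centering does not change a row's SD, so the order of the two steps does not affect which probes are selected.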

Fig. 1

Identification of a CIMP by clustering of breast cancer samples. Unsupervised clustering was performed with the Infinium DNA methylation probes whose β values varied most across the breast tumor samples (top 5% most variant probes). (A) K-means consensus clustering (K = 2). Tumors (n = 39) are listed in the same order along the x and y axes. Clusters 1 and 2 are noted in purple and green, respectively, and B-CIMP status is noted on the black and white bars. Consensus index values range from 0 to 1, with 0 being highly dissimilar and 1 being highly similar. (B) 2D hierarchical clustering of the tumors using the same methylation probes as in (A). Each row represents a tumor and each column represents a probe. The level of DNA methylation (normalized and transformed β value) is represented with a color scale (red, methylated; blue, nonmethylated). Data for normal breast (n = 3) and DKO (a cell line with global loss of methylation) are shown but did not contribute to the clustering analysis (62). HR, HER2, and distant recurrence status are shown along the left side of the heat map. Failed, developed distant metastases. Horizontal bars along the bottom of the plot denote genes that have decreased or increased methylation in B-CIMP.

Fig. 2

Characterization of B-CIMP and clinical covariates. (A) Selected genes with differential methylation between CIMP groups in breast tumors. Average β values for the genes are shown on the y axis. Data are presented for CIMP+ (red) and CIMP− (blue) tumors. Whisker box plots summarizing each data set are shown (mean ± SD). The boxes delineate the 25th to 75th percentile range. (B) B-CIMP+ tumors are highly associated with HR positivity but not with HER2 status (χ² test).

In our discovery set, 17 of 39 (44%) tumors were B-CIMP+. The composition of the B-CIMP+ subgroup was confirmed by two independent clustering algorithms [two-dimensional (2D) hierarchical and K-means consensus clustering] (Fig. 1), and both approaches identified the same set of tumors as exhibiting a B-CIMP. Moreover, SigClust evaluation of cluster significance found the class boundaries to be significant (Fig. 1B) (33). Consistent with this separation, a number of genes, such as ARHGEF7, RASGRF2, and SOX8, showed marked methylation differences between CIMP+ and CIMP− tumors (examples in Fig. 2A).

Our array results were validated with EpiTYPER (34), a mass spectrometry–based technique, allowing sensitive detection of DNA methylation at base-pair resolution (fig. S3A and table S2). We observed perfect concordance between B-CIMP calls on the Infinium platform and those from EpiTYPER.

These results demonstrate that there exist profound differences across the methylomes of breast cancers. Although it is possible that additional smaller subgroups exist, the robust nature of our consensus modeling here suggests that there are two primary epigenomic subgroups of IDC as defined by genome-wide methylome profiling.

Clinical characterization and validation of B-CIMP in breast cancer

We then defined the relationship among B-CIMP status, clinical covariates, and known molecular determinants. CIMP+ tumors consisted almost entirely of ER/PR+ tumors (94%, 16 of 17) (Fig. 2B), whereas CIMP− tumors consisted of similar numbers of HR+ (45%, 10 of 22) and HR− tumors (55%, 12 of 22) (P = 0.001, χ² test) (Fig. 2B). In contrast, there was no significant difference in the frequency of HER2 positivity between the CIMP groups (Fig. 2B). As expected, average methylation intensities were significantly different between CIMP+ and CIMP− tumors, between ER/PR+ and ER/PR− tumors, and between HR+/CIMP+ and HR+/CIMP− tumors (Fig. 3, A to D).
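The counts above form a 2 × 2 table (CIMP status by HR status), so the reported association can be checked with a Pearson χ² test. The sketch below assumes no continuity correction, which the text does not specify; with 1 degree of freedom the upper-tail probability equals erfc(√(χ²/2)). On these counts it gives a statistic of about 10.2 and P ≈ 0.001, in line with the reported value; a Yates-corrected version would give a somewhat larger P.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test without continuity correction for
        [[a, b],
         [c, d]]
    Returns (statistic, P value) with 1 degree of freedom."""
    table = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# Rows: B-CIMP+ (17 tumors), B-CIMP- (22); columns: ER/PR+, ER/PR-
stat, p = chi_square_2x2(16, 1, 10, 12)
print(f"chi2 = {stat:.2f}, P = {p:.4f}")
```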

Fig. 3

B-CIMP and metastatic risk. (A) Relative methylation (normalized and transformed β value) of the genes analyzed in Fig. 1 in CIMP+ versus CIMP− tumors. Whisker box plots summarizing each data set are shown (mean ± SD). The boxes delineate the 25th to 75th percentile range. The P value indicates significance determined using ANOVA. (B) Relative methylation of the genes in ER/PR+ versus ER/PR− tumors. Whisker box plots summarizing each data set are shown (mean ± SD). Significance determined using ANOVA. (C) Relative methylation of the genes in hormone+, CIMP+ versus hormone+, CIMP− tumors. Whisker box plots summarizing each data set are shown (mean ± SD). Significance determined using ANOVA. (D) Relative methylation of the genes in CIMP+ versus CIMP− tumors (all probes). Whisker box plots summarizing each data set are shown (mean ± SD). Significance determined using ANOVA. (E) Kaplan-Meier curve for distant metastasis–free survival for B-CIMP+ and B-CIMP− subtypes. P value calculated by log-rank analysis. Data from discovery set tumors (Fig. 1). (F) Validation of B-CIMP and its impact on metastatic risk in an independent set of breast tumors (n = 132). EpiTYPER assays were developed for three of the most predictive genes for B-CIMP as indicated in the panel. Samples were categorized as CIMP+ if at least two of the three genes were methylated, as shown in the heat map. Red, methylated; blue, unmethylated; purple, B-CIMP+. (G and H) Kaplan-Meier curves for (G) distant metastasis–free survival and (H) overall survival by B-CIMP status. P value calculated by log-rank analysis. (I) B-CIMP predicted metastatic risk in ER/PR+ breast cancers. Kaplan-Meier curve for distant metastasis–free survival for B-CIMP+ and B-CIMP− subtypes. P value calculated by log-rank analysis.

There was already a strong trend toward improved distant metastasis–free survival in patients with CIMP+ tumors in our initial discovery set (P = 0.078) (Fig. 3E). To validate the existence of the B-CIMP, we used EpiTYPER to evaluate an independent cohort of 132 primary breast cancers. We examined methylation of three loci (ALX4, ARHGEF7, and RASGRF2) that were among the most predictive for B-CIMP in our Infinium data. None of these genes is methylated in normal breast (table S1). Our microarray data indicated that methylation at two or more of these three genes correctly identified ~95% of CIMP+ tumors (table S3). In this validation set, B-CIMP+ tumors demonstrated a significantly lower risk of metastatic relapse (P = 0.007) and death (P = 0.006) (Fig. 3, F to I).
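The validation-set classification rule (B-CIMP+ when at least two of ALX4, ARHGEF7, and RASGRF2 are methylated) reduces to a simple vote. The per-gene cutoff for calling a locus "methylated" from EpiTYPER data is not given in this section, so the `cutoff` below is a hypothetical placeholder, and the tumor values are made up for illustration.

```python
from typing import Mapping

# Marker loci named in the text. The cutoff below is a hypothetical
# placeholder, not the threshold used in the study.
MARKER_GENES = ("ALX4", "ARHGEF7", "RASGRF2")

def call_b_cimp(mean_methylation: Mapping[str, float],
                cutoff: float = 0.2, min_hits: int = 2) -> bool:
    """Classify a tumor as B-CIMP+ when at least `min_hits` of the three
    marker genes have mean methylation above `cutoff`.
    Genes missing from the input are treated as unmethylated."""
    hits = sum(mean_methylation.get(g, 0.0) > cutoff for g in MARKER_GENES)
    return hits >= min_hits

# Made-up example values (fraction methylated per gene)
tumors = {
    "T1": {"ALX4": 0.55, "ARHGEF7": 0.41, "RASGRF2": 0.08},  # 2 of 3 methylated
    "T2": {"ALX4": 0.05, "ARHGEF7": 0.31, "RASGRF2": 0.02},  # 1 of 3 methylated
}
for name, values in tumors.items():
    print(name, "B-CIMP+" if call_b_cimp(values) else "B-CIMP-")
```

A majority vote over three loci tolerates a single failed or borderline assay, which may be one reason a 2-of-3 rule is preferable to requiring all three genes.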

The presence of the B-CIMP was associated with ER/PR status (Fig. 2B) but determined metastatic risk within the HR+ group as well (Fig. 3I). ER/PR+ tumors that were CIMP− were associated with significantly worse prognosis than ER/PR+ CIMP+ tumors. Furthermore, multivariate analysis showed that the B-CIMP was a strong predictor of prognosis (P = 0.03; hazard ratio, 0.49) independent of stage, age, nodal status, and HR status (tables S4 and S5). Most of the differentially methylated CpG sites were within known CpG islands, although a portion were not, and most sites were within 500 base pairs (bp) of the transcriptional start site (Fig. 4A).
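The survival comparisons in Fig. 3 (E, G to I) rest on Kaplan-Meier estimates compared with the log-rank test. A minimal textbook sketch of both follows, assuming one record per patient with a follow-up time and an event flag (1 = distant metastasis or death, 0 = censored). It is not the authors' software, and the multivariate Cox model behind the reported hazard ratio is not covered.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events[i] is 1 if subject i had the event at times[i], 0 if censored.
    Subjects censored at t are still counted as at risk at t."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # events at time t
        c = 0  # all subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            c += 1
            i += 1
        if d:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= c
    return curve

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value), 1 df."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    var = 0.0
    for t in event_times:
        n_a = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)
        n_b = sum(1 for tt, _, g in pooled if g == 1 and tt >= t)
        d_a = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # hypergeometric variance of the group-A event count at time t
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    stat = o_minus_e ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For cohorts of the size described here (n = 39 or 132), the quadratic loops are fast enough; a production analysis would typically use an established survival package instead.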

Fig. 4

Methylation landscape of breast cancer. (A) Characteristics of the hypermethylated sites in the B-CIMP. The plot on the left [distance to transcriptional start site (tss)] shows the proportion of probes that are within the indicated distances from the tss. The plot on the right (composition) shows the proportion of hypermethylated probes that are located in CpG islands, in shores, or in neither CpG islands nor shores. (B) Starburst plot for comparison of DNA methylation and gene expression. Log10(FDR-corrected P value) is plotted for DNA methylation (x axis) and gene expression (y axis) for each gene. Along the y axis, data for fold expression <0 are log10(FDR-corrected P value) and data for fold expression ≥0 are −log10(FDR-corrected P value). Along the x axis, data for β value <0 are log10(FDR-corrected P value) and data for β value ≥0 are −log10(FDR-corrected P value). The black line indicates the FDR-adjusted P value threshold of 0.05. Data points other than red indicate genes that are significantly up-regulated (green) or down-regulated (blue and purple) and also significantly hypo- or hypermethylated in B-CIMP tumors. Points in blue indicate genes that are significantly down-regulated and hypermethylated in B-CIMP+ tumors versus B-CIMP− tumors.

The methylation landscape of breast cancer and its effects on the metastasis transcriptome

We next sought to define the nature of the methylome differences between the B-CIMP subgroups and characterize the effects of these differences on the breast cancer transcriptome. Probes were filtered for analysis by ranking log2-transformed β values using decreasing false discovery rate (FDR)–adjusted P values and increasing β-value difference to identify the most differentially hypermethylated genes in the B-CIMP group. Of the 3297 CpG sites that were differentially methylated between CIMP+ and CIMP− tumors, 2333 (71%) were hypermethylated (Fig. 4B, table S1, and fig. S3B). There were 2543 unique genes represented within this group, including 1764 that were hypermethylated and 779 that were hypomethylated. Consistent with the results in Fig. 1B, a volcano plot showing differentially methylated genes between B-CIMP+ and B-CIMP− tumors was highly asymmetric, with many more hypermethylated genes in CIMP+ tumors (fig. S3B).
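The split of significant sites into hypermethylated and hypomethylated sets can be sketched as below. The sketch assumes that each probe already has an FDR-adjusted P value and mean β values per group; the probe names and values are hypothetical, and the tie-breaking order (smallest adjusted P first, then largest absolute β difference) is one reading of the ranking described above, not a confirmed detail of the authors' pipeline.

```python
from dataclasses import dataclass

@dataclass
class ProbeStat:
    probe_id: str     # hypothetical probe identifier
    fdr_p: float      # FDR-adjusted P value, B-CIMP+ vs B-CIMP- tumors
    mean_pos: float   # mean beta value in B-CIMP+ tumors
    mean_neg: float   # mean beta value in B-CIMP- tumors

    @property
    def delta(self) -> float:
        return self.mean_pos - self.mean_neg

def split_by_direction(stats, alpha=0.05):
    """Keep probes with FDR-adjusted P < alpha, rank them (smallest P first,
    then largest |delta beta|), and split them into probes hyper- and
    hypomethylated in B-CIMP+ tumors."""
    significant = [s for s in stats if s.fdr_p < alpha]
    significant.sort(key=lambda s: (s.fdr_p, -abs(s.delta)))
    hyper = [s.probe_id for s in significant if s.delta > 0]
    hypo = [s.probe_id for s in significant if s.delta < 0]
    return hyper, hypo

stats = [
    ProbeStat("probe_a", 0.001, 0.62, 0.18),
    ProbeStat("probe_b", 0.020, 0.15, 0.40),
    ProbeStat("probe_c", 0.001, 0.55, 0.20),
    ProbeStat("probe_d", 0.300, 0.50, 0.45),
]
hyper, hypo = split_by_direction(stats)
# Share of hypermethylated sites reported in the text
print(f"{2333 / 3297:.0%}")  # prints 71%
```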

Affymetrix transcriptome data were obtained from the same breast tumors analyzed for methylation to determine which genes demonstrated differential expression and methylation. A total of 279 genes were significantly down-regulated and 238 genes were significantly up-regulated (table S1). Gene ontology (GO) analysis showed that the significantly up-regulated genes were highly enriched for functional categories involving cell motion, angiogenesis, apoptosis, development, kinase activity, and DNA binding (table S6). The down-regulated genes were enriched for functional categories involved in mitosis, cytokinesis, exocytosis, chromosomal segregation, transcription factor activity, and kinase activity (table S6).

A starburst plot showing the relationship between DNA methylation and expression levels is shown in Fig. 4B. Here, significance levels of methylation (x axis) versus expression (y axis) differences between CIMP+ and CIMP− tumors were plotted. Integration of the normalized gene expression and DNA methylation gene sets identified 102 genes with both significant hypermethylation and down-regulation in B-CIMP+ tumors (Fig. 4B and table S1). Among these genes are those that are involved in breast cancer outcome or epithelial-mesenchymal transition (EMT), including LYN, MMP7, KLK10, and WNT6 (35–38). GO analysis showed that B-CIMP–specific down-regulation of genes (hypermethylated and down-regulated in B-CIMP+ tumors) was associated with cell motion, development, signaling, and catalytic activity, as some of the most relevant functional categories (table S6).
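The signed axes of the starburst plot, as defined in the Fig. 4B legend, can be reproduced with a small transformation: each gene's FDR-adjusted P value becomes −log10(P) when the effect (β difference or fold change) is ≥0 and log10(P) when it is <0. The sketch below applies that rule and flags the "hypermethylated and down-regulated" quadrant at the 0.05 threshold; the function names and category labels are illustrative.

```python
import math

# |signed score| above this value corresponds to FDR-adjusted P < 0.05
THRESHOLD = -math.log10(0.05)

def signed_log_p(effect: float, fdr_p: float) -> float:
    """-log10(P) when the effect is >= 0, log10(P) when it is < 0."""
    score = -math.log10(fdr_p)
    return score if effect >= 0 else -score

def starburst_point(delta_beta, fdr_meth, log_fold, fdr_expr):
    """Return (x, y, category) for one gene on the starburst plot."""
    x = signed_log_p(delta_beta, fdr_meth)  # methylation axis
    y = signed_log_p(log_fold, fdr_expr)    # expression axis
    if x > THRESHOLD and y < -THRESHOLD:
        label = "hypermethylated and down-regulated"
    elif x < -THRESHOLD and y > THRESHOLD:
        label = "hypomethylated and up-regulated"
    elif abs(x) > THRESHOLD and abs(y) > THRESHOLD:
        label = "significant, other direction"
    else:
        label = "not significant for both"
    return x, y, label

print(starburst_point(0.3, 1e-4, -1.2, 1e-3))
```

Counting genes in the first category is what yields a list such as the 102 hypermethylated, down-regulated genes described above.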

Although mRNA expression signatures have been developed to help predict the risk of metastatic disease in breast cancer patients, the genomic foundations for these differences in gene expression are incompletely understood (17, 39, 40). Few genetic changes have been shown to be causally related to these transcriptional differences. Because B-CIMP status affects metastatic risk, we wondered whether methylation helps account for the transcriptome diversity underlying common breast cancer prognostic expression signatures. To address this question, we performed concept mapping analysis as previously described (41). The methylated and down-regulated genes that make up the transcriptomic footprint of the B-CIMP (B-CIMP repression signature) were markedly enriched among the most differentially expressed genes defining prognosis in multiple breast cancer cohorts (Fig. 5A). Low expression of genes in the B-CIMP repression signature was seen in tumors that did not metastasize, and high expression of the signature was seen in tumors that metastasized and/or resulted in poor survival (Fig. 5A and tables S7 and S8).

Fig. 5

B-CIMP and the metastasis transcriptome. (A) Differentially expressed B-CIMP genes in breast cancer metastasis transcriptomes. Concepts mapping of B-CIMP–repressed genes across multiple data sets associated with metastasis. Each row shows individual gene sets from which a breast prognostic expression signature has been described. The top 10% of the most overexpressed genes from these gene sets were used for the concept mapping. Genes that are significantly hypermethylated and down-regulated in B-CIMP+ tumors (n = 102) correspond to genes whose overexpression is predictive of metastasis (left panel). Red, matching gene between B-CIMP gene and gene predictive of metastatic behavior. Q value calculated as in Materials and Methods (right panel). (B) Kaplan-Meier survival curve showing that the CIMP repression signature (hypermethylated and down-regulated in B-CIMP tumors) predicts survival in the van’t Veer cohort (17). P value calculated by log-rank.

We observed significant associations between B-CIMP genes and breast cancer relapse expression signatures from multiple independent data sets, confirming the validity of our findings. Using the van’t Veer cohort, we demonstrated that the presence of the B-CIMP repression signature strongly predicted survival (Fig. 5B). Again, breast cancers in which the CIMP repression signature was present were associated with significantly better survival than tumors lacking the signature. Furthermore, gene set enrichment analysis (GSEA) demonstrated a significant inverse correlation between B-CIMP–repressed genes and genes up-regulated in highly metastatic tumors (fig. S4). These findings indicate that epigenomic alterations associated with the B-CIMP underlie many of the gene expression differences observed in currently used breast cancer prognostic signatures such as MammaPrint.

Distinct gene sets targeted by the B-CIMP

To elucidate the differences in the methylation landscape between the two epigenomic subclasses, we mapped the regions with the most significant methylation differences between CIMP+ and CIMP− tumors across the genome (Fig. 6). Dense clusters of differential methylation were apparent in the arms of a number of chromosomes. GSEA of the differentially methylated genes showed significant enrichment for polycomb repressive complex 2 (PRC2) targets (Fig. 6, figs. S5 to S7, and table S9) (42). Within the B-CIMP gene set, genes targeted by PRC2-associated marks or proteins, including H3K27 methylation, Suz12, and EZH2, were significantly overrepresented. These PRC2 targets are plotted along the genome adjacent to the B-CIMP genes, highlighting the similarity between the two gene sets (Fig. 6).

Fig. 6

Consensus between the PRC2 and methylome landscapes in B-CIMP. Significance and locations of methylation enrichment in B-CIMP+ tumors are plotted across the genome and indicated by the blue bars. Significance and locations of PRC2 enrichment are shown by the red bars. The horizontal axis indicates level of significance (FDR-corrected, ANOVA).

GSEA using the Broad Institute Molecular Signatures Database demonstrated that CIMP genes were most significantly enriched in polycomb (PcG) occupancy data sets, although other processes were also implicated, including EMT and Wnt signaling, which are known to contribute to metastasis (tables S10 and S11) (43). It has been shown that the presence of a bivalent chromatin mark involving the key PcG mark, trimethylated H3K27, in stem cells may predispose specific genes to become hypermethylated and silenced in cancer and may be indicative of a contribution of stem cells to the derivation of specific cancers (44, 45). Perhaps this process is active in breast tumors of the B-CIMP subclass.

CIMP across multiple human cancers

Does the CIMP target the same genes in different human tumor types? We compared the CIMP-associated loci from breast cancer, colon cancer, and glioma (glioma data are publicly available from The Cancer Genome Atlas). CIMP-associated genes were defined for glioma and colon cancer with the same methodology as above and were consistent with previous data (1, 2). Colon CIMP (C-CIMP) genes were derived from Memorial Sloan-Kettering Cancer Center (MSKCC) tumors (n = 24) with hierarchical clustering and confirmed as described in Materials and Methods and by Weisenberger et al. (2). Glioma CIMP (G-CIMP) genes were as previously described (1). All data sets were generated with the same Infinium HumanMethylation27 platform and were directly comparable.

We first wished to determine whether CIMP selectively targeted PcG targets not only in breast cancer but in other malignancies as well. All methylated loci (β-value FDR-corrected P < 0.05) in the three tumor types were compared with previously generated global PcG target gene sets (46, 47) (tables S12 and S13). Highly significant overlap was observed between CIMP and PcG targets in breast, glioma, and colon cancer (Fig. 7, A and B), potentially indicating that the CIMP may use similar processes across cancer types. Using the 33 most significant common predictors of CIMP, we generated a consensus signature for CIMP positivity across these tumor types (Fig. 7C and table S14). Breast tumors identified as B-CIMP+ by the 3-gene signature used above are concordant with those positive for the 33-gene signature and thus predicted good prognosis in breast cancer as shown in Fig. 3E. We then verified that this same 33-gene signature was associated with better prognosis in glioblastoma multiforme (GBM) (Fig. 7C). Thus, CIMP positivity was associated with a favorable clinical prognosis in breast cancer, colon cancer (CIMP-hi, microsatellite unstable) (2, 48), and glioma (1), and as such, this epigenomic signature may be useful as an indicator of outcome across multiple human malignancies.
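The overlap P values in Fig. 7 are described as hypergeometric. A minimal sketch of the one-sided upper-tail test follows: the probability of seeing at least `overlap` shared genes when a CIMP gene set is drawn at random from a gene universe that contains a fixed number of PcG targets. The universe is an assumption the text does not pin down (for example, the 14,495 genes represented on the array), and no figures from the study are reproduced here.

```python
from math import comb

def hypergeom_upper_tail(overlap, set_a, set_b, universe):
    """P(X >= overlap) for X ~ Hypergeometric(universe, set_a, set_b):
    the chance that a random draw of `set_b` genes from `universe` genes,
    of which `set_a` are targets, contains at least `overlap` targets."""
    total = comb(universe, set_b)
    upper = min(set_a, set_b)
    tail = sum(comb(set_a, k) * comb(universe - set_a, set_b - k)
               for k in range(overlap, upper + 1))
    return tail / total  # exact integer arithmetic, then one division

# Toy example: 20 genes, 5 targets, a 5-gene list sharing 4 targets
print(hypergeom_upper_tail(4, 5, 5, 20))
```

Python integers are arbitrary precision, so the sums stay exact even for array-scale universes; only the final division is rounded to a float.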

Fig. 7

Consensus target genes of the CIMP across multiple human cancers. (A) Venn diagram showing common gene targets between the PRC2 targets described in (47) and CIMP in the three indicated cancers. Numbers in parentheses indicate number of genes in common between PRC2 target genes and CIMP targets in each cancer type. The table next to the diagram shows the level of significance between these overlapping gene lists (P value, hypergeometric distribution). The numbers in the Venn diagram show the number of CIMP/PRC2 common targets that are shared between the cancer types. (B) Same as in (A), except PcG targets are from the Suz12 targets described in (46). (C) The left diagram shows the 33 most significant CIMP targets common to all three malignancies. Significance [−log10(FDR-corrected P value)] was plotted for DNA methylation along the y axis, and genomic location or chromosome number was plotted along the x axis. Genes are shown in a separate color for each tumor type according to the legend. The right diagram shows a Kaplan-Meier curve depicting the survival of patients with GBM tumors in which the 33-gene CIMP signature was present versus tumors in which the signature was absent. P value calculated by log-rank. TCGA, The Cancer Genome Atlas.


Our findings have several important implications for the understanding of breast cancer. First, we have definitively identified distinct epigenomic subtypes of breast cancer and documented the existence of a global CIMP in breast cancer. Aberrant hypermethylation of genes has been described in breast cancer previously (26, 49–51), and the methylation state of specific genes has been linked to outcome (52–54). However, the existence of a global B-CIMP had remained elusive before our study.

Our global approach robustly identified the B-CIMP as a characteristic of a subset of hormone-positive tumors. The ER has a profound effect on transcriptional activation and repression of many genes, and it may be that these changes contribute to patterning the epigenome. However, from our data, it does not appear that ER/PR positivity alone dictates B-CIMP status, because our analysis repeatedly identified a subset of hormone-positive tumors that were not positive for CIMP. Perhaps other genetic or epigenetic events are important in dictating epigenomic fate in these tumors. Also, there may exist other epigenomic subgroups of breast cancer within the major methylome subclasses that we describe here. Additional studies will be required to address this possibility.

In our study, B-CIMP+ tumors demonstrated a lower propensity for metastasis and a better clinical outcome than B-CIMP− tumors. The association of better clinical outcome with CIMP+ tumors could be seen across multiple malignancies (breast, colon, and glioma) (2, 23). In these tumors, it may be that the epigenomic defects causing the CIMP initially helped promote neoplastic transformation but also inactivated genes that facilitate tumor aggressiveness in later stages of cancer progression. It is important to note, however, that the association of methylation at CIMP genes with good clinical outcome does not extend to methylation at all genes. Methylation of specific candidate genes or groups of genes has been associated with poorer prognosis, and these genes may affect tumor aggressiveness independently of the CIMP (27, 53, 55–57). Such genes, including CDKN2A, PTPRD, and BRCA1, were not among the B-CIMP loci.

The genomic basis of prognostic transcriptional signatures is unclear. Our data demonstrate that aberrations in the DNA methylome explain many of the mRNA expression differences that underlie these signatures. The tight association of these changes with a genome-wide concerted hypermethylation phenotype and their enrichment for PcG targets argue against these inactivation events being sporadic. Rather, the B-CIMP phenotype is consistent with a global, systematic derangement in epigenetic regulation. The methylome profiles we have derived and the associated CIMP repression signature provide a previously unknown mechanistic link between breast cancers with differing metastatic behavior and transcriptional signatures that predict metastatic relapse. However, although we show that methylation-associated gene silencing underlies many metastasis-associated gene expression changes, genetic changes are undoubtedly important as well. Indeed, mutations of a number of genes such as BRCA1, PTEN, and ERBB2 have been shown to be associated with an increased risk of metastasis (5, 58, 59). The relationship between these mutations and the B-CIMP phenotype remains to be elucidated, and it is likely that both genetic and epigenetic alterations contribute to the metastatic phenotype. BRCA1 has recently been shown to up-regulate DNMT1, which may help explain the association between BRCA1 mutation, basal-type tumors, and the lack of methylation we observed among HR− breast cancers (60). Future studies will be required to define any potential causal relationship between mutations and derangements in the epigenomic landscape.

Our data show a large-scale consensus between CIMP genes from different human cancers. The CIMP in these different malignancies targets many of the same genes, which are PcG targets. We speculate that these similarities reflect common mechanistic foundations. Despite their similarities, differences do exist between the PcG targets that make up the B-, C-, and G-CIMP, which may reflect a degree of tissue or organ specificity.

In summary, these data provide an epigenomic framework for understanding the transcriptomic signatures present in breast cancers with differing metastatic behaviors. Our findings may enable the development of new molecular diagnostics that more accurately reflect the epigenomic underpinnings of breast cancer prognosis. These diagnostics may help further refine our ability to implement personalized medicine for breast cancer patients.

Materials and Methods

Tumor samples

Breast tumors (discovery set, n = 39; validation set, n = 132) from MSKCC were obtained after patient consent and with institutional review board (IRB) approval. For the breast tumor data, tissues from primary breast cancers were obtained from therapeutic procedures performed as part of routine clinical management. The samples that make up the discovery set were chosen to include tumors reflecting the spectrum of different types of breast cancers in representative frequencies (ER/PR+, 67%; HER2 status, 50%; distant failures, 36%). The validation set was chosen on the basis of available material and so that it would be representative of known frequencies of HR status, HER2 status, stage, and distant failure status (8). Notably, both ER/PR+ and ER/PR− tumors, HER2+ and HER2− tumors, and tumors that did and did not result in metastases were included. For the validation set of 132 cases, with an estimated event rate of 40%, we expected a power of >95% (α = 0.05). DNA and RNA were extracted from frozen or paraffin-embedded primary tumors for the methylation and expression studies. Frozen samples were snap-frozen in liquid nitrogen and stored at −80°C. Each sample was examined histologically with hematoxylin and eosin (H&E)–stained sections. Regions were microdissected from the slides to provide a consistent tumor cell content of more than 70% in tissues used for analysis. Genomic DNA was extracted with the QIAamp DNA Mini kit or the QIAamp DNA FFPE Tissue kit (Qiagen) according to the manufacturer’s instructions. RNA was extracted with Trizol (Invitrogen) according to the manufacturer’s directions. Nucleic acid quality was determined with the Agilent 2100 Bioanalyzer. Nucleic acids from the discovery set were used for methylation and expression analysis as described below.

Methylome analysis using the Infinium platform

Bisulfite-converted genomic DNA was analyzed with the Infinium HumanMethylation27 BeadChip Kit (Illumina, WG-311-1202) by the MSKCC Genomics Core. Arrays were processed according to the manufacturer’s protocol. Methylation analysis controls included in vitro–methylated DNA (positive control) (61) and human HCT116 DKO DNA [DNA methyltransferase double-knockout cells (DNMT1 and DNMT3b)] (62).

Gene expression profiling

Methods for RNA extraction, labeling, and hybridization for DNA microarray analysis have been described previously (39). All gene expression analyses were performed with the Affymetrix Human Genome U133A 2.0 microarray.

Data analysis for genomics

For expression analysis, Affymetrix data were imported into the Partek Genomics Suite (Partek Inc.), normalized, log-transformed, and median-centered. Genes differentially expressed between the CIMP groups were identified by analysis of variance (ANOVA) followed by false discovery rate (FDR) correction (63, 64). Hierarchical clustering was performed with either Euclidean distance or Pearson correlation, and cluster significance was assessed with the R package SigClust as described (33). For Gene Ontology analysis, gene lists were functionally annotated with the Database for Annotation, Visualization and Integrated Discovery (DAVID) (65, 66) and the PANTHER functional annotation classes. PANTHER categories with Benjamini-Hochberg FDR-adjusted P values of <0.05 were considered significantly overrepresented in our gene lists.
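The expression preprocessing and multiple-testing steps can be sketched without Partek. The code below log2-transforms and median-centers one probe set across samples and applies the Benjamini-Hochberg step-up adjustment to a list of ANOVA P values. The P values are assumed to come from whatever ANOVA implementation is at hand, and both function names are illustrative.

```python
import math
from statistics import median


def log_median_center(row):
    """log2-transform one gene's expression values and subtract their median."""
    logged = [math.log2(x) for x in row]
    med = median(logged)
    return [v - med for v in logged]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of p * m / rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(log_median_center([1, 2, 4]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Genes or probes whose adjusted value falls below 0.05 would then be called significant, matching the threshold used in the text.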

For methylation analysis, Illumina data were imported into Partek with custom software. β values were logit-transformed and mean-centered before analysis. Probes differentially methylated between the CIMP groups were identified by ANOVA with FDR correction (63, 64); probes with an FDR-adjusted P value of <0.05 were considered significantly differentially methylated. Hierarchical clustering of the methylation data was performed as above with the top 5% most variant probes across samples (ranked by SD). Consensus clustering was performed in R as implemented in GenePattern v3.2.3 (67), using K-means clustering (Kmax = 9) with Euclidean distance and average linkage over 1000 resampling iterations with random restart.
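The methylation preprocessing and consensus clustering steps can be sketched in plain Python. Assumptions: the logit uses natural logs (the log2 "M value" differs only by a constant factor, so SD rankings are unchanged), probes are ranked by the SD of the centered logit values, each resample draws 80% of tumors, and each K-means run keeps the best of five random restarts. The GenePattern module additionally clusters the consensus matrix with average linkage and compares K up to Kmax; that part is omitted. All function names are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def logit(beta, eps=1e-6):
    """Natural-log logit; clamping keeps beta = 0 or 1 finite."""
    b = min(max(beta, eps), 1 - eps)
    return math.log(b / (1 - b))


def select_variant_probes(beta_by_probe, top_frac=0.05):
    """Logit-transform and mean-center each probe, then keep the most variable ones.

    beta_by_probe maps probe ID -> list of beta values (one per sample).
    Returns the retained probe IDs and one feature vector per sample.
    """
    centered = {}
    for probe, betas in beta_by_probe.items():
        vals = [logit(b) for b in betas]
        mu = mean(vals)
        centered[probe] = [v - mu for v in vals]
    ranked = sorted(centered, key=lambda p: stdev(centered[p]), reverse=True)
    keep = ranked[:max(1, round(len(ranked) * top_frac))]
    samples = [list(col) for col in zip(*(centered[p] for p in keep))]
    return keep, samples


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_restarts=5, n_iter=50):
    """Lloyd's K-means with random restarts; returns labels of the lowest-SSE run."""
    best_labels, best_sse = None, math.inf
    for _ in range(n_restarts):
        centers = [list(p) for p in rng.sample(points, k)]
        for _ in range(n_iter):
            labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # an empty cluster keeps its previous center
                    centers[c] = [mean(col) for col in zip(*members)]
        sse = sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def consensus_matrix(samples, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of samples falls in the same cluster."""
    rng = random.Random(seed)
    n = len(samples)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([samples[i] for i in idx], k, rng)
        for a, ia in enumerate(idx):
            for b, ib in enumerate(idx):
                sampled[ia][ib] += 1
                together[ia][ib] += int(labels[a] == labels[b])
    # Pairs never drawn together get 0.0
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

A consensus matrix whose entries sit near 0 or 1 indicates stable clusters for that K; the text's analysis repeats this for each K up to 9 with 1000 resamples.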

CIMP genes in colon cancer and glioblastoma were identified as follows. Methylation data for glioblastoma (GBM) were downloaded from The Cancer Genome Atlas data portal. Methylation data for colon cancer (n = 24) were generated from MSKCC primary tumors with the Illumina HumanMethylation27 array. Hierarchical clustering was performed as described above for the breast cancer data, using the top 5% most variant probes; iterations using the top 3 to 20% did not significantly alter the clustering results. The clustering results were confirmed with the methylation β values of the five-gene panel that Weisenberger et al. used to identify CIMP+ colorectal cancers (2). The cluster of samples showing hypermethylation of these marker genes was designated CIMP+ and used for further analyses; it corresponded to the cluster with high coordinate hypermethylation derived by hierarchical clustering. Glioblastoma CIMP genes were identified as described (1). The Cancer Genome Atlas Project GBM cancer data sets are publicly available at
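For illustration, the colorectal CIMP check can be written as a panel-count rule. Weisenberger et al. called a tumor CIMP+ when at least three of the five markers CACNA1G, IGF2, NEUROG1, RUNX3, and SOCS1 were methylated, as scored by MethyLight. The β cutoff used below to call a marker methylated on the Infinium array is a hypothetical choice, not a value reported in this study.

```python
# Five-marker colorectal CIMP panel of Weisenberger et al.
CIMP_PANEL = ("CACNA1G", "IGF2", "NEUROG1", "RUNX3", "SOCS1")


def is_cimp_positive(beta_by_gene, beta_cutoff=0.3, min_methylated=3):
    """Call a tumor CIMP+ when at least min_methylated panel genes are methylated.

    beta_cutoff is a hypothetical threshold for calling a locus methylated
    from Infinium beta values. Genes missing from beta_by_gene count as
    unmethylated.
    """
    n_methylated = sum(1 for gene in CIMP_PANEL if beta_by_gene.get(gene, 0.0) >= beta_cutoff)
    return n_methylated >= min_methylated


tumor = {"CACNA1G": 0.62, "IGF2": 0.48, "NEUROG1": 0.55, "RUNX3": 0.12, "SOCS1": 0.05}
print(is_cimp_positive(tumor))
```

In the text this rule served only as a cross-check: the cluster enriched for panel-positive tumors was taken as CIMP+.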

Concepts module mapping and GSEA

Concepts module mapping was performed as follows. The methylation signature identified in our analysis (table S1) was imported into Oncomine to search for associations with molecular concept signatures derived from independent cancer profiling studies. We report statistically significant overlaps of our methylation gene signature with the top-ranking gene expression signatures of clinical outcome, using a percentile cutoff of 10%. Q values were calculated as previously described (41).
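The significance of an overlap between two gene lists is commonly assessed with a one-sided Fisher's exact test, that is, the upper tail of the hypergeometric distribution. The sketch below assumes that test; the exact Oncomine implementation and its choice of gene universe may differ, and the numbers in the example call are hypothetical.

```python
from math import comb


def overlap_pvalue(n_universe, n_signature, n_concept, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_universe, n_signature, n_concept).

    n_universe: genes measured on both platforms (assumed universe)
    n_signature: genes in the methylation signature
    n_concept: genes in the outcome-associated concept
    """
    total = comb(n_universe, n_concept)
    upper = min(n_signature, n_concept)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing
    tail = sum(comb(n_signature, x) * comb(n_universe - n_signature, n_concept - x)
               for x in range(n_overlap, upper + 1))
    return tail / total


# Hypothetical sizes: 12,000 genes, 70 signature genes, 500 concept genes, 12 shared
print(overlap_pvalue(12000, 70, 500, 12))
```

P values from many concept comparisons would then be converted to Q values to control the false discovery rate.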

GSEA was performed with GSEA software v2.0.7 and MSigDB database v2.5 (68). Gene set significance was assessed with 1000 permutations (permutation_type = phenotype) and an FDR Q-value cutoff of 25%. The most differentially expressed genes in statistically significant gene sets were identified as the "leading-edge subset", the genes that contribute most to the enrichment score of a given gene set. Gene sets taken from the literature (as referenced in tables S9 and S10) were analyzed together with the curated gene sets (MSigDB collection c2) or among themselves.
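GSEA's core statistic is a weighted running-sum (Kolmogorov-Smirnov-like) enrichment score, and the leading-edge subset consists of set members ranked at or before the peak of that sum (or at or after it, for a negative score). The sketch assumes the default weight p = 1 and a precomputed gene ranking, for example by a signal-to-noise metric between CIMP+ and CIMP− tumors. The phenotype-permutation step that converts the score into a nominal P value and FDR Q value is not shown, and the function name is illustrative.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score and its leading-edge genes.

    ranked_genes and scores are in rank order (most up-regulated first).
    Hits step up by |score|**weight normalized over all hits; misses step
    down by 1 / (number of misses). ES is the maximum deviation from zero.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n, n_hits = len(ranked_genes), sum(in_set)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must be a non-empty proper subset of the ranking")
    norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if norm == 0:
        raise ValueError("all gene-set members have zero score")
    miss_step = 1.0 / (n - n_hits)
    running, best, best_pos = 0.0, 0.0, 0
    for i, (s, hit) in enumerate(zip(scores, in_set)):
        running += (abs(s) ** weight / norm) if hit else -miss_step
        if abs(running) > abs(best):
            best, best_pos = running, i
    if best >= 0:  # leading edge: hits up to and including the peak
        edge = [g for g, hit in zip(ranked_genes[:best_pos + 1], in_set) if hit]
    else:  # negative ES: hits from the trough to the end of the list
        edge = [g for g, hit in zip(ranked_genes[best_pos:], in_set[best_pos:]) if hit]
    return best, edge


print(enrichment_score(["a", "b", "c", "d", "e"], [5, 4, 3, 2, 1], {"a", "b"}))
```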

Quantitative DNA methylation analysis using mass spectrometry

DNA methylation analysis was performed with the EpiTYPER system (Sequenom), which detects and quantifies DNA methylation by base-specific cleavage of bisulfite-treated DNA followed by matrix-assisted laser desorption/ionization–time-of-flight mass spectrometry (MALDI-TOF MS) (69). Primer sequences, target chromosomal sequences, and EpiTYPER-specific tags are listed in table S2. SpectroCHIPs were analyzed with a Bruker Biflex III MALDI-TOF mass spectrometer (SpectroREADER, Sequenom). Results were analyzed with the EpiTYPER Analyzer software and manually inspected for spectral quality and peak quantification. CIMP positivity was defined as a mean methylated allelic frequency of >50% or a twofold increase over normal breast tissue and the CIMP state.
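The EpiTYPER positivity rule can be expressed as a small decision function. This sketch reads the rule as "mean tumor methylation above 50%, or at least twice the normal-breast mean", takes methylation levels on a 0-to-1 scale, and skips CpG units without a measurement; the name `epityper_cimp_call` and these handling choices are assumptions.

```python
from statistics import mean


def epityper_cimp_call(tumor_cpg_levels, normal_cpg_levels, abs_cutoff=0.5, fold=2.0):
    """Call a locus methylated from EpiTYPER CpG-unit methylation levels (0-1).

    Positive when the tumor mean exceeds abs_cutoff, or is at least `fold`
    times the mean in normal breast tissue. CpG units that could not be
    quantified are passed as None and skipped.
    """
    tumor = mean(v for v in tumor_cpg_levels if v is not None)
    normal = mean(v for v in normal_cpg_levels if v is not None)
    if tumor > abs_cutoff:
        return True
    # With a fully unmethylated normal reference the fold change is undefined,
    # so only the absolute cutoff applies.
    return normal > 0 and tumor >= fold * normal


print(epityper_cimp_call([0.3, None, 0.3], [0.1, 0.1]))
```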

Statistical analysis and CIMP repression signature

The 295-sample van't Veer microarray data set (NKI295) was downloaded from the Rosetta Inpharmatics Web site (17). Seventy of the 102 genes in our methylation signature were represented in NKI295 and were used to test for prognostic significance. For each NKI295 sample, we calculated the average expression of our "hypermethylated and down-regulated in B-CIMP" gene set. A two-way classifier then separated patients into two groups on the basis of this average: CIMP repression signature up-regulated if the average expression value was >0, and CIMP repression signature down-regulated otherwise. Kaplan-Meier curves comparing survival of patient subgroups were generated with SPSS statistical software.
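The signature score, two-way split, and survival curves described above can be sketched as follows. The Kaplan-Meier estimator here is a plain-Python stand-in for SPSS, the mean is taken only over signature genes present on the array (70 of 102 in this case), and the split at 0 assumes expression values are centered log ratios, as the >0 threshold implies. Function names are hypothetical.

```python
from statistics import mean


def signature_score(expr_by_gene, signature_genes):
    """Mean expression of the signature genes that are present on the array."""
    return mean(expr_by_gene[g] for g in signature_genes if g in expr_by_gene)


def classify(score):
    """Two-way split at 0 (values assumed to be centered per gene)."""
    return "CIMP repression signature up" if score > 0 else "CIMP repression signature down"


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 = event (e.g., distant metastasis),
    0 = censored. Returns (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # pool tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings both leave the risk set
    return curve


print(classify(signature_score({"A": 1.0, "B": -0.5, "C": 2.0}, ["A", "B", "X"])))
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Curves would be computed separately for the two classifier groups and then compared, for example with a log-rank test.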

Supplementary Material

Fig. S1. Methylome analysis using the Infinium HumanMethylation27 platform.

Fig. S2. Statistical output from K-means consensus clustering analysis on the 39 breast cancers in the discovery set using the most variant probes (n = 1359) (most variant 5% based on ranked SD).

Fig. S3. Validation of B-CIMP loci and the methylation alterations in B-CIMP.

Fig. S4. GSEA results of CIMP repression signature on the van’t Veer data set.

Fig. S5. GSEA results with Suz12-associated genes.

Fig. S6. GSEA results with polycomb repressive complex 2 (PRC2) target genes.

Fig. S7. GSEA results of developmental transcription factors bound by Suz12.

Table S1. Clinical characteristics and methylome analysis of breast cancers. (Excel files)

Table S2. EpiTYPER primers.

Table S3. Sensitivity and specificity for genes used for B-CIMP validation.

Table S4. Multivariate analysis shows CIMP predicts metastatic risk.

Table S5. Validation samples clinical information.

Table S6. Gene ontology analysis of significantly altered genes in B-CIMP.

Table S7. Data sets for concepts mapping of metastasis transcriptomes.

Table S8. References for Oncomine data sets used in concepts analysis.

Table S9. GSEA output for CIMP gene analysis using polycomb occupancy data sets.

Table S10. GSEA enrichment results for B-CIMP genes (CIMP+ versus CIMP−) using the Molecular Signatures Database.

Table S11. Data sets for PRC enrichment analysis.

Table S12. Data for polycomb occupancy analysis of B-, C-, and G-CIMP.

Table S13. Gene summary for polycomb occupancy analysis of B-, C-, and G-CIMP. (Excel files)

Table S14. Genes common to CIMP in breast cancer, colon cancer, and glioblastoma.


* These authors contributed equally to this work.

Citation: F. Fang, S. Turcan, A. Rimner, A. Kaufman, D. Giri, L. G. T. Morris, R. Shen, V. Seshan, Q. Mo, A. Heguy, S. B. Baylin, N. Ahuja, A. Viale, J. Massague, L. Norton, L. T. Vahdat, M. E. Moynahan, T. A. Chan, Breast Cancer Methylomes Establish an Epigenomic Foundation for Metastasis. Sci. Transl. Med. 3, 75ra25 (2011).

References and Notes

Acknowledgments: We acknowledge Y. Zhao, J. Wongvipat, K. Huberman, I. Dolgalev, S. Thomas, and the MSKCC Genomics Core for technical expertise. Funding: This work was supported in part by The Cancer Genome Atlas Project (S.B.B.); the Flight Attendant's Medical Research Institute (T.A.C.); the Louis Gerstner Foundation (T.A.C.); the Elsa U. Pardee Foundation (T.A.C.); the AVON Foundation (T.A.C.); the STARR Cancer Consortium (T.A.C. and L.T.V.); the Susan G. Komen Foundation (N.A.); the Mary K. Ash Foundation (N.A.); the Howard Hughes Medical Institute (J.M.); a National Cancer Institute training grant (T32CA009685, L.G.T.M.); and the Metastasis Research Center (T.A.C.). Author contributions: T.A.C., S.T., and F.F. designed the experiments. F.F., S.T., A.R., D.G., A.H., A.V., and A.K. performed the experiments. F.F., S.T., T.A.C., L.G.T.M., R.S., V.S., Q.M., and M.E.M. analyzed the data. S.B.B., N.A., J.M., L.N., and L.T.V. contributed new reagents/analytic tools/expertise. T.A.C., F.F., and S.T. wrote the paper. Competing interests: S.B.B. is a consultant for MDX in regard to MSP (methylation-specific polymerase chain reaction) assays and receives unrestricted grant support from the company. MDX has no known financial interests associated with this work. The other authors declare that they have no competing interests. Accession numbers: Data sets are deposited in the Gene Expression Omnibus under accession number GSE26349. The Cancer Genome Atlas Project GBM cancer data sets are publicly available at