Research Article: Cancer

DNA Repair Pathway Gene Expression Score Correlates with Repair Proficiency and Tumor Sensitivity to Chemotherapy


Science Translational Medicine 26 Mar 2014:
Vol. 6, Issue 229, pp. 229ra42
DOI: 10.1126/scitranslmed.3008291


Mutagenesis is a hallmark of malignancy, and many oncologic treatments function by generating additional DNA damage. Therefore, DNA damage repair is centrally important in both carcinogenesis and cancer treatment. Homologous recombination (HR) and nonhomologous end joining are alternative pathways of double-strand DNA break repair. We developed a method to quantify the efficiency of DNA repair pathways in the context of cancer therapy. The recombination proficiency score (RPS) is based on the expression levels for four genes involved in DNA repair pathway preference (Rif1, PARI, RAD51, and Ku80), such that high expression of these genes yields a low RPS. Carcinoma cells with low RPS exhibit HR suppression and frequent DNA copy number alterations, which are characteristic of error-prone repair processes that arise in HR-deficient backgrounds. The RPS system was clinically validated in patients with breast or non–small cell lung carcinomas (NSCLCs). Tumors with low RPS were associated with greater mutagenesis, adverse clinical features, and inferior patient survival rates, suggesting that HR suppression contributes to the genomic instability that fuels malignant progression. This adverse prognosis associated with low RPS was diminished if NSCLC patients received adjuvant chemotherapy, suggesting that HR suppression and associated sensitivity to platinum-based drugs counteract the adverse prognosis associated with low RPS. Therefore, RPS may help oncologists select which therapies will be effective for individual patients, thereby enabling more personalized care.


Homologous recombination (HR) and nonhomologous end joining (NHEJ) are competing pathways that repair double-strand DNA breaks (DSBs) generated by certain cancer treatment modalities. HR also serves additional functions such as promoting cellular tolerance to DNA-damaging drugs that disrupt replication forks (1). Both HR and NHEJ facilitate DNA repair following the recruitment of upstream sensor/effector proteins (Fig. 1). The HR pathway catalyzes DSB repair by identifying a stretch of homologous DNA and by replicating from this homologous DNA template, whereas NHEJ repairs DSBs by processing and religating the DSB ends (1, 2).

Fig. 1. Pathways and genes involved in repair of DSBs and the tolerance of replication stress.

Shown is a simplified overview of the mechanistic steps and genes involved in DNA repair, with an emphasis on those that facilitate HR and NHEJ. All of the displayed genes were considered candidates for the RPS system, except those within the blue box. The four genes whose expression levels were ultimately chosen to comprise the RPS are displayed in red.

When faced with a DSB, the cell’s decision of whether to use HR or NHEJ is influenced by the cell cycle stage. NHEJ is the dominant pathway for repairing DSBs during the G0-G1 stages of the cell cycle, whereas HR generally occurs during S and G2. This regulation of repair is governed primarily by BRCA1 and 53BP1 proteins, which compete for occupancy at the DSB site (3). Stabilization of 53BP1 in cooperation with Rif1 leads to the exclusion of BRCA1 protein from the repair complex, and the DSB subsequently progresses to repair by NHEJ (4, 5). If 53BP1 is excluded from the repair complex, then the DSB progresses to repair by HR. In this case, the DSB ends are processed into HR substrates, which involves 5′ to 3′ nuclease activity that generates 3′ single-stranded DNA (ssDNA) tails. This end processing is promoted by several proteins including CtIP, BRCA1, and the MRN (Mre11/RAD50/NBS1) complex. The nuclease activity is also specifically triggered by interactions between Mre11 and cyclin-dependent kinase 2, thereby promoting the phosphorylation of CtIP preferentially in S-G2 cells (6).

The efficiencies of these repair processes have important implications for carcinogenesis and malignant tumor progression. Like HR, the canonical version of NHEJ is thought to repair DNA with high fidelity (7, 8). However, some DSBs can undergo extensive degradation before religation using processes termed microhomology-mediated end joining and single-strand annealing, both of which create mutagenic deletions (8, 9). Similarly, mutations can arise if replication-disrupting lesions are not properly repaired before DNA replication, in which case these lesions may prompt homology-mediated polymerase template switching (10). Thus, tumors that harbor ineffective error-free DNA repair machinery are likely to exhibit greater genomic instability, which is expected to drive malignant progression and generate more aggressive tumor phenotypes. A method that predicts error-free repair proficiency from human tumor biopsy tissues might have broad applications in clinical oncology as a prognostic indicator, because genetic instability may indicate a greater propensity for malignant phenotypes such as metastatic potential.

The cellular efficiencies of these repair processes can also directly affect tumor responsiveness during the treatment of cancer patients. The most marked examples are the hypersensitivities of HR-deficient tumors to poly(adenosine diphosphate–ribose) polymerase (PARP) inhibitors (11–13) or platinum-based chemotherapies (14, 15). At present, methods to measure HR or NHEJ proficiency from human tumor biopsy tissues are limited (16, 17). Some studies have measured the rate of DSB rejoining in tumors (for example, through H2AX phosphorylation kinetics), and rapid DSB rejoining may predict resistance of human tumors to radiotherapy and some chemotherapy drugs [reviewed in (18)]. A method that successfully quantifies repair pathway efficiency could therefore have important applications in clinical oncology, because it would predict the sensitivity of tumors to specific classes of treatment.

Human tumors exhibit a wide range of malignant features and responsiveness to treatments that damage DNA. We hypothesized that a component of this variability can be explained by differential efficiencies of DNA repair pathways. To study this further, we developed an analytic tool to indirectly quantify the efficiency of HR in individual cancers. This scoring system relies on the expression of four DNA repair genes: Rif1, PARI, RAD51, and Ku80. We show here that the recombination proficiency score (RPS) correlates with sensitivity to specific classes of chemotherapy, associates with degree of genomic instability within tumor cells, and provides valuable information that is not available using existing diagnostic methods.


The RPS system was developed using data from carcinoma cell lines

We sought to create a method that quantifies the efficiency of HR repair within any given cancer. To accomplish this, we developed a scoring system that correlates gene expression patterns with HR proficiency in human cancer cell lines. Gene expression levels and corresponding drug sensitivity data were collected from the Broad-Novartis Cancer Cell Line Encyclopedia (CCLE) (19). Given the wide biological diversity known to exist between different classes of human malignancies, we limited this analysis to cell lines derived only from carcinomas. Cellular resistance to the topoisomerase I–inhibiting drug topotecan was selected as a surrogate marker for HR proficiency. Topotecan is a derivative of camptothecin, and this class of drugs was selected because it disrupts replication forks and exerts toxicity preferentially in cells that harbor HR defects (20, 21). Topotecan sensitivity data were available for 279 of the 634 carcinoma cell lines.

To focus our analysis on the primary cellular features that mediate specific phenotypes, we restricted the analysis to genes with direct relevance to replication stress and the DSB repair pathways (Fig. 1). We further limited the analysis to 33 central proteins that participate in cellular preference toward HR versus NHEJ, following the ataxia-telangiectasia–mutated (ATM) and/or ATM- and Rad3-related protein (ATR) activation steps of DNA damage response. Levels of mRNA were available for all of these genes except Ku70. Secondary regulators of damage response (like TP53, PTEN, and cell cycle checkpoint genes) were not considered as gene candidates for the scoring system because they exert cellular influences that extend beyond the scope of replication stress and DSB repair. Pearson’s correlation analyses demonstrated that 12 of the final list of 32 candidate genes had expression levels that significantly correlated (defined as P < 0.05) with cellular sensitivity to topotecan (table S1). In all 12 cases, increasing gene expression levels directly correlated with increasing topotecan sensitivity.
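The candidate-gene screen described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a dictionary of per-gene expression values aligned with a list of drug-sensitivity scores (higher = more sensitive; the exact CCLE sensitivity metric is not specified here), and it approximates the two-sided P value with the Fisher z-transform instead of the exact t distribution with n − 2 degrees of freedom. The gene names and values are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Approximate two-sided P value for r via the Fisher z-transform (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def screen_genes(expression, sensitivity, alpha=0.05):
    """Keep genes whose expression correlates with drug sensitivity at P < alpha.

    expression: {gene: [value per cell line]}, aligned with `sensitivity`.
    Returns {gene: (r, p)}; r > 0 means higher expression goes with higher sensitivity.
    """
    n = len(sensitivity)
    hits = {}
    for gene, values in expression.items():
        r = pearson_r(values, sensitivity)
        p = approx_two_sided_p(r, n)
        if p < alpha:
            hits[gene] = (r, p)
    return hits


# Hypothetical toy data: eight cell lines, two placeholder genes.
sensitivity = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9, 0.2, 0.7]
expression = {
    "GENE_A": [0.0, 0.5, 0.3, 1.1, 0.7, 1.2, 0.1, 0.9],   # tracks sensitivity
    "GENE_B": [0.5, -0.2, 0.4, 0.1, -0.3, 0.2, 0.0, 0.3],  # unrelated
}
hits = screen_genes(expression, sensitivity)
print(hits)
```

With real data, `scipy.stats.pearsonr` would give the exact P value; the stdlib version is used here only to keep the sketch self-contained.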

HR-related genes are highly expressed in cancer cells that harbor low HR efficiency

Rif1, Ku80, and PARI were among the genes whose expression most strongly associated with topotecan sensitivity. Rif1 and Ku80 are known to promote NHEJ and antagonize HR (4, 5, 9). PARI is a helicase capable of disrupting RAD51 nucleofilaments, and it has been reported to antagonize HR repair (22).

Topotecan sensitivity also correlated with the overexpression of a family of HR-related genes, including RAD51, BLM, BRCA1, RAD51AP1, RAD54B, PLK1, BRCA2, RAD51C, and PALB2. This observation appears counterintuitive at first, because RAD51 and many of these RAD51-associated proteins are generally considered to promote HR. However, RAD51 overexpression has been previously shown to occur in the setting of HR defects caused by BRCA mutations (23, 24). To investigate a possible connection between BRCA mutation phenotypes and our observed expression patterns, we analyzed gene expression levels in CCLE cell lines that harbor BRCA1 (HCC1937 and MDA-MB-436) or BRCA2 (CAPAN1) mutations. Consistent with published observations in BRCA1-mutant human tumors (23), these three cell lines significantly overexpressed RAD51 and RAD51AP1. In addition, we found that BRCA-defective cells significantly overexpressed additional genes known to promote various mechanisms required for HR, including CtIP, which promotes 5′ to 3′ ssDNA end resection (25), PLK1, which promotes the phosphorylation of 53BP1 and RAD51 (26, 27), and several genes (XRCC2, XRCC3, and PALB2) that promote RAD51 filament assembly (28). These data suggest that BRCA-defective cells respond to their HR defects by increasing the expression of a fairly broad array of HR-related genes. The overexpression of HR genes as a compensatory mechanism has been proposed previously, particularly because RAD51 overexpression is known to partially suppress the HR defects that occur when key HR genes are mutated (23, 29).

These findings were used to refine the list of genes to be used in the RPS. We hypothesized that when HR deficiency occurs in wild-type BRCA backgrounds, cells respond via compensatory overexpression of HR-related genes that mirrors the phenotypes observed in BRCA mutant cells. Hence, we reasoned that many of the HR-related genes were reporting redundant information in response to low HR proficiency. Gene expression levels were combined to generate a single model that correlates with topotecan sensitivity, starting with genes that have known HR-antagonizing activities (Rif1, Ku80, and PARI) in order of their independent degree of correlation (table S2). The HR-related genes were then added to this model one at a time. The addition of RAD51 improved the model’s correlation with topotecan sensitivity (relative to the initial three genes); however, the inclusion of additional HR-related genes did not further improve the correlation. Therefore, the final four genes selected to derive the RPS were Rif1, PARI, Ku80, and RAD51.
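The incremental model-building step can be sketched as a greedy forward selection. This is an assumption-laden illustration rather than the published procedure: the text does not state how "improved correlation" was judged, so the sketch uses a hypothetical minimum gain in |r| (`min_gain`), scores each model as the negated, unweighted sum of normalized expression (the same form as the final RPS), and uses placeholder gene names X, Y, and Z.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def model_score(expression, genes):
    """Negated, unweighted sum of the chosen genes' normalized expression per cell line."""
    n = len(next(iter(expression.values())))
    return [-sum(expression[g][i] for g in genes) for i in range(n)]


def build_model(expression, sensitivity, base_genes, candidates, min_gain=0.01):
    """Greedy forward selection: start from base_genes (kept in the given order),
    then add each candidate only if it raises |r| with sensitivity by more than min_gain."""
    selected = list(base_genes)
    best = abs(pearson_r(model_score(expression, selected), sensitivity))
    for gene in candidates:
        trial = selected + [gene]
        r = abs(pearson_r(model_score(expression, trial), sensitivity))
        if r > best + min_gain:
            selected, best = trial, r
    return selected, best


# Hypothetical toy data: "X" stands in for an HR antagonist; "Y" and "Z" are candidates.
sensitivity = [1, 2, 3, 4, 5, 6]
expression = {
    "X": [1, 3, 2, 5, 4, 6],
    "Y": [0.5, -0.5, 1.5, -0.5, 1.5, 0.5],
    "Z": [1, -1, 1, -1, 1, -1],
}
selected, best = build_model(expression, sensitivity, ["X"], ["Y", "Z"])
print(selected, round(best, 3))  # ['X', 'Y'] 1.0
```

Because the score is negated, its correlation with sensitivity is negative; comparing |r| keeps the selection rule independent of that sign convention.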

Elevated mRNA levels for any of these genes correlated with greater sensitivity to topotecan. The RPS was defined as the sum of these four expression levels multiplied by −1, using the log2-transformed mRNA values of each gene normalized to the median mRNA level within the starting 634 carcinoma cell lines. The median RPS within the carcinoma cell lines was approximately zero; the lowest quartile of RPS values fell below −1.08, and the highest quartile exceeded 1.2.
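The RPS definition above translates into a few lines of code. The sketch below is hypothetical in its function names and input layout: it assumes linear-scale expression values per gene across cell lines (set `already_log2=True` if the matrix is already log2-transformed), normalizes each gene to its own median, and uses Python's `statistics.quantiles` to mark the lowest and highest quartiles.

```python
import math
from statistics import median, quantiles

RPS_GENES = ("Rif1", "PARI", "RAD51", "Ku80")


def median_normalize(raw, already_log2=False):
    """Log2-transform each gene (unless already_log2) and subtract that gene's median
    across all cell lines, so 0 means median expression."""
    normalized = {}
    for gene, values in raw.items():
        logs = list(values) if already_log2 else [math.log2(v) for v in values]
        m = median(logs)
        normalized[gene] = [x - m for x in logs]
    return normalized


def recombination_proficiency_score(normalized, genes=RPS_GENES):
    """RPS = -1 x (sum of the four median-normalized log2 expression values)."""
    n = len(normalized[genes[0]])
    return [-sum(normalized[g][i] for g in genes) for i in range(n)]


def rps_groups(scores):
    """Label each score 'low' (bottom quartile), 'high' (top quartile), or 'mid'."""
    q1, _, q3 = quantiles(scores, n=4)
    return ["low" if s < q1 else "high" if s > q3 else "mid" for s in scores]


# Hypothetical linear-scale expression for four cell lines.
raw = {gene: [1, 2, 4, 8] for gene in RPS_GENES}
scores = recombination_proficiency_score(median_normalize(raw))
print(scores)              # [6.0, 2.0, -2.0, -6.0]
print(rps_groups(scores))  # ['high', 'mid', 'mid', 'low']
```

The sign flip is what makes high expression of the four genes map to a low RPS, matching the convention used throughout the article.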

CCLE cell lines with low RPS did indeed overexpress a broad array of HR-related genes (Fig. 2). These data support the existence of a compensatory mechanism that responds to low HR efficiency. Furthermore, these results suggest that this compensatory mechanism is not limited to only the most extreme HR defects, like those resulting from BRCA mutations. MEN1 protein was considered as a possible mediator of this proposed compensatory process because MEN1 has been shown to stimulate the transcription of several HR genes, including BRCA1, RAD51, and RAD51AP1 (30). However, this explanation was deemed unlikely because MEN1 mRNA levels did not significantly correlate with RPS in CCLE cell lines.

Fig. 2. Cell lines with low RPS overexpress a wide array of HR-related genes.

Mean mRNA levels are shown for the CCLE cell lines with low RPS (bottom 25th percentile). These mRNA levels were mined from the CCLE database, and displayed values represent log2-transformed mRNA measurements of each gene normalized to the median mRNA among the starting 634 carcinoma cell lines. Therefore, an expression level of zero indicates a median expression level, and any positive value indicates overexpression. For example, a value of +0.25 indicates a 19% increase in expression above the median. Error bars denote SE.
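The fold-change arithmetic in this legend follows from undoing the log2 transform: a median-normalized value v corresponds to 2^v times the median level, so +0.25 gives 2^0.25 ≈ 1.189, or roughly a 19% increase. A small helper (hypothetical name) makes the conversion explicit:

```python
def log2_to_percent_change(value):
    """Percent change from the median implied by a median-normalized log2 value."""
    return (2 ** value - 1) * 100


print(round(log2_to_percent_change(0.25), 1))  # 18.9, i.e., roughly 19%
print(log2_to_percent_change(-1))              # -50.0: half the median level
```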

RPS associates with HR proficiency in individual cancer cell lines

The diagnostic value of the RPS was further tested on the basis of sensitivity to different types of chemotherapeutic agents. Similar to results with topotecan, low RPS correlated with sensitivity to irinotecan, another topoisomerase I–inhibiting drug (Fig. 3A). This is expected because topoisomerase I inhibitors generate replication fork disruptions, which require HR for repair (20, 21). As a control, this analysis was repeated using the non–DNA-damaging drug paclitaxel, and RPS did not correlate with sensitivity to this agent. These results support the specificity of RPS for DNA damage and repair. Note that complete drug sensitivity data were not available for all three chemotherapy agents in all cell lines evaluated (see fig. S1 for breakdown). However, comparable results were observed when the analyses were repeated on the subset of 137 cell lines that were tested with all three agents.

Fig. 3. RPS correlates with sensitivity to different classes of treatment and HR deficiency in cell lines.

(A) CCLE carcinoma cell lines were binned into quartiles on the basis of RPS. Sensitivity data were mined from the CCLE database and plotted for different oncologic therapies, and differences between the highest and lowest quartiles were determined by Student’s t test. (B) HR repair efficiency correlates with RPS. Six representative cell lines were cotransfected with an HR reporter–containing plasmid (pDR-GFP) plus an I-Sce I–expressing plasmid (pCβASce) or an empty vector control plasmid (pCAG), and were subjected to fluorescence-activated cell sorting analysis 48 hours later. Reported HR efficiency represents the percent GFP+ cells with pDR-GFP + pCβASce, normalized to background (pDR-GFP + pCAG).
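The quartile comparison in panel A can be sketched with a pooled-variance (Student's) t statistic. This sketch, with hypothetical helper names and values, computes only the statistic and its degrees of freedom; a P value would then come from the t distribution (for example, `scipy.stats.ttest_ind`, whose default `equal_var=True` performs this same test). RPS and sensitivity lists are assumed to be aligned per cell line.

```python
import math
from statistics import mean, quantiles, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def compare_extreme_quartiles(rps, sensitivity):
    """t test of sensitivity, lowest-RPS quartile versus highest-RPS quartile."""
    q1, _, q3 = quantiles(rps, n=4)
    low = [s for r, s in zip(rps, sensitivity) if r <= q1]
    high = [s for r, s in zip(rps, sensitivity) if r >= q3]
    return student_t(low, high)


# Hypothetical values for eight cell lines (higher sensitivity = more drug-sensitive).
rps = [-3, -2, -1, 0, 1, 2, 3, 4]
sensitivity = [0.9, 0.8, 0.7, 0.5, 0.5, 0.3, 0.2, 0.1]
t, df = compare_extreme_quartiles(rps, sensitivity)
print(round(t, 2), df)
```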

The ability of RPS to identify repair pathway preference was further tested by measuring HR repair efficiency in representative cell lines with low RPS (RKO, DU 145, and COLO 205) or with mid/high RPS (PC3, HCC44, and NCI-H650). These cell lines exhibited expected levels of sensitivity to topotecan and paclitaxel when independently retested in our laboratory (fig. S2), which were comparable to the sensitivities mined from the CCLE database. These six cell lines were tested using a modified version of the previously described DR-GFP reporter method (31). This method uses a reporter DNA construct that carries two nonfunctional copies of green fluorescent protein (GFP), one of which is interrupted by an I-Sce I endonuclease site. Induction of a DSB at the I-Sce I site can lead to repair by homologous gene conversion that generates a functional copy of GFP. As demonstrated in Fig. 3B, RPS correlated with HR efficiency on linear regression analysis (R² = 0.833, two-sided P = 0.003). For consistency with the other results, RPS values for these cells were calculated using array-based mRNA levels from the CCLE database. We verified the identity of our six cell lines by short tandem repeat profiling (Genetic Resources Core Facility at Johns Hopkins School of Medicine), and independent quantitation by real-time quantitative reverse transcription polymerase chain reaction (qRT-PCR) generated mRNA measurements that were comparable to the mRNA levels reported in the CCLE database (fig. S3).
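The regression reported here (R² of HR efficiency against RPS) can be sketched as an ordinary least-squares fit. The helper below is hypothetical and returns intercept, slope, and R²; the two-sided P value quoted in the text would additionally require a t test on the slope with n − 2 degrees of freedom, which is omitted. The six data points are placeholders, not the measured values.

```python
from statistics import mean


def linear_fit(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical RPS values and HR efficiencies (arbitrary units) for six cell lines.
rps = [-2.0, -1.5, -1.2, 0.3, 0.9, 1.6]
hr_efficiency = [0.6, 0.9, 0.8, 1.7, 1.9, 2.4]
a, b, r2 = linear_fit(rps, hr_efficiency)
print(f"slope={b:.2f}, R^2={r2:.3f}")
```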

Cell lines with low RPS have elevated genomic instability

HR plays a central role in maintaining genomic stability in cells. We hypothesized that cells with low RPS would exhibit more genome instability than those with high RPS. To test this hypothesis, we analyzed single-nucleotide polymorphism (SNP) array–based DNA copy number variations (CNVs) using CCLE carcinoma cell lines (Fig. 4). Low RPS was associated with more frequent DNA amplifications. This finding is consistent with published analyses of HR-defective cell lines, showing that mutations in RAD51D or XRCC3 promote DNA amplifications (32). These amplifications are proposed to result from stress-induced replication fork disruption and subsequent homology-mediated polymerase template switching (7, 10). A study in RAD51-defective Saccharomyces cerevisiae demonstrated that cells with deregulated HR frequently channel DSBs into repair by nonallelic break-induced replication, thereby stimulating the formation of segmental duplications (33). Additionally, we found that cells with low RPS harbored relatively frequent DNA deletions. Deletions are characteristic of error-prone repair processes like microhomology-mediated end joining and single-strand annealing (8, 9). Notably, the distributions of CNV sizes were not strongly influenced by RPS. Together, these results suggest that low-RPS cells have reduced HR proficiency and rely more on error-prone processes to rejoin DSBs and/or to tolerate replication stress.

Fig. 4. CCLE carcinoma cell lines with low RPS have elevated genomic instability.

SNP array–based DNA CNVs were mined from the CCLE database. DNA deletions (left) and amplifications (right) were binned by size, wherein bins represent 10-fold increments in mutation size. High- and low-RPS groups were defined as the top and bottom quartiles, respectively. (A and B) Size-based distributions of CNVs are shown for (A) TP53 wild-type cells (n = 124) and (B) TP53 mutant cells (n = 193). Red indicates low RPS, and blue indicates high RPS. Error bars denote SE. Asterisks denote significant differences based on Student’s t test.
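The size binning described in this legend (10-fold increments) can be sketched by assigning each CNV to the bin given by its number of decimal digits, which sidesteps floating-point edge cases of `log10` at exact powers of ten. The input layout (one list of CNV lengths in base pairs per cell line) and the helper names are assumptions; group membership (top versus bottom RPS quartile) would be decided beforehand.

```python
def size_bin(length_bp):
    """10-fold size bin: 0 for 1-9 bp, 1 for 10-99 bp, 2 for 100-999 bp, and so on."""
    length_bp = int(length_bp)
    if length_bp < 1:
        raise ValueError("CNV length must be at least 1 bp")
    return len(str(length_bp)) - 1


def mean_counts_per_bin(samples, n_bins):
    """samples: one list of CNV lengths (bp) per cell line or tumor.
    Returns the mean number of CNVs per sample in each size bin."""
    totals = [0] * n_bins
    for lengths in samples:
        for length in lengths:
            # CNVs larger than the last bin are pooled into it (open-ended top bin).
            b = min(size_bin(length), n_bins - 1)
            totals[b] += 1
    return [t / len(samples) for t in totals]


# Hypothetical deletion lengths for two samples in one RPS group.
low_rps_deletions = [[5, 50, 500], [50, 5000]]
print(mean_counts_per_bin(low_rps_deletions, n_bins=4))  # [0.5, 1.0, 0.5, 0.5]
```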

Mutations in TP53 are also known to exert major influences on cellular resistance to DNA-damaging therapies and genomic instability. Additionally, TP53 mutation status has been shown to influence HR efficiency (34, 35). Therefore, we reexamined RPS-associated CNVs in the context of TP53 mutation status. The average RPS was not significantly different between the 238 TP53 wild-type cell lines and the 386 TP53 mutant cell lines (0.25 versus 0.41, P = 0.41). Also, the association between increased CNVs and low RPS was observed in both TP53 wild-type and mutant cell line groups. The magnitude of RPS dependence was less pronounced in TP53 mutant cells because of a high background of deletions in these cells. A high deletion frequency is not surprising in TP53 mutant cells because deletions are known to occur 40 to 300 times more frequently after TP53 inactivation (36). These data therefore suggest that TP53 mutation status and RPS offer independent diagnostic information regarding genomic instability.

A possible relationship between RPS and TP53 status was further studied by examining resistance to DNA damage. The ability of RPS to associate with topotecan sensitivity on logistic regression was similar in both TP53 wild-type and mutant cell line subgroups (wild type, P = 0.002; mutant, P = 0.0009). This association supports the role of RPS as an indicator of HR proficiency, which is distinct from TP53-dependent activities like apoptotic threshold modulation and cell cycle regulation.

Human tumors with low RPS exhibit unfavorable clinical characteristics and elevated genomic instability

The RPS system was clinically validated using tumor data sets from the Cancer Genome Atlas (CGA). Breast and non–small cell lung cancer (NSCLC) tumor types were selected for this analysis because these data sets contained large sample sizes, annotations of clinical features, SNP array–based DNA CNV data, and adequate details on patient outcomes. Although some differences existed between different cancer types, tumors with lower RPS generally exhibited adverse clinical characteristics (Table 1). Low-RPS tumors tended to be more locally/regionally advanced and to harbor more frequent TP53 mutations. For example, the lower quartile RPS tumors were significantly more likely to have lymph node invasion in NSCLCs (P = 0.008). Similarly, breast cancers with low RPS commonly exhibited estrogen receptor loss (P = 2.4 × 10⁻⁷) and HER2 amplification (P = 0.007).

Table 1. Low RPS is associated with adverse clinical features in human tumors.

P values denote differences in frequencies among groups based on a likelihood ratio test.
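A likelihood ratio test of frequency differences among groups is commonly computed as a G test on the contingency table of counts (RPS group × clinical feature). The stdlib-only sketch below works under that assumption: G = 2 Σ O ln(O/E), compared against a chi-square distribution with (r − 1)(c − 1) degrees of freedom, whose tail probability is evaluated from the power series of the regularized lower incomplete gamma function. The helper names and table counts are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution, 1 - P(df/2, x/2), where P is the
    regularized lower incomplete gamma function evaluated by its power series.
    Adequate for moderate statistics; very small P values lose relative precision."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def g_test(table):
    """Likelihood ratio (G) test of independence for an r x c table of counts.
    Returns (G, degrees of freedom, P)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # cells with zero counts contribute nothing to G
                expected = row_totals[i] * col_totals[j] / grand
                g += observed * math.log(observed / expected)
    g *= 2.0
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g, df, chi2_sf(g, df)


# Hypothetical counts: rows = low-RPS vs other tumors; columns = feature present vs absent.
print(g_test([[20, 5], [5, 20]]))
```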


These adverse features associated with low RPS may be the result of low-fidelity repair processes, which in turn promote genomic instability and malignant progression. To explore this hypothesis, we analyzed CNV as a function of RPS using these same two CGA tumor data sets. Both carcinoma types exhibited at least one class of elevated CNV in the setting of low RPS (Fig. 5). This RPS-associated genome instability was observed in both TP53 wild-type and mutant tumors. These results suggest that mutagenic DNA repair processes dominate in low-RPS tumors, thereby promoting the evolution of malignant clinical features.

Fig. 5. Low RPS is associated with genomic instability in human tumors.

SNP array–based DNA CNVs were mined from the CGA. DNA deletions (left) and amplifications (right) were binned by size, wherein bins represent 10-fold increments in size. High- and low-RPS groups were defined as the top and bottom quartiles, respectively. (A to D) Size-based distributions of CNVs are shown for (A) TP53 wild-type NSCLC tumors (n = 27), (B) TP53 mutant NSCLC tumors (n = 50), (C) TP53 wild-type breast tumors (n = 90), and (D) TP53 mutant breast tumors (n = 58). Red indicates low RPS, and blue indicates high RPS. Error bars denote SE. Asterisks denote significant differences based on Student’s t test.

RPS is prognostic and correlates with treatment sensitivity in clinical tumors

Next, we evaluated whether RPS is linked to clinical outcomes in human tumors. NSCLC was considered an appealing tumor type for this analysis because NSCLC-directed chemotherapy regimens are generally platinum-based and because lung cancer is a leading cause of cancer mortality. We also sought to distinguish the prognostic and therapy-predictive utilities of RPS. Specifically, we hypothesized that low RPS would confer a poor prognosis because of elevated mutagenesis and associated adverse tumor features. However, we also hypothesized that the HR suppression in low-RPS tumors would simultaneously render them sensitive to platinum-based chemotherapeutic agents, given that HR-defective cells are hypersensitive to DNA cross-linkers. These two effects were predicted to counteract one another in low-RPS tumors treated with chemotherapy.

The power of RPS to characterize outcomes in NSCLC patients was investigated using data from the JBR.10 clinical trial, which had previously demonstrated a benefit to adjuvant chemotherapy in early-stage NSCLC (37). Specifically, JBR.10 had randomly assigned patients to receive cisplatin plus vinorelbine chemotherapy or no further treatment following the resection of stage I to II NSCLCs. This data set was ideal for our analysis because of its prospective randomized trial design, combined with uniform treatment details; it therefore avoids the biases intrinsic to retrospectively collected data sets. In patients whose treatment consisted of surgery only, low RPS was associated with inferior 5-year overall survival relative to higher RPS (15% versus 60%; P = 0.004, log-rank test). This clinically validates the prognostic power of RPS (Fig. 6A). Chemotherapy significantly improved 5-year overall survival in patients with low-RPS tumors (from 15 to 77%, P = 0.01) but not in those with high-RPS tumors (60% versus 72%, P = 0.55). This clinically validates the ability of RPS to determine sensitivity to platinum-based chemotherapy.
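The survival comparison can be sketched with a Kaplan-Meier estimator and a two-group log-rank test. This is a minimal, stdlib-only illustration with hypothetical helper names; the trial data are not reproduced. `events` is assumed to be 1 for death and 0 for censoring, and the log-rank P value uses the chi-square distribution with one degree of freedom, P = erfc(√(χ²/2)).

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; events: 1 = death, 0 = censored.
    Returns [(time, survival)] at each distinct death time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied observations at time t
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Step-function lookup of S(t) from a Kaplan-Meier curve."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, two-sided P)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = var = 0.0
    for t in sorted({u for u, e, _ in pooled if e}):
        n = sum(1 for u, _, _ in pooled if u >= t)
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)
        d = sum(1 for u, e, _ in pooled if u == t and e)
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up times (years); 1 = death, 0 = censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])
print(curve, survival_at(curve, 2.5))
print(log_rank([1, 2, 3, 4], [1, 1, 1, 1], [5, 6, 7, 8], [1, 1, 1, 1]))
```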

Fig. 6. RPS is prognostic and correlates with treatment sensitivity in clinical tumors.

(A) Kaplan-Meier survival curves are shown for NSCLC patients treated on the JBR.10 trial with either surgery alone (S, n = 40) or surgery followed by chemotherapy (S+C, n = 50). Low- and high-RPS groups were defined as the bottom 25th percentile and the remaining upper 75th percentile, respectively. (B) Four clinical data sets of NSCLC were analyzed for prognostic impact of RPS on survival, using multivariate analyses that controlled for overall stage. Points in the forest plot represent treatment-specific hazard ratios of RPS (as a continuous variable). Boxes denote hazard ratios, and diamonds denote modeled hazard ratio values that summarize the combined impact of all four data sets. Error bars denote 95% CIs. Black, surgery alone; green, surgery + chemotherapy.

These data suggest that the poor prognoses associated with low RPS might be negated by chemotherapy because low-RPS tumors are especially sensitive to platinum-based chemotherapy. In the JBR.10 trial, for example, patients treated with chemotherapy had similar 5-year overall survival rates regardless of low versus higher RPS (77% versus 72%, P = 0.70). To study this further, we selected three additional data sets containing retrospectively collected data on NSCLC patients (38–40). After controlling for stage on multivariate analysis, low RPS was again associated with poor survival in patients treated with surgery alone (Fig. 6B). Specifically, we combined data from all four data sets using previously described methodology (41) and found that low RPS confers a continuous hazard ratio of 1.24 [95% confidence interval (CI), 1.12 to 1.36]. When this analysis was repeated on patients treated with surgery plus adjuvant chemotherapy, the poor prognosis associated with low RPS was diminished (hazard ratio, 0.94; 95% CI, 0.69 to 1.21). Together, these findings support the hypothesis that patients with low-RPS tumors have adverse underlying prognoses, but that HR suppression and associated sensitivity to platinum-based drugs counteract these adverse prognostic features. Therefore, RPS may help oncologists select which therapies will be effective for individual patients, thereby enabling more personalized care.
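The pooling step cites previously described methodology (41) without detailing it here, so the sketch below swaps in a standard fixed-effect, inverse-variance combination of log hazard ratios, recovering each data set's standard error from its reported 95% CI. It is an assumption, not necessarily the method used; a random-effects model would add a between-study variance term. Helper names are hypothetical and the study values are placeholders.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def log_hr_and_se(hr, lower, upper):
    """Recover log(HR) and its standard error from an HR and its 95% CI,
    assuming the CI was computed symmetrically on the log scale."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of (hr, ci_lower, ci_upper) tuples.
    Returns the pooled (hr, ci_lower, ci_upper)."""
    weighted_sum = total_weight = 0.0
    for hr, lower, upper in studies:
        log_hr, se = log_hr_and_se(hr, lower, upper)
        weight = 1.0 / se ** 2
        weighted_sum += weight * log_hr
        total_weight += weight
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled),
            math.exp(pooled - Z95 * pooled_se),
            math.exp(pooled + Z95 * pooled_se))


# Placeholder inputs (not the per-data-set estimates from this study).
print(pool_fixed_effect([(1.3, 1.1, 1.5), (1.2, 1.0, 1.45), (1.15, 0.9, 1.5)]))
```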


The RPS is a scoring system that quantifies the expression of four genes to provide an estimate of DSB repair pathway preference. Low RPS can identify tumors that harbor HR suppression and hypersensitivity to specific classes of chemotherapeutic agents. Because this scoring system provides individualized patient information, the RPS could potentially be used to guide which classes of oncologic treatment are best suited for individual patients. The strategy used to develop this system is fundamentally different from the larger genomic characterizations of human cancers, which commonly catalog the molecular features of particular cancer types (42). The RPS also differs from gene expression signatures that derive scores from unbiased genome-wide data, in that we focused on a limited set of genes with known relevance to particular DNA repair pathways that enabled hypothesis-based analyses. An additional important difference of our study is that the information provided by the RPS applies to a broad range of carcinoma types, and it may provide similar information in other noncarcinoma malignancies as well.

Beyond its therapeutic implications, RPS also offers insight into the basic biology of malignant tumor progression. Specifically, tumors with low RPS exhibit greater mutagenesis and adverse patient outcomes. These findings suggest a potential pathway of carcinogenesis, in which the repression of HR efficiency fuels the evolution of genomic instability and malignant progression. This is consistent with previous studies that have observed higher expression of some DNA repair genes (including RAD51 and Ku80) in metastases compared to primary tumors (43, 44). This concept is also similar to the cancer-prone phenotypes seen in BRCA-inactivating mutations, as well as the BRCA-like phenotypes reported to occur in specific tumors like triple-negative breast cancers. A key difference, however, is that BRCA-related cancer diagnoses are relatively uncommon and distinct entities. By contrast, RPS-associated mutagenesis is a continuous effect, whereby mutagenesis gradually rises as RPS values fall. Furthermore, RPS-associated mutagenesis pertains to a broad range of cancer types. Together, these findings imply that HR suppression may play a common and central role in cancer development and malignant progression of tumors.

The overexpression of Rif1, Ku80, and PARI is expected to antagonize HR (4, 5, 9, 22). The more challenging observation, however, is the association between low HR efficiency and RAD51 overexpression. Other authors have similarly noted correlations between high expression of HR-associated genes and cisplatin sensitivity, which appears counterintuitive on the surface (45, 46). Our results suggest that cancer cells respond to HR defects by increasing the expression of a fairly broad array of HR-related genes. This concept of compensatory gene expression is not new, and RAD51 overexpression has been previously shown to functionally compensate for HR defects (23, 29). However, the large number of HR-related genes that are overexpressed in HR-deficient cells is a new observation that builds on current knowledge. It suggests the existence of a coordinated gene expression mechanism, which extends beyond the known promoter elements that are shared by BRCA and RAD51 genes (47). A recent report showed that MEN1 protein can modulate HR by stimulating the transcription of several HR genes, including BRCA1, RAD51, and RAD51AP1 (30). However, MEN1 activities are unlikely to explain our results, because MEN1 mRNA levels did not significantly correlate with RPS in CCLE cell lines.

Alternative processes may also explain the association between HR protein overexpression and HR suppression/mutagenesis. RAD51 overexpression might be a direct cause of the HR suppression. Several studies have demonstrated that cells exhibit lower HR efficiency and reduced viability when RAD51 is experimentally overexpressed to very high levels (23, 48, 49). Furthermore, human cancer cell lines that overexpress RAD51 to very high levels can exhibit nuclear foci of RAD51 in the absence of exogenous DNA damage (50), and these structures are thought to represent toxic RAD51 aggregates on undamaged chromatin that can lead to genomic instability (51). High RAD51 levels may also contribute to genome instability by catalyzing the formation of DNA-RNA hybrids, whereby an RNA transcript invades a double-stranded DNA helix (52). Another possibility is that overexpressed HR proteins might promote mutagenesis by enabling homology-driven error-prone processes, like single-strand annealing, replication template switching, and nonallelic HR. For example, transient overexpression of RAD51 was shown to promote the formation of aberrant homology-mediated repair products, involving gene conversion events that lead to chromosomal translocations (53). Likewise, RAD52 protein catalyzes the annealing of homology-containing oligonucleotides biochemically (54), and RAD52 has been shown to promote DSB rejoining via single-strand annealing in human cells (9). In another example, the HR-promoting protein RAD51AP1 was found to be a key component of an expression signature associated with chromosomal instability (55). Therefore, HR protein overexpression may directly stimulate homology-mediated events that contribute to mutagenesis in low-RPS tumor cells.

The four-gene RPS system was more powerful than the larger combinations of DNA repair gene expression values that we tested. Although the simplicity of the RPS is appealing, the mechanisms governing DSB repair pathway choice are complex. The four RPS-defining proteins are major determinants of pathway choice, but an ever-growing list of other proteins, including BRCA1, 53BP1, and CtIP, also play important roles. Furthermore, at least some of these DNA damage response proteins are regulated after damage by posttranslational modifications (sumoylation, ubiquitylation, and phosphorylation) and by proteasome-mediated degradation (6, 56, 57). These processes cannot be captured by mRNA expression levels alone, which are the only input used to calculate the RPS. Despite these potential limitations, the four-gene, mRNA-based RPS system does provide useful diagnostic information. Specific patterns of genomic instability have also been correlated with HR proficiency in human cells (58–60), so measurements of these CNV patterns in tumors may eventually provide diagnostic information that is similar and/or complementary to RPS results.

The hypothesis-driven genomics methodology used to develop the RPS system is conceptually similar to that used to develop the DNA repair pathway–focused score (DRPFS); however, several differences should be noted. The DRPFS was derived using expression levels of 151 DNA repair genes, which were quantified from ovarian tumors (46). The final DRPFS algorithm was based on 23 genes that have known relevance to the repair of platinum-induced DNA damage, and none of these 23 genes overlapped with the four RPS-defining genes. High DRPFS did significantly correlate with favorable outcomes in ovarian cancer patients treated with platinum-based chemotherapy; however, it also correlated with favorable outcomes in patients who did not receive platinum-based chemotherapy (albeit to a lesser degree). This suggests that the DRPFS and RPS scoring systems likely provide different and possibly complementary information.

In conclusion, these results have broad implications for cancer biology and oncologic patient care. First, they suggest that HR suppression plays a common and central role in malignant tumor progression. Second, RPS correlates with tumor sensitivity to specific classes of therapy, so this system may enable a transition toward personalized oncology care: RPS can identify tumors that harbor HR suppression and hypersensitivity to certain therapeutic classes, such as interstrand DNA cross-linkers, topoisomerase inhibitors, and PARP inhibitors. Finally, RPS might be used to select appropriate candidates for other investigational drugs that directly target individual components of the DNA repair machinery [such drugs are reviewed in (61)].


Study design

We sought to create a method that estimates the efficiency of HR repair using publicly available data on human cancer cell lines. Specifically, we developed a scoring system that correlates gene expression patterns with HR proficiency. Data for mRNA expression, CNV, and drug sensitivity for human carcinoma cell lines (n = 634) were collected from the Broad-Novartis CCLE. Robust multiarray average–normalized mRNA expression values were normalized to the median value across all carcinoma samples and subsequently log2-transformed. SNP array–based DNA copy number values were filtered to eliminate individual SNPs. For CNV analysis, a deletion was defined as a copy number segment with a segment mean of ≤−0.6, and an insertion as a segment with a segment mean of ≥+1.4, where the segment mean is expressed as log2(copy number/2). Deletions and insertions were binned by size, with each bin spanning a 10-fold range of sizes. Drug sensitivities for topotecan (n = 279), irinotecan (n = 180), and paclitaxel (n = 257) were determined by IC50 (half-maximal inhibitory concentration) values. IC50 values of ≥8 μM were considered outliers and were therefore censored from the analysis. TP53 mutation status was determined from hybrid capture sequencing data, which were available for all carcinoma cell lines. In the six cell lines used for HR reporter experiments (RKO, DU 145, COLO 205, PC3, HCC44, and NCI-H650), sensitivities to topotecan and paclitaxel were confirmed in our laboratory using an acute, continuous 3-day exposure of cells to drugs; this method is identical to the method that was used to generate the CCLE drug sensitivity data.
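The expression, copy number, and drug sensitivity preprocessing steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline (their analyses were run in JMP): the gene symbols RIF1, PARPBP (PARI), RAD51, and XRCC5 (Ku80) are assumed identifiers for the four RPS-defining genes, the RPS is assumed to be the negated sum of their median-normalized log2 expression values (one scoring rule consistent with high expression of these genes yielding a low RPS), and all function names are hypothetical. Segment lengths are in base pairs, and size bins are decades (1 to 9 bp, 10 to 99 bp, and so on).

```python
import math
from collections import Counter
from statistics import median

# Assumed gene symbols for the four RPS-defining proteins:
# Rif1 -> RIF1, PARI -> PARPBP, RAD51 -> RAD51, Ku80 -> XRCC5.
RPS_GENES = ("RIF1", "PARPBP", "RAD51", "XRCC5")

DELETION_CUTOFF = -0.6   # segment mean, log2(copy number / 2)
INSERTION_CUTOFF = 1.4   # segment mean, log2(copy number / 2)
IC50_CAP_UM = 8.0        # IC50 values >= 8 uM are censored


def median_log2(values):
    """Divide each value by the median across samples, then log2-transform."""
    med = median(values)
    return [math.log2(v / med) for v in values]


def rps(expr):
    """expr maps gene symbol -> expression values, one per sample, in a shared
    sample order. Assumed scoring rule: RPS = -(sum of the four normalized
    values), so high expression of these genes gives a low RPS."""
    norm = {gene: median_log2(expr[gene]) for gene in RPS_GENES}
    n_samples = len(norm[RPS_GENES[0]])
    return [-sum(norm[gene][i] for gene in RPS_GENES) for i in range(n_samples)]


def classify_segment(segment_mean):
    """Call a copy number segment as a deletion, an insertion, or neither."""
    if segment_mean <= DELETION_CUTOFF:
        return "deletion"
    if segment_mean >= INSERTION_CUTOFF:
        return "insertion"
    return None


def size_bin(length_bp):
    """Decade bin from the digit count: 0 for 1-9 bp, 1 for 10-99 bp, ..."""
    return len(str(int(length_bp))) - 1


def cnv_bin_counts(segments):
    """segments: iterable of (length_bp, segment_mean) pairs.
    Returns a Counter keyed by (call, decade bin)."""
    counts = Counter()
    for length_bp, segment_mean in segments:
        call = classify_segment(segment_mean)
        if call is not None:
            counts[(call, size_bin(length_bp))] += 1
    return counts


def uncensored_ic50(values_um):
    """Keep only IC50 values below the 8 uM cap."""
    return [v for v in values_um if v < IC50_CAP_UM]
```

For example, three samples whose four genes are all expressed at half, exactly, and twice the median receive RPS values of 4, 0, and −4.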

Quantification of HR efficiency in cells

An HR reporter–containing plasmid (pDR-GFP), an I-Sce I–expressing plasmid (pCβASce), and an empty vector control plasmid (pCAG) were provided by M. Jasin. Cells were transiently cotransfected with either pDR-GFP + pCβASce or pDR-GFP + pCAG. For each transfection, 0.5 × 10⁶ cells at 80% confluence were electroporated with 15 μg of each plasmid in 4-mm cuvettes at 325 to 375 V and 975 μF. Electroporation voltages were optimized to minimize differences in transfection efficiency among the six cell lines. Cells were transferred into the appropriate complete growth medium, grown for 48 hours, and then analyzed on a Becton Dickinson FACScan. Live cells were gated on the basis of size/complexity and 7-aminoactinomycin D exclusion, and the fraction of live cells exhibiting GFP positivity was quantified. To account for residual differences in transfection efficiency between cell lines, the GFP positivity resulting from pDR-GFP + pCβASce transfection was normalized to the GFP positivity resulting from pDR-GFP + pCAG transfection. Experiments were performed in triplicate, and error bars denote SE.
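The normalization of the reporter readouts can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: it assumes that "normalized to" means a simple ratio computed within each replicate, that each input is the percentage of live (7-aminoactinomycin D–negative) cells that are GFP-positive, and that SE is the sample standard deviation of the normalized replicates divided by the square root of the number of replicates. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def normalized_hr(gfp_isce_pct, gfp_control_pct):
    """Normalized HR signal for one replicate: %GFP+ live cells after
    pDR-GFP + pCβASce, divided by %GFP+ live cells after pDR-GFP + pCAG."""
    if gfp_control_pct <= 0:
        raise ValueError("control GFP positivity must be > 0 to normalize")
    return gfp_isce_pct / gfp_control_pct


def hr_mean_se(replicates):
    """replicates: list of (gfp_isce_pct, gfp_control_pct) pairs, one per
    replicate (three in the experiments described). Returns (mean, SE)."""
    values = [normalized_hr(isce, ctrl) for isce, ctrl in replicates]
    se = stdev(values) / math.sqrt(len(values))
    return mean(values), se
```

Computing the ratio within each replicate, rather than dividing the replicate means, lets the SE reflect replicate-to-replicate variation in the normalized value itself.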

Evaluation of RPS in human tumor data sets and association with clinical characteristics

Breast and NSCLC tumor data sets were collected from the CGA. Patients with stage IV (metastatic) disease or without a specified stage were excluded from analysis. TP53 mutation status was determined from SNP array–based DNA copy number data. Normalized mRNA expression, CNV, and TP53 mutation status were available for 295 breast cancers and 153 NSCLCs. CNV analysis was performed as described for the CCLE carcinoma cell lines. Of the tumors with mRNA expression data, clinical characteristics and prognostic factors were available for 280 breast cancers and 145 NSCLCs.

Validation of the RPS system using clinical databases

Four publicly available NSCLC data sets were collected: three from Gene Expression Omnibus [accession numbers GSE14814 (JBR.10 trial), GSE31210 (Japanese National Cancer Center Research Institute), and GSE42127 (MD Anderson Cancer Center)] and one from the National Cancer Institute caArray Web site (Director’s Challenge Consortium). mRNA expression values were normalized to the median value across all patient samples within each data set and subsequently log2-transformed. Patients were grouped by type of treatment: in total, 581 patients underwent surgery alone and 164 received surgery plus chemotherapy. Cox proportional hazards analysis of overall survival was used to determine the hazard ratio for RPS as a continuous variable. All NSCLC data set analyses were limited to patients with stage I or II disease.

Statistical analysis

All analyses were performed with JMP 9.0 (SAS Institute Inc.). Student’s t tests were used to compare drug sensitivity or CNV between groups of cell lines or tumor samples. Log-rank tests or Cox proportional hazards models were used to assess differences in overall survival between patient groups. A P value of ≤0.05 was considered statistically significant.
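The two group-comparison tests named above can be sketched with the Python standard library. This is a minimal illustration, not the JMP procedures: the log-rank statistic uses the usual hypergeometric variance for two groups and is referred to a χ² distribution with 1 degree of freedom (whose upper tail probability equals erfc(√(χ²/2))), and the Student's t function returns only the pooled-variance t statistic and its degrees of freedom, leaving the P value to a t distribution table or statistics library. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def log_rank(times, events, groups):
    """Two-group log-rank test. groups holds 0/1 labels; events is 1 for an
    observed death and 0 for censoring. Returns (chi-square, P), 1 df."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    var = 0.0
    for t in event_times:
        n = n1 = d = d1 = 0
        for t_j, e_j, g_j in zip(times, events, groups):
            if t_j >= t:  # still at risk just before time t
                n += 1
                n1 += int(g_j == 1)
                if t_j == t and e_j:
                    d += 1
                    d1 += int(g_j == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def student_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic and its df."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

The pooled-variance form assumes equal variances in the two groups, which is what distinguishes Student's t test from Welch's unequal-variance variant.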


Fig. S1. Most of the drug sensitivity values mined from the CCLE database were generated in the same cell lines.

Fig. S2. A representative panel of cancer cell lines exhibits expected levels of drug resistance.

Fig. S3. Measurements of mRNA by real-time qRT-PCR for six representative cell lines generated RPS values that were comparable to RPS values calculated from array-based mRNA levels reported in the CCLE database.

Table S1. DNA repair genes that significantly associate with topotecan sensitivity.

Table S2. Correlation coefficients resulting from combinations of genes.


Acknowledgments: We are deeply indebted to the curators of publicly available data sets, including the CGA, CCLE, and NSCLC tumor data sets. We thank D. Bishop and J. Mason for critical reviews of the manuscript. Funding: This work was supported by funding from the NIH (CA142642-02 2010-2015 to P.P.C.), Ludwig Foundation for Cancer Research (to R.R.W.), and the Lung Cancer Research Foundation (to R.R.W.). Author contributions: Most of the analyses were designed and performed by S.P.P. and P.P.C. For characterization of the six representative cell lines, H.L.L. measured the HR efficiencies, T.E.D. measured the mRNA levels, and B.B. determined the drug sensitivities. The manuscript was drafted by P.P.C. and revised by S.P.P., I.M.P., and R.R.W. Competing interests: The University of Chicago has applied for patent protection (U.S. Provisional Application No. 61/881,331) of the RPS system, on which P.P.C., S.P.P., and R.R.W. are listed as inventors. R.R.W. has an advisory relationship with Bristol-Myers Squibb.

Correction: The original heading on page 4, “Cell lines with high RPS have elevated genomic instability,” was incorrect. The word “high” should have been “low.” This has been corrected.
