Review | Drug Discovery

High-Throughput Methods for Combinatorial Drug Discovery


Science Translational Medicine  02 Oct 2013:
Vol. 5, Issue 205, pp. 205rv1
DOI: 10.1126/scitranslmed.3006667


A more nuanced approach to drug design is to use multiple drugs in combination to target interacting or complementary pathways. Drug combination treatments have shown higher efficacy, fewer side effects, and less toxicity than single-drug treatments. In this Review, we focus on the use of high-throughput biological measurements (genetics, transcripts, and chemogenetic interactions) and the computational methods they necessitate to further combinatorial drug design (CDD). We highlight the state-of-the-art analytical methods, including network analysis, integrative informatics, and dynamic molecular modeling, that have been used successfully in CDD. Finally, we present a comprehensive list of the data and methodological resources publicly available to the community. Such next-generation technologies, which enable the measurement of millions of cellular data points simultaneously, may usher in a new paradigm in drug discovery, in which disease is viewed as the product of a system of interacting genes and pathways rather than of an individual protein or gene.


Drug development has stagnated over the past 20 years as new drug approvals lag behind ballooning research and development costs (1). An increasing proportion of experimental drugs are failing preclinical and clinical trials, with the two largest contributors to these growing attrition rates being efficacy and safety—each accounting for about 30% of drug failures (2). Several factors contribute to efficacy challenges in drug development. Foremost, it is increasingly apparent that the traditional approach of “one drug–one target” is no longer effective (3). This is especially true as more emphasis is placed on treating multifactorial diseases, such as diabetes, cardiovascular disease, cancer, and depression. In addition, efficacy may be limited by underestimating the extent of physiological redundancies in biological networks. When a protein is pharmacologically removed from a network, backup programs may be activated to recover function, nullifying the drug’s effect (4). Finally, traditionally robust experimental model systems, such as animal models, have had notoriously poor performance in drug development (5–7).

Unfortunately, even if progress is made to increase efficacy, anticipating a drug’s safety profile remains an independent challenge owing to multifunctionality and drug pleiotropy. Many drug targets play functional roles in biological processes outside the scope of the drug’s intended effects, thus contributing to unintended toxicities. In addition, despite the intention of hitting just “one target,” many drugs exhibit pleiotropic effects by interacting with “off-target” proteins. So, even if a desirable drug candidate is identified against an important target, the translation of this finding into clinical practice is limited because of the emergence of unexpected effects. With some notable exceptions (that is, the hERG potassium channel for cardiac arrhythmias), few experimental biochemical assays are used routinely for preclinical evaluation of adverse drug events, because the in vitro assay cannot faithfully reproduce the system. In response, computational approaches have been developed to use network analysis and chemical systems biology data to identify off-targets and explain side effects (8–11). However, few of these approaches have been vetted in a prospective clinical trial setting and therefore are not routinely used in preclinical drug development. The result is that many drugs fail safety trials in the premarketing phase, and some of those that pass are withdrawn after adverse events become apparent in postmarketing surveillance (12, 13).

Combination drug treatments have been used for over 30 years as a way to increase efficacy and reduce toxicity (14). Successful development of combinatorial treatments—what we refer to as combinatorial drug design (CDD)—requires both a systems-level understanding of the targeted biological functions and a quantifiable outcome (15). As such, until recently, combination therapies have been relegated to diseases with easily quantified phenotypes (Fig. 1). High-dimensional data capture technologies, such as genome and transcriptome sequencing, expression and protein microarrays, and drug screening, enable the measurement of millions (or billions) of simultaneous data points and have propelled forward our understanding of systems biology and systems medicine. Such approaches have already led to advances in our understanding of some notoriously complex diseases, including autism (16), schizophrenia (17), type 2 diabetes (18), and cardiovascular disease (19), as well as the identification of new uses for old drugs (20, 21). At the same time, these techniques enable quantification of multifactorial diseases, allowing drug combinations to be developed and evaluated for established and emerging disease domains. In this Review, we discuss how, by harnessing network analysis, machine learning, and dynamical modeling, massive and diverse data can be used for drug development. In addition, we include a listing of publicly available data (Table 1) and computational resources (Table 2) as a reference guide for future combination drug research.

Fig. 1 FDA-approved drug combinations by therapeutic indication.

Numbers indicate the percentage of unique pairs of drugs approved by the FDA as of September 2009 to treat the specified indication area. “Other” includes combinations that are used to treat gastrointestinal disorders and transplant rejection and used in prophylaxis and sexual health.

Table 1 Publicly available data resources to support CDD.
Table 2 Analytical resources to support CDD.

The emergence of systems medicine

Pharmaceutically driven medical care has led to lower infant mortality, longer life spans, and an overall greater quality of life. Largely because of these successes, diseases of advanced age, such as cancer, diabetes and metabolic syndrome, cardiovascular disease, and neurodegenerative disorders, have emerged as some of the most important health concerns worldwide. These diseases are multifactorial in both their presentation and their etiology. In appreciation of this complexity, new classifications of disease are emerging. Take, for example, the recent reclassification of a range of pervasive developmental disorders under the umbrella of autism spectrum disorder.

Mirroring shifting phenotypic definitions, computational algorithms and sequencing data are redefining our molecular understanding of disease. In autism, concepts from graph theory were used to develop a probabilistic genetic network in which the edges represented the likelihood of two genes sharing a genetic phenotype (16). When combined with copy number variant (CNV) data generated from next-generation sequencing, this graphical model revealed a novel and cohesive gene cluster that supports a role for synaptogenesis in autism. In follow-up work, this approach was extended to explain how genetic perturbations of a common molecular network involved in both autism and schizophrenia can have very different functional consequences (17). The shift in focus from the study of individual proteins to the systems level has led to advances in other areas as well. For example, a genome-scale integrative systems biology approach revealed new molecular players in the development of coronary atherosclerosis (22) and identified new targets in the treatment of in-stent restenosis (23). Another approach combined network analysis with a model of disease susceptibility to discover and validate three previously unknown obesity genes (24).

The rise of systems medicine has brought an appreciation for the robustness of pathological networks as well as the dismissal of the one drug–one target paradigm of drug development. In addition, it has highlighted the value of massively high-throughput experimental techniques that capture, in a parallelized fashion, millions of data points simultaneously. Only through appropriately leveraging integrative informatics approaches can these large data sets be used to glean biomedical insight. It is not enough to simply map out all possible pairwise genetic interactions. To effectively unravel higher-order (emergent) effects, one must understand the functional relationship between genes, pathways, and systems. In the area of CDD, researchers have focused on developing models that identify points in a genetic interaction network that represent nonlinear dependencies. The motivating hypothesis is that drugs that target these dependencies will have combinatorial therapeutic consequences that are superior to single-drug therapies.

What makes a good combination therapy?

There are no strict requirements or criteria that define effective combination therapies. Roughly speaking, any set of drugs that produces a more beneficial effect when given together than when given individually is a good combination therapy. Note that this definition may include both synergy—where the combined effect on a target is greater than the sum of the individual effects—and antagonism—where the combined effect is less. The latter can be especially useful when attempting to ameliorate off-target adverse drug events. It is also important to note that some drug combinations have been shown to be associated with increased risk of adverse events (25). It is useful, therefore, to begin our discussion with traditional models of synergy. These methods are simple and effective and serve as an excellent starting point when designing high-throughput experimental and computational CDD techniques.

Synergistic action may arise out of complementary drug actions, anti-counteractive actions, or facilitation actions (4). Complementary drug actions occur when two (or more) drugs target multiple points along the same protein or pathway (Fig. 2A). Many cancer drug combinations work in this way by targeting different points along the same apoptotic pathway (for example, aplidin and cytarabine, paclitaxel and tubacin) (4). Anti-counteractive actions refer to when a secondary drug targets a biological response to the primary drug (Fig. 2B). For example, epigallocatechin gallate kills bacteria by disrupting the integrity of their cellular wall, whereas d,l-cycloserine prohibits repair by inhibiting cell wall synthesis. Finally, facilitating actions are combinations where a secondary drug promotes the activity of the primary therapeutic. For example, candesartan targets the angiotensin II receptor to treat blood pressure. Ramipril inhibits the formation of the endogenous ligand (angiotensin II) and thus, through reduced competition, increases the primary drug’s efficacy profile (Fig. 2C) (4). One hundred seventeen such drug combinations were explored and categorized by their mechanism of action in a comprehensive review by Jia et al. (4). Pharmacological synergy is essential because it allows for a statistically significant increase in therapeutic benefit in exchange for a modest increase in potential toxicity. It is this favorable tradeoff that makes combinatorial drug treatment so desirable. Systematic exploration of potentially therapeutic drug combinations therefore relies on efficient and accurate quantitative models of synergy.

Fig. 2 Mechanisms of drug combinations.

(A) Complementary action. Aplidin activates the FAS-receptor–mediated extrinsic apoptosis by promoting FAS receptor. Cytarabine binds to DNA, inhibiting RNA synthesis and DNA repair, which leads to increased cellular stress and apoptosis. (B) Anti-counteractive action. Epigallocatechin gallate binds to the peptidoglycan layer and damages the bacterial cell wall. d,l-Cycloserine inhibits cell wall repair function, leading to cell death. (C) Facilitating action. Candesartan-cilexetil inhibits angiotensin receptor through competing with angiotensin II. Angiotensin II production is inhibited by ramipril.

There are several statistical methods for detecting drug synergies in experimental data, with the most common being Loewe additivity and Bliss independence. In Loewe additivity, the drugs are assumed to act on a protein target through a similar mechanism of action (26). The ratios of the theoretical combined concentration to the individual concentrations are then assumed to be additive and equal to 1. A value less than 1 indicates synergy. This approach has given rise to the combination index, an approach popularized by Chou and Talalay for enzyme inhibitor studies (27). Founded in probability theory, Bliss independence assumes that the drugs are acting by independent mechanisms and models their combined effect as the product of the individual effects (28). An effect observed beyond this estimate would be considered synergistic. An excellent discussion of pharmacological synergy, as well as the mathematical formulation of these models, is available in (29).
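Both criteria reduce to simple arithmetic. The sketch below, in plain Python with made-up dose and effect values, computes the Chou-Talalay combination index under Loewe additivity and the excess over the Bliss independence expectation; a combination index below 1 and a positive Bliss excess, respectively, would suggest synergy.

```python
def loewe_combination_index(d1, d2, D1, D2):
    """Chou-Talalay combination index under Loewe additivity.

    d1, d2: doses of each drug used in combination to reach a given effect.
    D1, D2: doses of each drug alone that reach the same effect.
    CI < 1 suggests synergy, CI = 1 additivity, CI > 1 antagonism.
    """
    return d1 / D1 + d2 / D2

def bliss_excess(e1, e2, e12):
    """Bliss independence: independently acting drugs are expected to give
    a combined effect of e1 + e2 - e1*e2 (effects as fractions in [0, 1]).
    A positive excess (observed minus expected) suggests synergy."""
    expected = e1 + e2 - e1 * e2
    return e12 - expected

# Hypothetical screening values:
ci = loewe_combination_index(2.0, 3.0, 8.0, 6.0)  # 0.25 + 0.5 = 0.75 -> synergy
excess = bliss_excess(0.4, 0.3, 0.7)              # expected 0.58, excess 0.12
print(ci, excess)
```

The dose and effect numbers are illustrative only; in practice, the inputs come from fitted dose-response curves.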

High-dimensional experimental discovery of combination therapies

Traditional techniques for discovering drug synergies analyze a series of experimentally derived dose-response curves for each pair of drugs against each protein target, functional pathway, or organism. A systematic search through all combinations of just the Food and Drug Administration (FDA)–approved compounds (~1500) would require more than 1 million experiments. In the development of a high-throughput system for evaluation of combination effects, Borisy and colleagues (30) pooled known inactive compounds together, thus ignoring a large portion of the drug pairs, because it is uncommon that two individually inactive compounds would show activity when combined. Pools with activity were then deconvoluted to identify the active pair (30). With this method, more than 120,000 combinations of 500 drugs were evaluated. This led to the identification of new fungicidal and antitumor drug combinations, which were validated by cellular assays.
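The pooling idea can be sketched in a few lines. In the toy screen below, the compound names are invented, the assay is simulated with a single planted synergistic pair, and the many practical complications of real pooled screens are ignored; the point is simply that pairs of pools are assayed first, and only active pools are deconvoluted.

```python
from itertools import combinations

# Hypothetical ground truth: one synergistic pair hidden among inactive drugs.
SYNERGISTIC = {frozenset(("drugA", "drugF"))}

def pool_active(pool):
    """Simulated assay: a pool reads active if it contains a synergistic pair."""
    return any(frozenset(p) in SYNERGISTIC for p in combinations(pool, 2))

def screen(compounds, pool_size=4):
    pools = [compounds[i:i + pool_size]
             for i in range(0, len(compounds), pool_size)]
    found = []
    # Stage 1: assay every pair of pools (far fewer wells than every drug pair).
    for p1, p2 in combinations(pools, 2):
        if pool_active(p1 + p2):
            # Stage 2: deconvolute the active pool pair one drug pair at a time.
            found += [pair for pair in combinations(p1 + p2, 2)
                      if pool_active(pair)]
    return found

compounds = ["drug" + c for c in "ABCDEFGH"]
print(screen(compounds))  # [('drugA', 'drugF')]
```

With 8 compounds, only 1 pool-pair assay plus 28 deconvolution wells are needed instead of 28 up-front pairwise assays; the savings grow quadratically with library size.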

However, to achieve comprehensive coverage of all drugs (about 2.3 million potential combinations), another order of magnitude increase in capacity would be required. Using a heuristic experimental design approach with adaptive pooling, called multiplex screening for interacting compounds (MuSIC), Tan et al. (31) assayed more than 500,000 pairwise combinations of 1000 drugs using less than 3% of the wells that would have been needed in a standard screen. This approach cannot determine synergy on its own, however, and follow-up experiments were conducted to verify hits from the high-throughput screen. Applied to the identification of HIV drugs, MuSIC found several promising combinations (31). Hierarchical and high-throughput experimental designs such as this enable all possible combinations of FDA-approved drugs to be evaluated for a single disease.

The value of these experimental approaches is unquestionable. However, there are hundreds of potential diseases to explore, and the space of bioactive small molecules contains hundreds of thousands of structures. Further, a computational analysis of drug combinations suggests that as many as nine drugs may be required to achieve the desired efficacies for a single disease (32). Computational algorithms that leverage genetic interaction network structure, disease-specific metabolic profiles, and chemical informatics techniques and resources have potential to reduce this search space and efficiently comb through billions of possible drug combinations.
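The scale of this search space is easy to verify with Python's `math.comb` and the approximate counts quoted above:

```python
from math import comb

n_fda = 1500            # approximate number of FDA-approved compounds
print(comb(n_fda, 2))   # pairwise combinations: 1,124,250
print(comb(n_fda, 9))   # nine-drug cocktails: on the order of 1e23
```

Even at the throughput of pooled screens, the nine-drug space is hopelessly beyond experimental reach, which is the motivation for the computational methods discussed next.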

Computational discovery of combination therapies

Methods based on empirical models. Many of the advances in systems biology have resulted from the application of graph theory to networks generated from protein-protein interactions, coexpression networks, functional modules, genetic variation, and literature mining. It is no surprise then that many of these approaches have been among the first adapted for drug development. Here, we highlight state-of-the-art work that leverages computational and mathematical tools (Table 2) for CDD. A central concept in these network-based methods is the “gene neighborhood.” Often these neighborhoods represent known disease modules derived from integrating pathway resources [for example, the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome] and the literature (33). With topological features of such a network, enrichments in the connectivity between the targets of pairs of drugs have been identified (33). For example, the authors were able to explain the combination effects between 5-fluorouracil and vinblastine from their connectivity with multiple kinase pathways involved in cell proliferation and apoptosis (33). Li and colleagues used their approach to study combinations of chemical agents and traditional Chinese herbs. Chemical combinations with sinomenine, an antiangiogenic extract from the Sinomenium acutum herb, were experimentally validated. Other groups have taken a simpler approach, inspired by genomic analysis, to produce new drug combination hypotheses. For example, two groups independently used genetically engineered mouse models and systems biology methods to identify combination therapies that synergistically inhibit cancer cell growth: in one case, identifying a new treatment for melanoma, and in the other, for prostate cancer (34, 35).
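The "gene neighborhood" intuition can be made concrete with a shortest-path calculation. In the stdlib-only sketch below, the three-node interaction network is invented (TYMS and TUBB stand in for targets of 5-fluorouracil and vinblastine); a drug pair is scored by the minimum network distance between its target sets, and small distances suggest action on the same neighborhood.

```python
from collections import deque

def shortest_path_len(graph, start, goal):
    """BFS shortest-path length in an unweighted interaction network."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return float("inf")

def target_set_proximity(graph, targets_a, targets_b):
    """Minimum network distance between two drugs' target sets."""
    return min(shortest_path_len(graph, a, b)
               for a in targets_a for b in targets_b)

# Toy protein-protein interaction network (edges are hypothetical).
ppi = {"TYMS": ["MAPK1"], "MAPK1": ["TYMS", "TUBB"], "TUBB": ["MAPK1"]}
print(target_set_proximity(ppi, {"TYMS"}, {"TUBB"}))  # 2
```

Published methods enrich this idea with many more topological features (degree, betweenness, module membership), but proximity between target sets is the common core.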

Drug interaction and chemogenomics (that is, compound-protein interactions) databases are a complementary resource to support combination drug discovery. By constructing a drug combination network that integrated drug-target and drug-drug interactions in addition to protein-protein interactions, Zou et al. (36) were able to identify pharmacological neighborhoods. The connectivity of these neighborhoods revealed hotspots of potential pharmacological synergy (36). A machine learning algorithm trained with topological features of these networks identified known drug combinations from a null set of drug pairs with a precision of 90% and a recall of 60% (36). In a similar work, Xu et al. (37) constructed an integrative “drug cocktail network” and used machine learning to predict therapeutic combinations, validating retrospectively against a set of known drug combinations. They achieved similar performance (area under the receiver operating characteristic curve of 0.92). These purely machine learning–based approaches, however, are limited in their ability to provide a mechanistic understanding of the identified drug synergy. The features and parameters learned by the model may represent high-order interactions or arbitrary selections between correlated features, both of which are difficult to decipher. Further, many of these approaches rely on retrospective methods of evaluation. Prospective validation is required if these computational approaches are to become truly translational.
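As a schematic of this style of analysis (not the actual models of Zou et al. or Xu et al.), the sketch below scores invented drug pairs with a single topological feature, the number of shared protein targets, thresholds the score to predict combinations, and evaluates precision and recall against a hypothetical set of known pairs.

```python
def shared_target_score(targets, d1, d2):
    """A toy topological feature: the number of protein targets two drugs share."""
    return len(targets[d1] & targets[d2])

def precision_recall(predicted, known):
    """Precision and recall of predicted drug pairs against known combinations."""
    tp = len(predicted & known)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(known) if known else 0.0
    return precision, recall

# Hypothetical drug-target map, candidate pairs, and known combinations.
targets = {"d1": {"p1", "p2"}, "d2": {"p2", "p3"}, "d3": {"p4"}, "d4": {"p1", "p2"}}
pairs = [("d1", "d2"), ("d1", "d3"), ("d1", "d4"),
         ("d2", "d3"), ("d2", "d4"), ("d3", "d4")]
predicted = {p for p in pairs if shared_target_score(targets, *p) >= 1}
known = {("d1", "d2"), ("d1", "d4")}
print(precision_recall(predicted, known))
```

Real systems replace the single feature with a learned model over many network features, but the retrospective evaluation step (precision/recall against a known-combination gold standard) is exactly as shown, and carries the same caveat raised above: it is not prospective validation.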

Synthetic lethality networks represent known biological dependencies (associated with growth or apoptosis) between pairs of genes. In these networks, two genes are connected if a double-knockout model shows depleted growth above and beyond an individual knockout model. Using transcription data and literature mining, Söllner et al. (38) identified 170 genes (in 14 distinct pathways) affected by mycophenolate mofetil (MMF) exposure. MMF is an immunosuppressant used in transplant rejection. By mapping known yeast synthetic lethal relationships to their human homologs, they identified combinations of drugs that interfere with calcium-based regulation, including adenosine deaminase inhibitors and sulfasalazine (38). Similarly, high-throughput screens of drug-dependent synthetic growth defects or lethality can reveal pharmacologically sensitive points. Chemogenetic profiling in yeast has been used to identify such sensitivities and exploit them to achieve synergistic drug action (39). This approach recaptured many known combination antifungal therapies and identified novel combinations, which the authors validated experimentally (39). Perhaps the most important feature of these approaches is that, unlike the machine learning–based methods, they also present an easily interpreted mechanistic explanation of the predicted synergy. Each drug combination prediction can be traced back to the genetics that supports it. This is important for understanding the biological context of the therapeutics as well as what adverse effects the therapy might have.
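The homolog-mapping step can be illustrated directly. In the sketch below, every gene, homolog, and drug name is invented; the logic simply lifts yeast synthetic-lethal pairs into human drug-pair hypotheses wherever both human homologs have a known inhibitor.

```python
# Hypothetical yeast synthetic-lethal pairs and a yeast-to-human homolog map.
yeast_sl_pairs = [("GENE_Y1", "GENE_Y2"), ("GENE_Y3", "GENE_Y4")]
homologs = {"GENE_Y1": "HGENE1", "GENE_Y2": "HGENE2", "GENE_Y3": "HGENE3"}
# Drugs known to inhibit each human gene product (also hypothetical).
inhibitors = {"HGENE1": ["drugX"], "HGENE2": ["drugY"]}

def candidate_combinations(sl_pairs, homologs, inhibitors):
    """Map yeast synthetic-lethal gene pairs to human homologs, then to drug
    pairs whose combined inhibition mimics the double knockout."""
    out = []
    for g1, g2 in sl_pairs:
        h1, h2 = homologs.get(g1), homologs.get(g2)
        if h1 in inhibitors and h2 in inhibitors:
            out += [(a, b) for a in inhibitors[h1] for b in inhibitors[h2]]
    return out

print(candidate_combinations(yeast_sl_pairs, homologs, inhibitors))
```

Note how the second yeast pair is silently dropped: one gene has no inhibited homolog, which mirrors the sparsity problem of real chemogenetic resources discussed later in this Review.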

Many of these computational approaches are holistic and do not consider a particular disease context. Integrating knowledge of diseases (in terms of genetic pathways or established targets) can greatly improve both the prediction accuracy of computational drug design methods and the interpretability of those predictions. Wu and colleagues (40) explored protein-protein, protein-DNA, and signaling pathways in the context of type 2 diabetes. Using a method inspired by mathematical modeling, they were able to successfully rediscover the therapeutic combination of rosiglitazone and metformin (sold as AvandaMet). Incorporating context into computational methods requires knowledge-based engineering and the use of semantic networks. Computable knowledge, in the form of biomedical ontologies, will become increasingly important as a way to organize knowledge as our ability to generate high-throughput data increases.

Network-based algorithms for drug combination discovery, such as those we present in this section, are effective at identifying first-order interactions and relationships. However, their ability to capture and leverage important nonlinear dynamics is limited. These dynamics can have a major impact on pharmacology. For example, a time dependency between coadministration of chemotherapy and epidermal growth factor receptor inhibition was recently discovered (41). The systems genomics analysis used in that study revealed that the cell is rewired to become more susceptible to treatment. Further, advances in mathematical modeling strategies have enabled accurate prediction of complex and emergent cellular phenotypes. When combined with enormous computing power, improvements in modeling efficiency and design can lead to breakthroughs in our understanding of biochemical systems, as exemplified by the recent publication of the first full-scale mathematical model of the cell (42).

Methods based on physiological models. In the case where the mechanism of a process is well understood, mathematical modeling is a powerful tool for study and exploration. Although mathematical modeling has been most successful in the fields of physics and fluid dynamics, there have been some storied successes in biology as well, with Hodgkin and Huxley’s model of neuronal action potential being the most famous. Perhaps inspired by these pioneers, Bagheri and colleagues (43) used an ordinary differential equation (ODE) model of mitogen-activated or extracellular signal–regulated protein kinase kinase (MEK) inhibition to identify therapeutic drug combinations for cancer treatment. The most valuable feature of this type of modeling is the ability to “peek under the hood” and explore the dynamics of the modeled terms. Using this strategy, the authors were able to confirm that MEK inhibitors functioned independently of an established oncolytic pathway (43). Unfortunately, when applied to large systems with many proteins and molecules, ODE models can quickly become computationally unmanageable.
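As an illustration of the general strategy (a deliberately tiny toy, not the MEK model of Bagheri and colleagues), the sketch below Euler-integrates a one-variable ODE in which each of two hypothetical inhibitors scales pathway input, letting one read off the combined steady-state effect directly from the modeled terms.

```python
def simulate(i1, i2, k_syn=1.0, k_deg=0.5, dt=0.01, t_end=20.0):
    """Forward-Euler integration of a toy one-variable signaling model:
        dA/dt = k_syn / (1 + i1) / (1 + i2) - k_deg * A,
    where i1 and i2 are inhibitor doses that each scale pathway input."""
    a = 0.0
    for _ in range(int(t_end / dt)):
        production = k_syn / (1.0 + i1) / (1.0 + i2)
        a += dt * (production - k_deg * a)
    return a

base = simulate(0, 0)   # untreated steady state ~ k_syn / k_deg = 2.0
combo = simulate(1, 1)  # both inhibitors at unit dose: input quartered, ~0.5
print(base, combo)
```

Every parameter here is invented for illustration; the point is that a mechanistic model yields quantitative predictions for any dose pair, at the cost of scaling poorly as the number of modeled species grows.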

Alternative modeling strategies that limit the scope of which dynamics are modeled are more efficient and more scalable. These methods use huge experimental data sets to inform and constrain omic-scale models of cellular function. One such constraint-based approach commonly used to model large metabolic systems is flux balance analysis (FBA) (Table 2). FBA uses strong assumptions to drastically reduce mathematical complexity. Assuming a steady state of metabolite and protein concentrations, FBA attempts to maximize (or minimize) a given set of target functions (for example, minimizing glucose uptake and/or oxygen consumption). FBA has been used to identify drug combinations to treat metabolic disorders (diabetes, obesity, hypertension) and cancers (44). The most distinct feature of this work was that the authors were able to quantify not only the potential therapeutic benefit but also the potential for adverse side effects. By comparing the effects on models of cancer cells and models of normal cells, they could predict a drug combination’s selectivity (44).
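The essence of FBA, optimizing an objective flux subject to steady-state mass balance, fits in a toy example. The sketch below brute-forces a hypothetical three-reaction network over a coarse grid; a real FBA implementation would instead solve a linear program over thousands of reactions.

```python
def toy_fba(uptake_max=10.0, step=0.5):
    """Brute-force sketch of flux balance analysis on a three-reaction network:
    metabolite A is produced by uptake flux v1 and consumed by a biomass flux
    v2 and a secretion flux v3. Steady state requires v1 - v2 - v3 = 0; the
    objective is to maximize the biomass flux v2."""
    grid = [i * step for i in range(int(uptake_max / step) + 1)]
    best = None
    for v1 in grid:
        for v2 in grid:
            for v3 in grid:
                steady = abs(v1 - v2 - v3) < 1e-9
                if steady and (best is None or v2 > best[1]):
                    best = (v1, v2, v3)
    return best

print(toy_fba())  # (10.0, 10.0, 0.0): all uptake routed to biomass
```

The "strong assumptions" mentioned above are visible here: concentrations never appear, only fluxes constrained to balance, which is what keeps the problem linear and scalable.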

Alternative strategies to increase efficiency shift the focus from modeling the molecules themselves to modeling a process that operates on the molecules. Petri nets (PNs) are state-transition models that are relatively simple to formulate yet are powerful tools for synthesizing large amounts of data (Table 2). Jin et al. (45) used a PN model first to learn the individual effects of drugs on gene expression and then to computationally combine these effects to identify potential drug synergies. This hybrid approach combines the power of a mathematical model with the efficiency of a network analysis.
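A minimal PN sketch follows, using a simplified synchronous firing rule rather than the classic interleaving semantics, with tokens for a hypothetical drug and its target combining into an inhibited complex.

```python
def fire(marking, transitions):
    """One step of a toy Petri net: fire every enabled transition, consuming
    tokens from its input places and producing tokens in its output places.
    (Classic PN semantics fire one transition at a time; this synchronous
    rule is a simplification for illustration.)"""
    new = dict(marking)
    for inputs, outputs in transitions:
        if all(marking.get(p, 0) >= n for p, n in inputs.items()):
            for p, n in inputs.items():
                new[p] -= n
            for p, n in outputs.items():
                new[p] = new.get(p, 0) + n
    return new

# Toy net: a drug token and a target token combine into an inhibited complex.
transitions = [({"drug": 1, "target": 1}, {"complex": 1})]
marking = {"drug": 1, "target": 1}
print(fire(marking, transitions))  # {'drug': 0, 'target': 0, 'complex': 1}
```

Because only token counts move, no rate constants are needed, which is what makes PN models cheap to fit against expression data compared with full ODE systems.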

In addition to being better able to capture nonlinear dynamics, mechanistic models allow for quantitative predictions. Whereas a network analysis could only predict putative synergistic drug pairs, these modeling approaches attempt to predict the amount of synergism and the quantitative effect on certain functional pathways. This quantitative nature will facilitate experimental validation because it provides a method of ordering predictions by those deemed to be most likely to be true.

Challenges facing CDD

Massive and complex biological data sets have only begun to be used for designing new combination drug therapies, and although there is incredible potential, many significant challenges remain. CDD is the ultimate test of our understanding of—and therefore our ability to manipulate—systems biology. Considering that effective combination therapies may require the interaction effects of many drugs, it may never be feasible to comprehensively explore the entire pharmacological space experimentally. Therefore, in silico models must take the place of microplates as the first-line source of exploration and knowledge generation.

Here, we present many innovative methods for the computational prediction of therapeutic drug synergies; however, very few mention the problem of adverse side effects, and fewer still present strategies for predicting them. Predicting drug side effects is significantly more complex because the outcome phenotype is unknown. For example, the commonly used terminology for side effects, MedDRA, contains nearly 20,000 mid-level terms describing potential adverse events (and more than 70,000 low-level terms). This difficulty is one of the primary reasons the FDA has relied on retrospective analysis of surveillance databases to monitor drug safety in the context of drug interactions (13, 46). Complicating the matter, drugs often cause their side effects through unknown off-target interactions or are mediated by pharmacogenetic variants. Chemogenetic resources, such as PubChem and ChEMBL, provide binding and functional assay data for hundreds of thousands of compound-protein pairs (Table 1). Genetic knowledge bases, such as PharmGKB (47), provide curated data on the genetic variation of drug response (Table 1). However, these data sets are still very sparse and largely incomplete. Effective prediction of side effects of drug combinations will depend on advances in our ability to experimentally assay a drug’s individual effects, characterize the genetic modulators, model the interactions, and finally predict potential side effects de novo.

Future opportunities

The ubiquity of high-throughput data capture technologies has led to an explosion in the number of experimental data sets publicly available. For example, the Gene Expression Omnibus now contains nearly 1 million expression experiments, and the Encyclopedia of DNA Elements (ENCODE) contains more than 2500 high-dimensional experiments covering everything from DNA methylation and chromatin modification to RNA expression and transcription factor binding. Parallel to the growth in biological data, health care reform has spurred the widespread adoption of electronic health record systems that capture hundreds of thousands of clinical measurements routinely collected in medical practice. These data, however, are under tight regulation, which often restricts sharing and precludes analysis by outside institutions. If these challenges can be addressed, health care data may represent a deep and underused resource describing population variation (both clinically and biologically) that could help to explain differential drug response and inform CDD.

Large repositories of diverse data, however, are often difficult to navigate and challenging to synthesize. An opportunity exists for novel informatics methodologies to intelligently and efficiently integrate these resources to support computational and statistical data mining. These new methods must not only efficiently integrate the data but also have the capability to assess the reliability and validity of the data. This “smart” use of knowledge bases combined with information technology is exactly what makes next-generation informatics tools, such as IBM’s Watson, so captivating. Synthesized versions of these resources will enable the application of more sophisticated modeling strategies that incorporate biochemical and systems-level dynamics. Although these methods will be more costly in terms of computation, they will also have the potential to be much more predictive of effective drug combinations.

Nascent models of disease pharmacology [for example, Tyson’s model of breast cancer (48)] are the first step at capturing the biochemical dynamics of high-dimensional and nonlinear systems. These methods have been continually limited because of a lack of experimental data to support parameter fitting and optimization. The massively high-dimensional biological and medical data now being generated represent a new opportunity to address this problem. The key in unlocking this potential lies in establishing a connection between the data captured and the model parameters—an exciting and challenging research endeavor.


Drug combination therapies have been a staple of modern medicine for the treatment of cancer and infection for over 30 years. They increase efficacy by targeting redundant biological systems and ameliorate toxicity through increased selectivity and reduced dosage requirements. Unfortunately, translational chemical biology—the process of identifying disease driver genes as drug targets and designing a small-molecule therapeutic—is a major research challenge. It remains difficult to anticipate what will happen to a patient when taking multiple drugs together. This is evident from the relatively high rate of unexpected adverse drug-drug interactions (49). The human system is a complex network of interaction pathways resulting in dynamic emergent behavior. New high-throughput data capture techniques and advanced computational modeling of these networks have the potential to revolutionize our understanding of these complex systems. We have highlighted the state-of-the-art experimental techniques and analytical tools currently being applied to advance combination drug therapy. These methodologies use new technology to screen millions of potential therapeutic drug combinations. There remains a tremendous opportunity to leverage underused data resources and existing analytical techniques to break open a new avenue of drug design and, importantly, work toward closing the ever-present and growing pharmaceutical productivity gap.


Acknowledgments: We thank K. Karczewski for insightful discussions. Competing interests: The authors declare that they have no competing interests.