Disciplined approach to drug discovery and early development


Science Translational Medicine  27 Jul 2016:
Vol. 8, Issue 349, pp. 349ps15
DOI: 10.1126/scitranslmed.aaf2608


Our modern health care system demands therapeutic interventions that improve the lives of patients. Unfortunately, decreased productivity in therapeutics research and development (R&D) has driven drug costs up while delivering insufficient value to patients. Here, I discuss a model of translational medicine that connects four components of the early R&D pipeline—causal human biology, therapeutic modality, biomarkers of target modulation, and proof-of-concept clinical trials. Whereas the individual components of this model are not new, technological advances and a disciplined approach to integrating all four areas offer hope for improving R&D productivity.

The past several decades have seen a decrease in research and development (R&D) productivity, as measured by new therapeutic drug (NTD) approvals per dollar spent (1). Although the number of drug approvals by the U.S. Food and Drug Administration (FDA) has seen an uptick since 2012 (2), R&D costs continue to rise: The median cost of developing any NTD from phase 1 clinical trials to approval is ~$250 million (3). However, only ~10% of drugs tested in phase 1 are ultimately approved, which drives the total R&D cost to well more than $2.5 billion for every NTD approved. Failure to achieve safety or efficacy in phase 2 and phase 3 trials is a major driver of cost and therefore a key contributor to the decline in R&D productivity (4–7). Thus, the biopharmaceutical industry must reduce attrition rates in late-stage clinical trials and deliver new therapies that differentiate from standard of care if it is to reverse the decline in R&D productivity.
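The arithmetic behind the per-approval figure can be made explicit. The following is a minimal Python sketch, assuming (as a simplification that the text does not spell out) that every candidate entering phase 1 costs the same amount and that approval is a single all-or-nothing probability. The function name `cost_per_approval` is illustrative only.

```python
def cost_per_approval(cost_per_candidate_usd: float, approval_probability: float) -> float:
    """Expected spend per approved drug when failed candidates are paid for too.

    Simplifying assumption: every candidate costs the same amount,
    whether or not it is eventually approved.
    """
    if not 0.0 < approval_probability <= 1.0:
        raise ValueError("approval_probability must be in (0, 1]")
    # On average 1 / p candidates must be funded to obtain one approval.
    return cost_per_candidate_usd / approval_probability


# Figures quoted in the text: ~$250 million per candidate, ~10% approved.
estimate = cost_per_approval(250e6, 0.10)
print(f"~${estimate / 1e9:.1f} billion per approved NTD")
```

Under these assumptions the estimate is about $2.5 billion, the baseline that the text describes as being exceeded. The same function shows why attrition matters so much: doubling the approval probability halves the cost per approval.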

Several studies offer explanations for failure or predictors of success in phase 2 and phase 3 clinical trials. Scannell and Bosley found that the poor predictive value of preclinical models correlates with lack of efficacy in phases 2 and 3 (8); others have found that drug targets based on human genetic diseases or R&D programs that make use of robust clinical biomarkers that estimate drug efficacy are more likely to achieve success (9–11). For small-molecule drug R&D programs, the physicochemical properties of the compounds correlate with failure in safety trials (6). Last, small clinical trials that test proof of concept (PoC) in a “fail fast” strategy reduce phase 2 and phase 3 attrition rates (12).

The fact that many drugs that gain regulatory approval do not differentiate from the standard of care exacerbates the decline in R&D productivity. Although it is difficult to quantitate the ability to deliver value in the real world, annual peak sales of NTDs are one appropriate measure (13). Using 3-year rolling averages for late-stage pipeline assets, estimates of peak sales have decreased by about 35% over the past 5 years, from $692 million (during the period from 2010 to 2012) to $451 million (2013 to 2015) (14).

In this Perspective, I build on previous studies and provide a framework for integrating four key components to address the decline in R&D productivity: (i) causal human biology, (ii) therapeutic modality, (iii) biomarkers of target modulation, and (iv) clinical PoC studies (Fig. 1). In isolation, each of these solutions is important but of limited value. For example, if the wrong target is selected, then it does not matter whether robust biomarkers are developed and deployed in a clinical PoC trial with a novel therapeutic modality. However, if all four features are satisfied for a given program, there will likely be an increased probability of success in delivering new and meaningful therapies to patients.

Fig. 1. A disciplined approach to identify drug targets and test therapeutic hypotheses.

Four areas in translational medicine are causal human biology, therapeutic modalities that recapitulate human biology, biomarkers of target modulation, and next-generation clinical trial technologies. Connecting all four promises to improve the novelty, efficiency, and productivity of drug R&D.



A “good” drug is one that binds to and modulates a molecular target in a way that is safe and effective in the disease context for which it is administered. This concept of safety-efficacy profiles, which is related to the concept of drug dose-response curves, serves as an imperfect but useful paradigm to guide target identification (15). At the beginning of the R&D process, it is ideal to know whether a biological target, when perturbed, alters human physiology in a way that suggests that a cognate drug will be safe and effective in humans.

A central feature of the translational medicine model proposed here is that target modulation, as measured in humans, is causally related to a physiological outcome. There are many examples of naturally occurring biologic perturbations that lead to changes in human physiology. These “experiments of nature” provide clues into the mechanisms by which new therapies might work. Accordingly, a goal of drug R&D is to develop therapies that mimic experiments of nature to establish a causal link between target perturbation and physiological outcomes. These causal relationships should be established at the time a target is selected.

One of the simplest examples of an experiment of nature comes from infectious diseases. Numerous bacteria and viruses cause human disease—from infections of the lungs (for example, pneumonia) or skin (for example, cellulitis) to other diseases that were not considered initially to result from an infectious agent (for example, Helicobacter pylori as a cause of gastric ulcers or human papillomavirus as a cause of cervical cancer). Therapeutic interventions against these infectious agents have a documented benefit on human health, which provides a modern-day test of Koch’s postulates.

Human genetics and tissue-specific autoimmunity also constitute experiments of nature that illuminate tweaks in human biology that cause disease and allow researchers to formulate therapeutic hypotheses (Table 1). One example comes from genetic and acquired disorders of the central nervous system. Approved antipsychotic medications block dopamine receptor D2 (DRD2) and treat the positive symptoms in patients with schizophrenia. A genome-wide association study identified genetic variation in the DRD2 gene locus as being associated with schizophrenia risk (16). In patients with encephalitis and clinical symptoms related to psychosis, autoantibodies against DRD2 have been identified (17). Although these human experiments of nature were not used in the development of antipsychotic medications to treat schizophrenia, they do provide PoC that mechanistic data from human genetics and tissue-specific autoimmunity can, in retrospect, identify the in vivo targets of approved drugs.

Table 1. Natural selection.

Examples of approved drugs with causal support from tissue-specific human autoimmunity or human genetics. SOST, sclerostin; IL6R, interleukin 6 receptor; PCSK9, proprotein convertase subtilisin/kexin type 9; GLP1R, glucagon-like peptide–1 receptor; IL23A, interleukin 23α; IL12B, interleukin 12β.


To fully capitalize on experiments of nature to identify promising drug targets, scientists require new ways to study causal human biology. Fortunately, many new technologies are emerging to study germline genetic variation and the human immune system. Advances in human genetics are identifying causal links between putative drug targets and human physiology (15). Ongoing large-scale sequencing efforts in human populations with linked clinical data suggest that it is realistic to expect that a hundred million genomes will be sequenced and available for discovery research in the next 10 years. Because these genomes are linked to detailed clinical data, genotype-phenotype dose-response curves can be estimated at the start of a drug discovery program. In an ideal situation, a promising drug target would be encoded by a gene with a series of alleles [gain of function (GoF) and loss of function (LoF), including “human knockouts”] (18) linked to clinical data that can be mined to estimate the effect of lifelong genetic perturbation on human physiology.

New single-cell technologies now make it possible to identify antigens that drive the human immune response to infectious diseases, autoimmune disorders, and other clinical phenotypes. For example, neutralizing antibodies that recognize the hemagglutinin glycoprotein antigen from influenza A virus have been identified (19). Such information is being used to develop therapies that provide passive protection against influenza infection, as well as to inform universal vaccine design. Similar technologies are being applied to identify the target of tissue-specific autoantibodies—and therefore, potential targets for new drug discovery programs, as in the case of antibodies against DRD2 in patients with schizophrenia.

Last, animal models remain a valuable tool for gaining an understanding of complex physiology, testing pharmacology, and assessing safety. According to the model proposed here, an important distinction is that animal models should not be used to pick targets at the beginning of a drug discovery program. Targets should be selected on the basis of a deep understanding of causal human biology, not on the basis of imperfect and notoriously inaccurate animal model data, whether causal or correlative.


Once a target has been identified, the next step in the drug discovery pipeline is to develop a therapeutic that modulates the target in a specific manner. This step presents two major challenges. First, the therapeutic molecule must gain access to the protein target of interest. It is estimated that only ~20% of human proteins are accessible by either small molecules (which target hydrophobic pockets) or biological therapeutics (which bind to extracellular targets), which leaves most protein targets “undruggable” (20). Second, once a therapeutic molecule engages its target, it must exert an effect consistent with the underlying therapeutic hypothesis. Both small-molecule drugs and biologics are restricted in the mechanism by which they perturb a target (for example, orthosteric inhibition for small molecules and neutralization of extracellular proteins by biologics). As a consequence, only a small portion of potential drug targets is considered therapeutically tractable for a new drug discovery program.

In the translational model proposed here, a key step is that therapeutic modulation precisely recapitulates causal human biology. To enable this step, it is critical to understand the directionality of the desired therapeutic modulation (is it therapeutically desirable to increase or decrease activity of the protein target?) as well as the mechanism by which therapeutic modulation is expected to act (for example, by altering enzymatic activity, ligand-induced receptor signaling, or transcriptional regulation). As an example, if the human immune system leads to autoimmune destruction of a specific cell type that secretes a specific protein ligand (such as the obliteration of neurons that secrete wakefulness-inducing orexin in patients with narcolepsy), then the therapeutic intervention should inhibit the ligand-receptor interaction, perhaps with the use of orexin-receptor antagonists to promote sleep (21).

In the absence of a disciplined approach, targets and mechanisms of therapeutic perturbation might be selected on the basis of tractability (or druggability) rather than causal human biology. Indeed, there is a historical bias toward protein classes considered druggable, such as kinases, ion channels, G protein–coupled receptors, and extracellular cytokines and their receptors (22). In contrast, many targets identified by human genetics or other experiments of nature might not be considered druggable by either conventional small molecules or biologics.

An example of a challenging target with strong evidence of causal human biology is glucocerebrosidase (GBA), a lysosomal enzyme–encoding gene first found to be mutated in patients with Gaucher disease—a lysosomal storage disorder—and later in patients with Parkinson’s disease—a movement disorder characterized by α-synuclein aggregation in the brain (23). GBA breaks down glucocerebroside into glucose and ceramide, a fat molecule. As an intracellular protein, GBA is not accessible via conventional antibody-based biologics. To formulate a therapeutic hypothesis for perturbation of GBA with a small molecule, one needs to know the mechanism of Parkinson’s disease–associated mutations. The two predominant hypotheses are GBA enzymatic LoF, which affects α-synuclein processing and clearance, or GoF, which results in protein misfolding and α-synuclein accumulation. The implications for small-molecule modulation are substantial: One would posit increasing either GBA enzymatic activity based on an enzymatic LoF mechanism or GBA stabilization based on a GoF mechanism. Both molecular mechanisms are challenging to achieve with a small molecule, indicating that a new approach to GBA targeting is needed.

To overcome the challenges of the “undruggable genome,” new approaches are being developed to expand mechanisms by which small molecules and biologics exert therapeutic effects (for example, positive allosteric modulators or conjugated nanobodies that bind different epitopes of a single target). Moreover, phenotypic screens can be used to uncover unexpected mechanisms by which small molecules modulate a target or pathway (such as by binding to and modifying a regulatory RNA structure) (24). Further, beyond small molecules and monoclonal antibodies, new therapeutic modalities are gaining traction, such as mRNA delivery, small interfering RNA and antisense oligonucleotides, gene editing with CRISPR (clustered regularly interspaced short palindromic repeats)–Cas9, and peptides. These new modalities should expand the ability to recapitulate causal human biology in the form of a therapeutic.


One of the most difficult aspects of drug discovery is making robust predictions about how drug concentration in the blood relates to the final clinical outcome required for registration. Generally speaking, the term “biomarker” refers to biological readouts along the chain of events from the time a drug is exposed to the target (target exposure), engages with the target (target engagement), and modulates the target to exert a physiological effect in a human system (target modulation). Here, I focus on pharmacodynamic biomarkers of target modulation, because robust pharmacodynamic biomarkers of drug efficacy are positively correlated with drug approval (10).

The most valuable pharmacodynamic biomarkers are those that integrate blood and tissue pharmacokinetics (25) and target engagement into a biological readout that is feasible to measure in a clinical trial. Unfortunately, many pharmacodynamic biomarkers measure biological states that are irrelevant to human disease. That is, many pharmacodynamic biomarkers measure the impact of pharmacological perturbation on a biological system, but these measurements have no connection to disease-specific causal human biology. In the translational medicine model proposed herein, a key step is to identify biomarkers that robustly measure the same physiological outcomes induced by experiments of nature in humans. Such intermediate phenotypes provide a link between the physiology induced by drug perturbations and improvement in clinical outcomes.

For human genetic targets, it is possible to use a technique called Mendelian randomization to establish causality between biomarkers and clinical outcomes (Fig. 2): Inherited variation in a gene of interest (target) can be tested for association with both an intermediate process (a biomarker) and a clinical outcome (such as disease risk). If the genetic variant is associated with both the biomarker and the clinical outcome, then this supports a causal relationship between the biomarker and disease outcome. In contrast, if a variable not on the causal pathway is responsible for an epidemiological observation, then the genetic variant will not be associated with both the biomarker and the outcome. Note, however, that it is possible that a genetic variant influences one but not the other (such as association with a biomarker but not disease); under this scenario, there is no causal relationship among all three (target, biomarker, and disease). Because genotypes are randomly assigned at birth, much in the way that therapeutic interventions are randomly assigned at the start of a clinical trial, Mendelian randomization can be thought of as nature’s randomized controlled trial.
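The logic of Mendelian randomization can be illustrated with a small simulation. The sketch below is not taken from the article: it uses simulated data, a continuous outcome rather than disease risk, hypothetical effect sizes and allele frequency, and the ratio (Wald) estimator, which is one common way to turn the two genetic associations into a causal estimate. It also assumes that the genotype affects the outcome only through the biomarker. The letters G, I, D, and C follow the Fig. 2 legend.

```python
import random
from statistics import mean


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=50_000, true_effect=0.5, seed=0):
    """Simulate genotype G, biomarker I, and outcome D with an unmeasured confounder C."""
    rng = random.Random(seed)
    G, I, D = [], [], []
    for _ in range(n):
        # Two independent allele draws (hypothetical allele frequency of 0.3).
        g = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        c = rng.gauss(0, 1)                        # confounder: affects I and D, not G
        i = 0.4 * g + c + rng.gauss(0, 1)          # biomarker shifted by genotype
        d = true_effect * i + c + rng.gauss(0, 1)  # outcome driven by biomarker and C
        G.append(g)
        I.append(i)
        D.append(d)
    return G, I, D


G, I, D = simulate()
observational = slope(I, D)             # confounded by C, so it overstates the effect
wald_ratio = slope(G, D) / slope(G, I)  # genotype-based estimate of the same effect
print(f"observational slope of D on I: {observational:.2f}")
print(f"Wald ratio estimate: {wald_ratio:.2f} (simulated true effect: 0.50)")
```

Because the genotype is generated independently of the confounder, the ratio of the two genotype associations lands close to the simulated effect, whereas the direct biomarker-outcome slope is inflated by the confounder. This is the sense in which the text calls the method nature's randomized controlled trial.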

Fig. 2. Nature’s randomized clinical trial.

Human epidemiology is a powerful observational method that establishes an association between a risk factor and disease (such as hormone replacement therapy and cardiovascular disease). On its own, however, epidemiology cannot establish causality and is, therefore, subject to spurious associations as a result of unmeasured confounding factors. Mendelian randomization is a method that uses human genetic variation to test for a causal effect between observational data and clinical outcomes. G, target genotype; I, intermediate biomarker; D, disease outcome; C, confounder variable.


Low-density lipoprotein (LDL) cholesterol represents an example of a robust pharmacodynamic biomarker linked to causal human biology through genetic association at the PCSK9 gene. As demonstrated through Mendelian randomization, the same human PCSK9 genetic variants that give rise to lower LDL cholesterol protect from risk of cardiovascular disease, such as heart attack (26). One important reason that the FDA approved two PCSK9 inhibitors, alirocumab and evolocumab, is confidence that LDL reduction is an accurate efficacy biomarker for protection against cardiovascular events. Nonetheless, cardiovascular clinical trials are under way to test clinical benefit.

Looking ahead, new technologies should enable the development of pharmacodynamic biomarkers linked with causal human biology. In vaccine trials of infectious diseases, the same single-cell technologies used to identify target antigens can be used to monitor the human immune response to those antigens. For targets based on human genetics, human subjects can be studied to identify potential biomarkers that differ between those with and without mutations of interest. To this end, population-based resources that link genetic data to deep, longitudinal molecular profiling and clinical data are being established (such as the United States–led Precision Medicine Initiative) (27).


Clinical trials represent the ultimate test of a therapeutic hypothesis. After a drug candidate has been tested for safety and tolerability in a phase 1 clinical study, it is tested for a relationship between dose of a drug and biological activity (dose-response curves) in a phase 2 trial. This stage is followed by a larger phase 3 trial to assess the safety-efficacy profile and, therefore, the purported value of the therapeutic in clinical practice. Traditionally, each phase is conducted in series; healthy volunteers (phase 1) or patients (phases 2 and 3) are monitored directly by health care providers in a clinical unit, and outcomes are measured by laboratory tests or clinical findings that are part of routine clinical practice.

Within this traditional clinical trial framework, achieving PoC depends on the disease indication. For some disorders, such as infectious diseases, PoC can be achieved by observing viral-load reduction in very small cohorts of patients in phase 2. For other disease types, such as neurodegenerative diseases, PoC can be achieved only by observing changes in clinical outcome in phase 3 trials that involve thousands of patients. The translational model discussed herein proposes new clinical trial design approaches to link therapeutic modulation of targets anchored in causal human biology with pharmacodynamic biomarkers of target modulation. The goal is to gain confidence in the hypothesis that the therapeutic modulation exerts the desired biological effect—and ideally, to achieve PoC in the smallest number of patients possible.

First, selected patient populations can be identified for the clinical PoC study. This may occur for patients with a genetic disease, such as has been demonstrated for the drug ivacaftor in cystic fibrosis patients who carry specific genetic mutations (28). Second, pharmacodynamic biomarkers that are linked with causal human biology are measured after drug intervention. In developing an influenza vaccine, an immune response to hemagglutinin glycoprotein antigens is a robust pharmacodynamic biomarker. As described above, LDL lowering, linked with human carriers of different PCSK9 mutations, is a powerful pharmacodynamic biomarker for PCSK9 inhibitors. Third, patients can be followed outside of traditional clinical units using digital health technologies. Examples include the use of “digital pills” (such as metal-coated tablets that dissolve in the stomach and communicate wirelessly with a mobile device), continuous monitoring devices (such as glucose-sensing contact lenses), and consumer-based laboratory testing (such as smartphone kits) (29). Finally, adaptive trial designs—in which biomarker or clinical outcomes can be used to modify the design during the trial—represent a powerful approach to connect the thread of causal human biology, biomarkers, and clinical PoC. The breast cancer study I-SPY 2 is an example of an adaptive design that pairs therapies with different molecular biomarkers (30).
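Adaptive designs take many forms. As one illustration of outcomes modifying a trial while it runs, the sketch below implements response-adaptive randomization by Thompson sampling with Beta posteriors. It is a toy model under stated assumptions, not a description of I-SPY 2 or of any regulatory-grade design: the arm response rates are hypothetical, outcomes are binary and observed immediately, and the function name `adaptive_allocation` is invented for this example.

```python
import random


def adaptive_allocation(true_response_rates, n_patients=400, seed=1):
    """Assign patients to arms one at a time, favoring arms that appear to work.

    Each arm keeps a Beta(1 + responders, 1 + non-responders) posterior over
    its response rate. For every new patient, one value is drawn from each
    posterior and the patient goes to the arm with the highest draw.
    """
    rng = random.Random(seed)
    n_arms = len(true_response_rates)
    responders = [0] * n_arms
    non_responders = [0] * n_arms
    for _ in range(n_patients):
        draws = [
            rng.betavariate(1 + responders[arm], 1 + non_responders[arm])
            for arm in range(n_arms)
        ]
        chosen = max(range(n_arms), key=draws.__getitem__)
        # Simulate the patient's outcome from the (hypothetical) true rate.
        if rng.random() < true_response_rates[chosen]:
            responders[chosen] += 1
        else:
            non_responders[chosen] += 1
    # Number of patients assigned to each arm.
    return [r + f for r, f in zip(responders, non_responders)]


# Hypothetical trial: arm 0 responds 20% of the time, arm 1 responds 50%.
counts = adaptive_allocation([0.2, 0.5])
print("patients per arm:", counts)
```

As evidence accumulates, allocation drifts toward the better-performing arm, so fewer patients are exposed to the weaker one. A real adaptive trial would add prespecified stopping rules, delayed outcomes, and biomarker-defined subgroups, none of which are modeled here.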


Although a disciplined approach to linking causal human biology, therapeutic modality, biomarkers of target modulation, and PoC clinical trials should improve R&D productivity, there are limitations to the translational medicine model proposed here. First and foremost, there is an underlying assumption that we have sufficient data from humans to enable the discovery of new therapeutic targets and biomarkers. Validation of this assumption requires an ecosystem to define which sources of human data establish causality; members of the ecosystem must then work systematically toward building such databases that are accessible to all. For example, there is no single resource that enables systematic identification of human genetic variants linked to clinical outcomes in large patient populations (>10 million people) in a setting suitable for recall. Similarly, there is no large population with detailed molecular longitudinal profiling to identify novel biomarkers. It is encouraging, however, that many efforts are under way to generate these human databases.

Second, experiments of nature are rarely perfect substitutes for pharmacological interventions. Accordingly, targets with convincing causal human biology might not lead to successful therapeutics. However, a two- to threefold increase in the success rate during phase 2 or phase 3 would have substantial financial implications. Failing in a large phase 3 study is about 10-fold more expensive than failing in a small clinical PoC study ($150 million versus $15 million per NTD) (4). Third, some diseases do not have experiments of nature to guide target selection. Every complex disease is influenced by environmental, behavioral, or stochastic factors that might lead to specific therapeutic hypotheses. Consistent with this observation, there are many examples of approved therapies that do not have obvious evidence of causal human biology.
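The financial point about success rates can be made concrete with a back-of-the-envelope calculation. The sketch below uses the two failure costs quoted above; the success probabilities (25% and 50%) are hypothetical values chosen only to represent a twofold improvement, and attempts are assumed to be independent. The function name is illustrative.

```python
def failure_cost_per_success(cost_of_failure_usd: float, success_probability: float) -> float:
    """Expected spend on failed attempts for each successful one.

    With independent attempts that each succeed with probability p, the
    expected number of failures per success is (1 - p) / p.
    """
    if not 0.0 < success_probability <= 1.0:
        raise ValueError("success_probability must be in (0, 1]")
    return cost_of_failure_usd * (1.0 - success_probability) / success_probability


PHASE3_FAILURE_USD = 150e6  # cost of a phase 3 failure, from the text
POC_FAILURE_USD = 15e6      # cost of a small PoC failure, from the text

# Hypothetical success rates: a baseline and a twofold improvement.
for p in (0.25, 0.50):
    late = failure_cost_per_success(PHASE3_FAILURE_USD, p)
    early = failure_cost_per_success(POC_FAILURE_USD, p)
    print(f"p={p:.2f}: phase 3 failures ${late / 1e6:.0f}M, "
          f"PoC failures ${early / 1e6:.0f}M per success")
```

Under these assumed rates, doubling the success probability cuts the expected failure spend per success threefold (from three failures per success to one), and moving a failure from phase 3 to a PoC study cuts it a further tenfold.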

Fourth, quantitative models are needed to translate causal human biology into therapeutic hypotheses that can be tested via pharmacodynamic biomarkers or clinical outcomes in small PoC trials. For example, human genetics might suggest that modulating a target will have a desired effect in humans, but genetic data might not indicate how much to modulate the target for a desired therapeutic window. Fifth, new digital health technologies must enable clinical trial designs that test previously untestable therapeutic hypotheses. Although more accurate biological measurements are important, what will truly transform clinical trials is the introduction of technologies that measure what has hitherto been impossible to measure in humans.


Individually, each of these four areas—causal human biology, therapeutic modality, biomarkers of target modulation, and PoC clinical trials—has received ample attention. However, it is important to connect all four concepts to test therapeutic hypotheses in humans. This translational medicine approach will not eliminate all late-stage R&D failures—drug discovery is an inherently risky business, after all—but it should help. Indeed, the examples cited here demonstrate feasibility. Drug R&D portfolios that adhere to these principles with discipline will likely benefit from an increased probability of success in delivering novel therapies to patients in need.


Competing interests: R.M.P. is a full-time employee at Merck and Co. Inc.
