The NCGC Pharmaceutical Collection: A Comprehensive Resource of Clinically Approved Drugs Enabling Repurposing and Chemical Genomics


Science Translational Medicine  27 Apr 2011:
Vol. 3, Issue 80, pp. 80ps16
DOI: 10.1126/scitranslmed.3001862


Small-molecule compounds approved for use as drugs may be “repurposed” for new indications and studied to determine the mechanisms of their beneficial and adverse effects. A comprehensive collection of all small-molecule drugs approved for human use would be invaluable for systematic repurposing across human diseases, particularly for rare and neglected diseases, for which the cost and time required for development of a new chemical entity are often prohibitive. Previous efforts to build such a comprehensive collection have been limited by the complexities, redundancies, and semantic inconsistencies of drug naming within and among regulatory agencies worldwide; a lack of clear conceptualization of what constitutes a drug; and a lack of access to physical samples. We report here the creation of a definitive, complete, and nonredundant list of all approved molecular entities as a freely available electronic resource and a physical collection of small molecules amenable to high-throughput screening.


The sequencing of the human genome and subsequent translational research efforts have brought about unprecedented opportunities for the rapid application of new biological knowledge to improve human health. Although the development of diagnostic applications of genomic information has been relatively straightforward, advances in therapy have been slower, in part a result of the time (10 to 15 years) and expense (~$1 billion) of new drug development (1).

New chemical entities (NCEs)—drugs that do not contain any previously approved active moieties—are the focus of most drug development efforts, partly because of the need for marketing exclusivity provided by patents to recoup the cost of drug research and development. However, the propensity of drugs to act on more than one target—or to act on their intended target in an unanticipated system—has long been noted to occur with regularity in clinical medicine, manifesting as either additional therapeutic uses for a drug or adverse events. With the recent difficulties of the biopharmaceutical industry in developing NCEs, and the focus on drug safety, more attention has been placed on drugs already approved for clinical use. Nowhere has this attention been greater than in rare and neglected diseases (RNDs), for which the expected limited return on investment makes NCE development particularly challenging.

Rare diseases are defined by the U.S. Orphan Drug Act as those with a prevalence of <200,000 in the United States; although rare diseases are frequently neglected in drug development because of their low prevalence, the term “neglected” diseases generally refers to tropical diseases that may be highly prevalent but occur in developing nations that are unable to afford treatments (2). There are over 6000 RNDs, of which fewer than 300 have any therapy currently available (3). As a result, particular interest has arisen in finding drugs for RNDs in the current pharmacopeia by finding new therapeutic indications for already approved drugs—a process frequently referred to as “repurposing.”

A drug must be demonstrated to be reasonably safe and effective in the treatment of a disease or condition in order to receive regulatory approval (4, 5). However, when used in larger populations many drugs are subsequently discovered to have clinical utility or toxicity not appreciated at the time of approval. This phenomenon can result in the expansion of a drug’s clinical use to new indications (such as in the case of pregabalin, an antiepileptic drug that was also found to be useful for treating neuropathic pain) (6–8) or the withdrawal of marketing authorization (which occurred for fenfluramine, originally used to suppress appetite but then found to lead to cardiotoxicity) (9, 10). Extension of the clinical use of a drug to a new indication has historically occurred via serendipitous clinical observation (for example, the observation that sildenafil, which was developed for treating hypertension, was also useful for treating erectile dysfunction) but more recently has occurred via a logical connection of a disease’s pathophysiology to a drug’s target. Examples of this scenario include (i) losartan, a drug developed to treat hypertension that might also be useful for treating Marfan syndrome because of its effects on transforming growth factor–β, which plays a role in this condition, and (ii) thalidomide for treating multiple myeloma (see below) (11–13).
In general, these effects can be the result of the interaction of the drug on its intended target in a different organ (which was the case for sildenafil) (14) or the action of the drug on a different target (so-called “off-target” effects) and can be beneficial (for example, imatinib—designed to inhibit the hyperactive BCR-ABL protein in chronic myelogenous leukemia—also acts on a harmful variant of c-kit that is associated with mastocytosis) (15) or harmful (for example, cisapride—which acts as a serotonin receptor agonist and was formerly used to treat gastrointestinal problems—also affects a cardiac potassium channel, causing cardiac side effects) (16).

The case of thalidomide is a particularly striking example of repurposing. Thalidomide (12, 13) was originally introduced in the late 1950s as a sedative drug and an effective antiemetic and was used widely to treat morning sickness. It was later withdrawn because of teratogenicity and neuropathy. Recently, there has been growing interest in thalidomide because it has been found effective against leprosy and multiple myeloma through inhibition of tumor necrosis factor–α and angiogenesis. The U.S. Food and Drug Administration (FDA) approved the use of thalidomide for the treatment of lesions associated with erythema nodosum leprosum (in which fat cells under the skin are inflamed) in 1998 and granted accelerated approval in 2006 for thalidomide in combination with dexamethasone for the treatment of newly diagnosed multiple myeloma patients. Studies are currently under way to determine the effect of thalidomide on arachnoiditis (inflammation of a membrane that protects central nervous system nerves) and on several other types of cancer.

An alternative approach to repurposing, which does not require a priori knowledge of a disease or drug mechanism, is to screen drugs for activity in cell-based models of disease. For example, using such an approach the antibiotic ceftriaxone was determined to represent a possible treatment for amyotrophic lateral sclerosis (17), and the antihistamine astemizole was identified as a potential antimalarial drug (18). These anecdotal successes raise the possibility that a substantial percentage of RNDs might be treatable with drugs in the current pharmacopeia. Biopharmaceutical companies understandably have been less enthusiastic about testing their drugs for these indications because if such drugs are still covered by patents, any adverse events in RND patients could adversely affect revenue, and if the drugs are generic, the new RND indication would provide little financial return. Academic investigators or disease foundations may be interested in pursuing this approach but frequently lack the infrastructure and expertise to do so. Even when such approaches are successful, the data are generally not aggregated with those from other assays (preventing cross-assay comparisons) or made public (preventing the use of the data by others). Both activities would probably reveal important relationships among diseases and drug targets.

The National Institutes of Health (NIH) Chemical Genomics Center (NCGC) is a national resource for the translation of information found in the genome into biological insights and new therapeutics, particularly for RNDs (19, 20). The NCGC collaborates with disease and target experts worldwide to develop chemical probes for previously unstudied biological and therapeutic systems, using its assay development, quantitative high-throughput screening (HTS), informatics, and medicinal chemistry platforms (20). As part of its chemical genomics program aimed at understanding the features of small molecules that are important for biological activity, the NCGC has assembled a large and diverse collection of bioactive compounds. Although most such compounds have been shown to be active only in cell-free, cell-based, or animal model systems, a small percentage have been tested in, or approved for, use in humans and animals. These latter compounds, for which we reserve the term “drugs,” make up the class of bioactive small molecules that might be useful for repurposing applications. A comprehensive understanding of the activities of all drugs in the pharmacopeia would facilitate the greatest possible application of known drugs across the full spectrum of human diseases and help explain or predict their toxicities.

In order to enable this systematic drug mechanism and repurposing effort, we desired to identify and then acquire the complete, nonredundant collection of small-molecule drugs approved for human or veterinary use by regulatory agencies worldwide. However, upon initiating this effort it rapidly became evident that neither such a complete nonredundant list of drugs nor such a physical collection of them existed. Several attempts have been made to compile such a collection (21–25), but upon scrutiny all turned out to be incomplete and/or to contain bioactive compounds not approved for use in humans. We therefore resolved to assemble a definitive collection. Because this task only needed to be done once and would be an enormous resource for the research and clinical communities, we resolved to make the information available through our Web site (26) and the collection available to collaborators who wish to bring their projects to the NCGC. This Perspective describes our creation of the NCGC Pharmaceutical Collection (NPC), a definitive collection of drugs registered or approved for use in humans or animals, as both an Informatics and a Screening Resource and delineates our approach to profiling the activity of this collection across a broad array of human pathways and diseases. This ongoing project will benefit the medical, systems biology, and toxicology communities via the NPC Informatics Resource browser (Fig. 1), which we describe here (26), and via the NCGC’s collaborative screening programs. Data on the activities of these drugs generated through screening of the NPC Screening Resource will be made publicly available through PubChem (27), with links provided on the NPC Informatics Resource browser.

Fig. 1.

The NPC database browser. This browser (26) provides users with a graphical interface with which to explore drugs by a number of attributes, including but not limited to name, structure, approval status, indication, and target information.


Approval of a drug for human use is assumed to be an unambiguous event, so when we embarked on this comprehensive drug collection project we were surprised to find that estimates of the number of approved drugs in the literature vary widely (18, 22, 28). The FDA does not maintain a single list of drugs that it has approved; instead, there are several such listings, each with its own particular legal origin, intent, and limitations. The United States Pharmacopeia (USP)–National Formulary, which is published by the nongovernmental organization USP, is cited by the Federal Food, Drug, and Cosmetic Act of 1938 as the official compendium of the United States but is incomplete (29). The FDA publishes several official lists of its own, including (i) the Orange Book, (ii) the National Drug Code (NDC), (iii) the Drugs@FDA Web page, (iv) the Over-the-Counter (OTC) listings from the Office of Nonprescription Products, and (v) its Substance Registration System’s Unique Ingredient Identifier (UNII) (30). An August 2006 Department of Health and Human Services, Office of Inspector General audit of the NDC found that over 14,000 drug products were missing from official government registries and that over 34,000 listed products were no longer being marketed (31). FDA has acknowledged these informatics shortcomings and has recently introduced an initiative to correct this issue (32, 33), but that work has not yet been completed. Several others have also attempted to generate a precise listing of FDA-approved drugs (34–39). Comparing these resources, we discovered that none of these lists fully concurs with the others, probably because of varying definitions of the term “drug,” the complexity of the regulatory process, and the complexity of FDA’s own publications.
Research, medical, marketing, and lay communities use the word “drug” quite differently, and only by precisely defining the term did the number and identity of all substances to be included in our comprehensive collection become clear.

For the purpose of pharmacologic, mechanistic, and repurposing studies, the term “drug” refers to a molecular entity (ME) (40) that interacts with one or more molecular targets and effects a change in biological state. These may be small molecules, proteins, antibodies, or other substances such as small interfering RNAs or aptamers. This term unambiguously denotes a unique molecule with a known structure.

The term “active pharmaceutical ingredient” (API) refers to the molecule in physical form and is more specific because different esters or salt forms of the same ME are designated as different APIs (for example, paroxetine mesylate and paroxetine hydrochloride). Although different APIs of the same ME may exhibit distinct absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties in vivo, this distinction is rarely relevant for in vitro and in silico studies, so the NPC Screening Resource contains only one API for each ME.

Next on the scale of lexical specificity (Fig. 2) is the term “Drug” as it is used by FDA, which is a name approved for marketing that defines an API or set of APIs used. Thus, a given API may be found in multiple drugs (for example, ibuprofen, Motrin, Advil, and Nuprin). This term includes drugs requiring a prescription and those that do not [“Over-the-Counter” (OTC)], and those covered by current market exclusivity (“brand”) and those that are not (“generic”), and may be small molecules, proteins, antibodies, or other substances or groups of substances. Also included in the category that FDA (and other regulatory agencies) terms Drugs are substances or extracts without a defined molecular structure and products that are used for supportive or diagnostic purposes but are not intended to be used as specific modifiers of a particular disease or condition. These substances include thousands of allergenic extracts, commonly used intravenous fluids such as lactated Ringer’s solution and D5W (5% dextrose in water), oxygen, and purified air.

Fig. 2.

What’s in a name? Definitions of the terms Drug Product, Drug, API, and ME and the numbers associated with each term. Record counts include veterinary drugs. Numbers of substances approved only for human use are shown in parentheses. The definition of an API comes from (74), and that of a chemical entity (CE) comes from (75). CGMP, current good manufacturing practices.


Lastly, the most specific term for such substances is “Drug Product.” Most Drugs are marketed in a wide variety of dosages, forms (for example, oral, intravenous, and intramuscular), combinations, and packages, and each of these is referred to as a “Drug Product” by FDA.

These definitions clarify why there has been such confusion about what a Drug is and how many there are, and they make it possible to build a comprehensive and nonredundant list and collection. What the lay public refers to as a Drug is more properly termed a Drug Product. In mechanistic research parlance, however, Drug refers to a ME, or an API when the physical substance is being considered.

The example of the common analgesic and antipyretic medicine Tylenol illustrates the importance of these distinctions to the proper classification and counting of drugs. N-(4-hydroxyphenyl)ethanamide, which is known as “acetaminophen” in the United States, is the API in Tylenol. However, this API is also marketed under many other brand names, each registered as a separate Drug by the FDA. These names include Aceta, Actimin, Anacin-3, Apacet, Aspirin Free Anacin, Atasol, Banesin, Crocin, Dapa, Dolo, Datril Extra-Strength, DayQuil, Depon & Depon Maximum, Feverall, Few Drops, Fibi, Fibi plus, Genapap, Genebs, Lekadol, Liquiprin, Lupocet, Neopap, Ny-Quil, Oraphen-PD, Panado, Panadol, Paralen, Phenaphen, Plicet, Redutemp, Snaplets-FR, Suppap, Tamen, Tapanol, Tempra, Valorin, and Xcel. Each brand name also comes in different forms—such as tablet, capsule, liquid suspension, suppository, intravenous, and intramuscular—and doses, and each of these is listed as a separate Drug Product. Outside of the United States, this API is known not as acetaminophen but as paracetamol, and each paracetamol-containing product is also listed separately in regulatory databases. Thus, worldwide there are hundreds of different Drugs and Drug Products that appear separately in drug databases, but all are composed of or contain the same ME/API, N-(4-hydroxyphenyl)ethanamide. This circumstance exists for many APIs, essentially none of which is marketed under only one name in only one dose, form, or combination.
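The relationship among these terms can be sketched as a small data model. The following Python is only an illustration under stated assumptions: the `DrugProduct` class, the `API_TO_ME` lookup table, and the dose strings are hypothetical and do not come from the NPC itself. It shows how several Drug Products, listed under different Drug names and different API names, collapse to one ME.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugProduct:
    """One marketed dosage/form of a Drug (hypothetical record layout)."""
    drug_name: str   # the name approved for marketing (the "Drug")
    api_name: str    # the API name as the regulator lists it
    form: str
    dose: str        # illustrative values only


# Hypothetical lookup from regulator-specific API names to a single ME.
API_TO_ME = {
    "acetaminophen": "N-(4-hydroxyphenyl)ethanamide",
    "paracetamol": "N-(4-hydroxyphenyl)ethanamide",
}


def group_by_me(products):
    """Group Drug Product records under the ME that their API maps to."""
    grouped = defaultdict(list)
    for product in products:
        grouped[API_TO_ME[product.api_name.lower()]].append(product)
    return dict(grouped)


products = [
    DrugProduct("Tylenol", "acetaminophen", "tablet", "dose A"),
    DrugProduct("Tylenol", "acetaminophen", "liquid suspension", "dose B"),
    DrugProduct("Panadol", "paracetamol", "tablet", "dose A"),
]
by_me = group_by_me(products)
drug_names = {p.drug_name for p in by_me["N-(4-hydroxyphenyl)ethanamide"]}
```

In this toy input, three Drug Products and two Drugs reduce to a single ME, which is the counting distinction the Tylenol example makes.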


Assembling a comprehensive enumeration. Using the definitions above, each category was enumerated using data first from the FDA and then from regulatory agencies outside the United States. In this step, inclusiveness and completeness were the goals, with redundancy to be eliminated later (see the following section). Lists of drug names approved for human use were obtained from the FDA official publications listed above. After assigning structures and Chemical Abstracts Service (CAS) numbers to these names wherever applicable and removing duplicate entries, we found that FDA has over 100,000 Drug Products registered. These Drug Products contain over 10,000 Drugs. However, this latter number, although formally correct, is misleading because the majority of these 10,000 are different brands of the same API, different APIs of the same ME, or chemically undefined substances (for example, allergenic extracts) (table S1).

We considered the possibility that additional Drugs might exist that are not listed by the FDA if they were in use before the relevant statutes took effect in 1938 (41). We could find no evidence of such additional drugs, and FDA considers such drugs unlikely (42). FDA has made efforts to evaluate drugs in use before 1938 and has taken proactive steps to exclude them from the market if the science was found lacking, as was the case with ethyl nitrite (“sweet spirit of nitre”—a traditional remedy for colds and flu) (43, 44).

Two categories of compounds not currently approved for human use were then added to this enumeration because of their potential for human application: veterinary products listed in the Green Book (the FDA-approved animal drug list) and drugs previously approved for human use but subsequently withdrawn from the market. Ivermectin is a broad-spectrum antiparasitic that was first approved for veterinary use and subsequently repurposed for treatment of human helminthic diseases (which involve parasitic helminth worms), particularly onchocerciasis, a disease caused by infection with a nematode that leads to blindness (45). Thalidomide is a prominent example of a previously withdrawn drug being repurposed for another indication and reapproved (13). Drug withdrawal may occur either because FDA or another regulatory agency withdraws marketing approval (46, 47) or because the manufacturer voluntarily ceases production (for example, mesoridazine, which was originally used to treat schizophrenia and withdrawn because of serious side effects); in the latter case, the drug may remain listed in regulatory publications. Several resources that list and monitor drug withdrawals (48–53) were included in the NPC. However, the designation “withdrawn” is not unambiguous because drugs are frequently withdrawn for one indication while remaining approved for others, or withdrawn in one country or market while remaining on the market in others. The veterinary drugs and drugs withdrawn for certain indications are labeled as such in the NPC browser (26).

Drugs are frequently approved in other countries but not approved by the FDA; we wished to capture these Drugs for the NPC as well. We therefore performed analogous definition of terms, enumeration of categories, and definition of structures for Drugs approved by regulatory agencies in other countries (Table 1). We obtained listings from the Dictionary of Medicines and Devices published by the UK National Health Service Information Authority (NHS), Health Canada’s (HC) Drug Products Database, the European Medicines Agency (EMA), and an English translation of the Japanese Pharmacopeia, Fourteenth Edition.

Table 1. Data sources for approved drugs.



Whereas currently or previously approved drugs may be those most amenable to repurposing, compounds that have been registered for human testing but not necessarily approved by any regulatory agency also represent potentially attractive starting points for further testing and/or medicinal chemistry optimization and so were also included in the NPC. This category includes unapproved Drugs registered by the U.S. Drug Enforcement Agency (DEA), compounds listed in the World Health Organization (WHO) International Nonproprietary Names (INN) and United States Adopted Names (USAN) registries, and compounds listed on the U.S. tariff schedule (Table 2). These listings include Drugs that have an approved Investigational New Drug (IND) application with the FDA or analogous approval by regulatory agencies outside the United States, and those that are being tested or have been tested in clinical trials in humans. Importantly, inclusion in any of these latter registries does not indicate that the drug has in fact been tested in humans, much less the stage of testing (for example, phase I, II, or III). The latter information is impossible to determine systematically because it is generally not disclosed by the company doing the trials; trial registration, for instance, must be prospective, but disclosure of the API (or APIs) being tested is not standardized. Although not immediately useful in repurposing applications, these USAN/INN drugs may be considered partially developed drugs and therefore require less effort to achieve regulatory approval than do compounds in preclinical stages of drug development. Detailed information on the number of drug records obtained from each source can be found in table S1.

Table 2. Sources for bioactive compounds (including unapproved substances) tested in humans.



Generating a nonredundant list of MEs. Having produced an aggregate enumeration that was complete, we then devised a process to eliminate redundancy to arrive at a list that was nonredundant—in which each ME was represented only once. There are multiple mechanisms by which a single ME may be listed more than once in drug listings, including by simple duplication within or between countries or by listing of MEs as distinct APIs, Drugs, or Drug Products. Because this redundancy is the source of much of the confusion in the literature about how many MEs exist, this process of redundancy elimination is described here in some detail.

Different regulatory agencies often assign different names for the same API (for example, paracetamol and acetaminophen) and do not adhere to standard ways of listing active ingredients (for example, terazosin, terazosin hydrochloride, and terazosin hydrochloride anhydrous are all used to refer to the same API); such idiosyncratic and inconsistent naming made synonym identification difficult. Several heuristics were created to eliminate redundancy reliably. Because the only completely unambiguous identifier was chemical structure, but the regulatory agencies did not supply structural information, APIs were matched to chemical structures via names, synonyms, and/or CAS numbers. Structures were primarily derived from ChemIDPlus’s PubChem deposited substances (54), Prous’s PubChem deposited substances (search for “Prous Science Drugs of the Future”[SourceName] under PubChem Substances) (27), the FDA Maximum (Recommended) Daily Dose database (FDAMDD) (55), commercial supplier catalog structures, SciFinder searches, or ChemOffice’s name-to-structure utility (56) if an International Union of Pure and Applied Chemistry (IUPAC) name was available, and were manually checked against existing literature, including drug labels (57). CAS identifiers are included in the Japanese Pharmacopeia, and were also obtained from ChemIDPlus via PubChem and manually from SciFinder. We relied on SciFinder to find the correct mapping from name to CAS and to structure once an inconsistency was identified. As a structure source, commercial supplier catalogs (tables S3 and S4) were found to be less reliable (that is, more error prone) than were ChemIDPlus and FDAMDD. An initial scan identified 1770 inconsistencies (different structures having the same name) from 12,800 vendor records, indicating an error rate of approximately 14%. 
On the basis of the incorrect structures identified during the manual curation process, ChemIDPlus, of which we curated 12,300 records linked to approved drugs, appeared to be a reliable source for structures with an estimated error rate of 1.3% (155 errors found), and FDAMDD, with 990 of its 1217 structures curated, was of similar quality, with an error rate of 2.0% (20 errors found).
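The error rates quoted above follow directly from the reported counts. As a quick check, the short Python sketch below recomputes them; the dictionary layout and function name are illustrative, and note that the vendor figure counts naming inconsistencies rather than confirmed structure errors.

```python
def error_rate(errors, records):
    """Fraction of curated records found to be in error."""
    return errors / records


# (errors or inconsistencies found, records examined), as reported in the text
sources = {
    "vendor catalogs": (1770, 12_800),
    "ChemIDPlus": (155, 12_300),
    "FDAMDD": (20, 990),
}

rates = {name: error_rate(e, n) for name, (e, n) in sources.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
# vendor catalogs: 13.8%
# ChemIDPlus: 1.3%
# FDAMDD: 2.0%
```

The vendor rate of 13.8% is what the text rounds to "approximately 14%", roughly ten times the rate seen for ChemIDPlus.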

Many compound structures included salt and solvent. Common salts were removed computationally, and the remaining mixtures were separated into component MEs by using automated software followed by manual curation to verify the results. Because structure representations of heavy metal–containing compounds are frequently problematic, a special set of heuristics was applied to these drugs so that any fragment without a carbon or nitrogen and with fewer than six atoms was removed, and the rest of the molecule was treated as one ME. Structures were then canonicalized to facilitate ME matching by use of the IUPAC International Chemical Identifier or InChI hash key (58) and an NCGC software package for structure standardization (59, 60). To accomplish this, API records were first merged by canonical MEs, then by CAS number, and lastly by names and synonyms. Each unique ME was assigned a unique ID. If more than one unique ID shared the same CAS number or name, an alert flag was added indicating a potential error in the structure, name, or CAS. Manual curation was then performed to correct such mistakes (table S2). Because a unique ID was assigned to each unique ME, mixtures (that is, drugs made up of multiple MEs) had more than one unique ID assigned.
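The ID assignment and alert-flag step described above can be sketched in a few lines of Python. This is a minimal illustration, not the NCGC software: the record layout, the `assign_me_ids` and `find_alerts` functions, and the placeholder strings standing in for canonical structure keys and CAS numbers are all assumptions made for the example.

```python
from collections import defaultdict


def assign_me_ids(records):
    """Give each distinct canonical structure key one unique ME identifier."""
    ids = {}
    for rec in records:
        ids.setdefault(rec["structure_key"], f"ME{len(ids) + 1:04d}")
    return ids


def find_alerts(records, me_ids):
    """Return CAS numbers or names that are shared by more than one ME ID."""
    seen = defaultdict(set)
    for rec in records:
        me = me_ids[rec["structure_key"]]
        if rec.get("cas"):
            seen[("cas", rec["cas"])].add(me)
        for name in rec["names"]:
            seen[("name", name.lower())].add(me)
    return {key: sorted(mes) for key, mes in seen.items() if len(mes) > 1}


# Placeholder keys and CAS values; real records would carry InChI-derived
# keys and genuine CAS registry numbers.
records = [
    {"structure_key": "KEY-1", "cas": "CAS-1", "names": ["acetaminophen"]},
    {"structure_key": "KEY-1", "cas": "CAS-1", "names": ["paracetamol"]},
    {"structure_key": "KEY-2", "cas": "CAS-2", "names": ["compound X"]},
    {"structure_key": "KEY-3", "cas": "CAS-2", "names": ["compound Y"]},
]

me_ids = assign_me_ids(records)
alerts = find_alerts(records, me_ids)
```

Here the two synonym records merge into one ME, while the CAS placeholder shared by two different structures is flagged for manual curation rather than resolved automatically.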

Comparison with previous estimates of drug numbers. The NPC is the most comprehensive and accurate exposition to date of MEs registered or approved for human or veterinary use worldwide. There have been many previous efforts at compiling drug lists, but upon examination it was evident that all have suffered from substantial overcounting, undercounting, and/or misclassification. Much of the confusion derives from different definitions of what a “drug” is, the redundancy of drug name listings, the often opaque nature of regulatory agency databases, and the lack of a connection of drug names to unambiguous chemical identifiers such as structures or simplified molecular input line entry specifications (or SMILES). In addition, the term “approved” has previously been used in different ways. A compound may be “approved” for listing in a database (such as the INN), “approved” for use in experimental settings only (as indicated by IND approval by FDA), “approved” by one or more regulatory agencies for specific clinical uses and marketing, or may have been previously “approved” but subsequently withdrawn from the market. Compounds in all of these categories have been included in previous listings of “approved” drugs, although only a fraction can actually be used in medical practice. In the NPC, “approved” means that marketing and use in medical practice, for the prevention or treatment of one or more disease indications, is currently allowed by one or more regulatory agencies worldwide.

These ambiguities, and the lack of methodological detail in many previous estimates of drug listings, make comparison with our results difficult. Previous estimates of the number of “FDA-approved” drugs for screening have ranged from 1382 (61) to 6534 (35) and of approved drugs worldwide up to 9990 (22). Scrutiny of these databases revealed undercounting of approved drugs, misdesignation of tested drugs as approved, and inclusion of the same ME more than once because of naming ambiguities. Lastly, some reports have incorrectly indicated that compound listing in the USAN or INN indicates either entry into phase II clinical trials or approval for regulatory use outside the United States, greatly inflating the reported number of “approved” drugs available for repurposing. USAN/INN listing in fact only indicates registration by a sponsor of intention to file for human use at some point, but not approval for any such use (62).

How many drugs are there? This rigorous process of definition, enumeration, and redundancy elimination allowed us to arrive at definitive numbers of MEs, APIs, Drugs, and Drug Products, approved for human and/or veterinary use, in the United States and/or other countries; these are summarized in Fig. 2 and listed in detail in table S1. Because for scientific purposes the term “drug” is most properly used to describe MEs, the most accurate answer to the question, “How many drugs are there?” is in reality “How many approved MEs are there?” The answer to this question is that 2356 MEs are approved for human use by the FDA, and 3936 MEs are approved for human use in major markets worldwide including the United States. These will be the richest source for repurposing applications because they are already approved for human use, and thus approval of new indications will be most straightforward.

When MEs approved for veterinary use are included, the numbers of MEs increase to 2508 that are approved by FDA and 4034 worldwide. Also, there are 4935 unique MEs included in the USAN, INN, U.S. Tariff schedule, WHO, DEA, Kyoto Encyclopedia of Genes and Genomes (KEGG) drugs, and FDAMDD database listings of compounds registered for experimental human use but not approved by any regulatory agency for marketing (table S1). Although not immediately useable for human applications, these MEs would provide a “jump start” to new drug approvals and so are of interest and importance as well. Taken together, the 4034 unique worldwide approved MEs and 4935 unique worldwide registered MEs sum to a total unique 8969 MEs, which represent the universe of compounds for repurposing, advanced drug development, and chemical genomics.
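The totals in the two paragraphs above can be verified with simple arithmetic. The Python below uses only the counts reported in the text; the variable names are illustrative. Summing the two groups directly assumes, as the text implies by calling the total "unique", that the approved and registered-only sets do not overlap.

```python
# Counts of unique MEs as reported in the text
fda_human = 2356
world_human = 3936
fda_with_veterinary = 2508
world_with_veterinary = 4034
registered_not_approved = 4935

# MEs added when veterinary approvals are included
added_by_veterinary_fda = fda_with_veterinary - fda_human
added_by_veterinary_world = world_with_veterinary - world_human

# Approved plus registered-only MEs, treated as disjoint sets
total_unique = world_with_veterinary + registered_not_approved

print(added_by_veterinary_fda, added_by_veterinary_world, total_unique)
# 152 98 8969
```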


From these listings, we created the NPC Informatics Resource (26), which lists all drug MEs and APIs and whether or not they are suitable for laboratory-based screening and enables informatics-based rational repurposing and chemical genomics applications. All APIs corresponding to these MEs are also included. Currently, the NPC Informatics Resource contains 2508 FDA-approved MEs and 4034 worldwide-approved MEs. Including the 4935 unique MEs from the USAN and INN registries, the NPC Informatics Resource contains 8969 unique MEs. Information will be periodically updated as curation proceeds, new MEs are added as they are registered or approved, and errors are found. This process will benefit enormously from community feedback, and we urge users to employ the error report mechanism on the site (26). Database updates with new records and error fixes will be released periodically with distinct database version numbers.


Laboratory-based HTS places certain constraints on the types of molecules that can be tested; therefore, a subset of the NPC Informatics Resource, termed the NPC Screening Resource, was created for HTS applications. The NPC Screening Resource excludes large molecules (>1500 MW, such as proteins and antibodies), as well as small molecules that are insoluble in dimethyl sulfoxide (DMSO), are unstable at room temperature, have fewer than 16 atoms, or have no carbon or nitrogen atoms. Because different salt forms of the same ME behave similarly in in vitro assays, only one API corresponding to each ME was included. The APIs suitable and not suitable for HTS are labeled accordingly in the NPC Informatics Resource (26). The NPC Screening Resource listing includes 2750 worldwide-approved MEs (including 1817 FDA-approved MEs) and 4881 USAN/INN MEs, for a total of 7631 MEs (table S1) (26).
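The exclusion rules above can be pictured as a simple filter. The following is a minimal sketch only, not the procedure actually used to build the Resource: the `Api` record, its field names, and the toy entries are hypothetical, the atom count is assumed to have been computed upstream, and "no carbon or nitrogen atoms" is read here as "contains neither element."

```python
from dataclasses import dataclass

MAX_MW = 1500.0   # molecules heavier than this are excluded as "large"
MIN_ATOMS = 16    # molecules with fewer atoms are excluded


@dataclass(frozen=True)
class Api:
    """Hypothetical record for one active pharmaceutical ingredient."""
    name: str
    me_id: str                 # identifier of the parent molecular entity
    mol_weight: float
    atom_count: int
    elements: frozenset        # element symbols present in the molecule
    dmso_soluble: bool
    stable_at_room_temp: bool


def is_screenable(api: Api) -> bool:
    """Apply the exclusion rules described in the text to one API."""
    if api.mol_weight > MAX_MW:
        return False
    if not api.dmso_soluble or not api.stable_at_room_temp:
        return False
    if api.atom_count < MIN_ATOMS:
        return False
    # Assumption: a molecule is excluded only if it has neither C nor N.
    if not ({"C", "N"} & api.elements):
        return False
    return True


def screening_subset(apis):
    """Keep the first screenable API encountered for each ME."""
    chosen = {}
    for api in apis:
        if is_screenable(api) and api.me_id not in chosen:
            chosen[api.me_id] = api
    return list(chosen.values())
```

Keeping the first qualifying API per ME is an arbitrary choice made for the sketch; the text says only that one API per ME was included.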

Acquisition of all approved drugs. Acquisition of physical samples of the 2750 worldwide-approved ME/APIs on the NPC Screening Resource list was surprisingly challenging, principally because chemical vendors generally list their inventory by structure, IUPAC name, or CAS number, and none of this information is routinely available from the regulatory agencies. Therefore, we used intermediary data sources to connect ME/APIs with vendor entries individually (Fig. 3). In addition, different vendors frequently represented a given ME with different structures, so software was written to detect discrepancies and resolve them automatically whenever possible. When resolving ambiguities, ChemIDPlus (63) was particularly accurate and useful.

Fig. 3. Workflow for the NPC library construction process.

This process is complicated by the high prevalence of errors in the data sources. Octylmethoxycinnamate is shown as a semantic web diagram as an example. In the diagram, each node represents one entry from a data source. Lines represent identity relationships between source nodes, matching either on (i) name, (ii) CAS, or (iii) chemical structure. The line in red represents a spurious synonym linkage (Escalol 506) between padimate A and octinoxate.
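The linking problem in Fig. 3 can be illustrated with a small clustering sketch. This is an assumption-laden illustration rather than the software described in the text: entries are joined whenever they share a name, a CAS number, or a structure, and any resulting cluster that contains more than one distinct structure is flagged for review. The dictionary layout is hypothetical, and the CAS and structure values in the usage example are placeholders, not real identifiers.

```python
from collections import defaultdict
from itertools import combinations


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_entries(entries):
    """Group source entries linked by shared name, CAS, or structure.

    Each entry is a dict with keys 'source', 'names' (a set of
    lower-cased names), 'cas', and 'structure' (either may be None).
    """
    parent = list(range(len(entries)))
    for i, j in combinations(range(len(entries)), 2):
        a, b = entries[i], entries[j]
        same_name = bool(a["names"] & b["names"])
        same_cas = a["cas"] is not None and a["cas"] == b["cas"]
        same_structure = (a["structure"] is not None
                          and a["structure"] == b["structure"])
        if same_name or same_cas or same_structure:
            parent[_find(parent, i)] = _find(parent, j)
    clusters = defaultdict(list)
    for i, entry in enumerate(entries):
        clusters[_find(parent, i)].append(entry)
    return list(clusters.values())


def conflicting_clusters(clusters):
    """Return clusters whose members disagree on chemical structure."""
    flagged = []
    for cluster in clusters:
        structures = {e["structure"] for e in cluster
                      if e["structure"] is not None}
        if len(structures) > 1:
            flagged.append(cluster)
    return flagged


# Toy data mimicking the shape of the error in Fig. 3: a shared synonym
# pulls two different structures into a single cluster.
entries = [
    {"source": "A", "names": {"octinoxate", "octylmethoxycinnamate"},
     "cas": "CAS-PLACEHOLDER-1", "structure": "STRUCT-PLACEHOLDER-1"},
    {"source": "B", "names": {"padimate a", "escalol 506"},
     "cas": "CAS-PLACEHOLDER-2", "structure": "STRUCT-PLACEHOLDER-2"},
    {"source": "C", "names": {"octinoxate", "escalol 506"},
     "cas": None, "structure": None},
]
suspect = conflicting_clusters(cluster_entries(entries))
```

In this toy case all three entries fall into one cluster with two distinct structures, so the cluster is flagged. Deciding which link is spurious would still require a trusted reference or manual curation.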


Compound acquisition was prioritized by approval status, ease of acquisition, and cost. Currently approved drugs were assigned a higher priority for acquisition than investigational drugs, and drugs registered in the United States were assigned a higher priority than drugs registered in other countries. Drugs were procured from commercial bioactive compound collections (for example, Sigma-Aldrich’s Library of Pharmacologically Active Compounds) and bulk chemical suppliers (for example, Sigma-Aldrich and ChemBridge) first, from which large numbers of compounds were available at relatively low cost, with structures provided for all compounds. Procurement aggregators such as ChemNavigator and specialty chemical vendors (table S4) that generally supply compounds at higher costs were used next. If no commercial supplier could be found for a drug, the drug product was obtained from pharmacies and the API purified. For compounds not available commercially, drugs were custom synthesized either by NCGC chemists or via outsourcing; the cost for custom synthesis depended on the structural complexity and number of synthetic steps and ranged from $1000 to $40,000 for 100 mg of compound.

The current acquisition status of the NPC Screening Resource is summarized in Table 3. The majority (64%) of the NPC Screening Resource–approved drugs, totaling 1767 compounds, were obtained from major suppliers, including Sigma-Aldrich (St. Louis, MO), Tocris Bioscience (Ellisville, MO), MicroSource Discovery Systems (Gaylordsville, CT), Enzo Life Sciences International (formerly BIOMOL International, L.P.; Plymouth Meeting, PA), Prestwick Chemical (Illkirch, France), USP, the National Institute on Drug Abuse (NIDA), and the National Cancer Institute (NCI) (table S3). Controlled substances were mainly procured from NIDA and Sigma, after licensing of the NCGC by DEA. These suppliers were willing to provide compounds in 96-well plate or 96-tube rack formats, making them the easiest to prepare for screening. Approximately 15% of the collection (404 compounds) were sourced as individual compounds from over 70 smaller chemical suppliers (table S4), either directly from the supplier or through procurement aggregators such as ChemNavigator (San Diego, CA). This was a time-intensive process, requiring the iterative manual compilation of a master list of compound names and structures. Continual changes in vendor offerings led us to create a custom structure comparison tool (MolOverlap v1.0), which compares structure-data files from all vendors to a master list of structures and outputs a text file of matching structures to be obtained; the tool is freely available (64). It allows us to rapidly, accurately, and continuously extract updated catalog items and procure them for the collection. APIs that are not commercially available were sourced as drugs from pharmacies and purified. Approximately 21% of the collection (579 compounds) were not available from any vendor and therefore required custom synthesis, either by NCGC chemists or via contract synthesis. As of this publication, syntheses have been completed for 220 of these compounds; synthesis of the remaining 359 will be completed over the next 6 months.
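The catalog-matching step performed by the structure comparison tool can be sketched in a few lines. This is not the MolOverlap code, whose internals are not described here. The sketch assumes that every structure has already been reduced to a canonical string key by a cheminformatics toolkit, and the function names, data layout, and toy values are all hypothetical.

```python
def match_catalogs(master, vendor_catalogs):
    """Find vendor catalog items whose structure is on the master list.

    master:          {structure_key: compound_name}
    vendor_catalogs: {vendor_name: [(catalog_id, structure_key), ...]}
    Returns a list of (compound_name, vendor_name, catalog_id) tuples.
    """
    rows = []
    for vendor in sorted(vendor_catalogs):
        for catalog_id, key in vendor_catalogs[vendor]:
            if key in master:
                rows.append((master[key], vendor, catalog_id))
    return rows


def format_report(rows):
    """Render matches as tab-separated text, one match per line."""
    return "\n".join("\t".join(row) for row in rows)


# Toy example with placeholder structure keys and catalog numbers.
master = {"KEY-1": "toy-drug-1", "KEY-2": "toy-drug-2"}
vendor_catalogs = {
    "vendor-b": [("B-001", "KEY-2"), ("B-002", "KEY-9")],
    "vendor-a": [("A-100", "KEY-1")],
}
report = format_report(match_catalogs(master, vendor_catalogs))
```

Because matching is a dictionary lookup on the key, rerunning the comparison whenever a vendor publishes a new catalog is inexpensive, which fits the continuous-update use described above.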

Table 3. Procurement status of NPC.



Acquisition of compounds registered but not approved for human use. Of the 4881 MEs identified as appropriate for inclusion in the NPC Screening Resource that are registered only by WHO INN and USAN, or are listed on the U.S. tariff schedule, only a small proportion are available commercially. Currently 928 (19%) of these compounds have been procured or synthesized. Approximately 20% of these MEs are obtainable from chemical vendors, and the remaining 60%, totaling nearly 3000 compounds, will require synthesis at the NCGC or via contract synthesis (Table 3). Given the cost and time required for custom synthesis, we expect that the expansion of the NPC Screening Resource to include all registered MEs will be a long-term effort, but a critically important one. When considering starting points for chemical optimization for a new drug, these 3000 compounds may be considered advanced leads with probably attractive activity, physicochemical, and ADMET properties, which may therefore allow considerable time saving as compared with leads generated from conventional HTS of diversity collections.

Recently, we began procuring or synthesizing (i) drugs approved in countries or regions other than the United States, United Kingdom, Europe, Canada, and Japan and (ii) active metabolites of approved drugs. The building of the NPC Informatics and Screening resources will be ongoing, and as new compounds are added to regulatory databases, and physical samples are obtained, the Informatics and Screening Resource pages will be updated on our Web site. For immediate use by the community, we have listed all compounds in the NPC Screening Resource by regulatory agency and supplier in Fig. 4 and table S4, as well as at the NPC Web site (26).

Fig. 4. The composition of the NPC Screening Resource.

This Resource is shown broken down by (A) regulatory agency, (B) supplier type, and (C) sample cost. If a drug is listed by more than one regulatory agency, it is counted only once following the approval status priority rank: (i) FDA, (ii) United Kingdom/Canada/European Union/Japan, and (iii) investigational.
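The counting rule in this caption amounts to assigning each drug to the highest-priority category in which it is listed. A minimal sketch, with hypothetical category labels and toy drug names:

```python
from collections import Counter

# Priority order taken from the caption; the label strings are illustrative.
PRIORITY = ["FDA", "UK/Canada/EU/Japan", "investigational"]


def assign_category(listings):
    """Return the highest-priority category among a drug's listings."""
    for category in PRIORITY:
        if category in listings:
            return category
    raise ValueError("drug has no recognized listing")


def count_by_category(drugs):
    """Count each drug exactly once, under its top-priority category.

    drugs: {drug_name: set of categories in which the drug is listed}
    """
    return Counter(assign_category(listings) for listings in drugs.values())


toy_drugs = {
    "toy-a": {"FDA", "investigational"},
    "toy-b": {"UK/Canada/EU/Japan"},
    "toy-c": {"investigational"},
    "toy-d": {"FDA", "UK/Canada/EU/Japan"},
}
counts = count_by_category(toy_drugs)
```

The category counts then sum to the number of distinct drugs, so no drug is double counted.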


Quality control of the screening resource. Ensuring the correct identity and purity of compounds in screening collections is a critical aspect of drawing reliable conclusions from HTS data (65). This point is particularly critical for the NPC because the data generated from these compounds will be made publicly available and may be used to advance new clinical applications and to draw conclusions about the universe of targets affected by clinically approved drugs. Therefore, although we have received suppliers’ Certificates of Analysis, each sample has also been subjected to independent quality control at the NCGC to ensure its correct identity and >90% purity through liquid chromatography-mass spectrometry (LC/MS). Three types of detectors are used for the analysis. The primary analytical technique for assessing compound identity is MS. Identification is based on the expected nominal mass being detected. The primary technique for assessing analytical purity of the LC eluate is evaporative light scattering detection (ELSD); the secondary technique is ultraviolet (UV) absorbance at a wavelength of 220 nm. UV detection becomes important for samples that give a poor response to ELSD (65).
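The pass/fail logic described above can be summarized in a short sketch. It is an illustration under stated assumptions, not the NCGC's QC software: observed masses are assumed to have already been converted to neutral nominal masses (adducts removed), the decision to fall back to UV is passed in as a flag because the text does not say how a "poor response" to ELSD is defined, and the function and parameter names are hypothetical.

```python
PURITY_THRESHOLD = 90.0  # percent; samples must exceed this value


def passes_qc(expected_nominal_mass, observed_masses,
              elsd_purity, uv_purity, elsd_usable=True):
    """Return True if a sample passes both identity and purity checks.

    expected_nominal_mass: integer nominal mass of the expected compound
    observed_masses:       set of nominal masses detected by MS
    elsd_purity:           percent purity from ELSD (None if unavailable)
    uv_purity:             percent purity from UV at 220 nm (None if unavailable)
    elsd_usable:           False when the sample responds poorly to ELSD
    """
    identity_ok = expected_nominal_mass in observed_masses
    # ELSD is the primary purity readout; UV is used only as a fallback.
    purity = elsd_purity if elsd_usable else uv_purity
    purity_ok = purity is not None and purity > PURITY_THRESHOLD
    return identity_ok and purity_ok
```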


Thus far, compounds in the NPC Screening Resource have been screened against more than 200 assays of targets, pathways, and cellular phenotypes (Fig. 5). All screening is done using the NCGC’s quantitative HTS paradigm (66), in which every drug is screened at six or more concentrations over four to five orders of magnitude in the primary screen. The percentage of NPC compounds with activity in the assays used so far averages 4.2% [hits classified according to (66)], above the average rate of 1.8% in assays across the NCGC’s larger screening collection (principally the Molecular Libraries Small Molecule Repository), which is consistent with the notion that “bioactive” chemical structures frequently have multiple activities that might not be predictable a priori. Importantly, the assays against which the NPC has been screened have a wide diversity of formats and readouts, making it unlikely that this high hit rate is the result of an assay platform artifact (67, 68). Although beyond the purview of this Perspective, the individual and aggregate assay screening data generated by using the NPC will be of great interest for repurposing and chemical genomics and will be published [for example, (69, 70)] and made publicly available via the NPC Browser (26) and PubChem (27).
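To make the concentration range concrete, the sketch below generates an evenly spaced (on a log scale) dilution series of the kind a multi-concentration primary screen requires. The top concentration, the number of points, and the even log spacing are assumptions made for illustration; the actual plate layouts are described in (66), not here.

```python
def dilution_series(top_conc, n_points, orders_of_magnitude):
    """Return n_points concentrations, highest first, evenly spaced on a
    log scale and spanning the requested number of orders of magnitude."""
    if n_points < 2:
        raise ValueError("need at least two concentrations")
    # Constant fold-dilution between adjacent concentrations.
    factor = 10 ** (orders_of_magnitude / (n_points - 1))
    return [top_conc / factor ** i for i in range(n_points)]


# Example: seven concentrations spanning four orders of magnitude,
# starting from a hypothetical top concentration of 10 (arbitrary units).
series = dilution_series(10.0, 7, 4)
```

With seven points over four orders of magnitude, each step is roughly a 4.6-fold dilution (10 raised to the power 4/6).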

Fig. 5. Activities of the NPC screened against approximately 200 assays of targets, pathways, and cellular phenotypes.

The heat map, in which each row represents a drug and each column an assay, is colored by the drug activity observed (type of concentration response curve produced) so that activation is colored red, inhibition is blue, inactive is white, and missing data is gray. A darker shade of red or blue indicates more conclusive activity (significant concentration response). These data will be made publicly available through PubChem (27), with links provided on the NPC Informatics Resource browser (26).

The NPC is currently being used for three principal purposes: (i) repurposing drugs for the treatment of RNDs; (ii) defining the activities of known drugs for improved toxicological understanding, modeling, and prediction; and (iii) defining the characteristics of small-molecule compounds that confer biological activity.

Repurposing. Although many examples exist of successful repurposing, only recently has the concept of large-scale, even comprehensive, examination of the disease applications of clinically used drugs been considered (22, 23). To be maximally reliable and useful, the collection being screened must be comprehensive, the screening paradigm must minimize false positives and false negatives, confirmatory testing should be done, and the data should be made publicly available. The substantial infrastructure and diverse disease expertise required for such an effort have until now prevented the implementation of comprehensive repurposing. The NPC, in the context of the collaborative mission of the NCGC and the NIH Therapeutics for RNDs (TRND) program (71), makes this comprehensive approach to drug repurposing feasible, and the enormous unmet medical need (over 6000 RNDs currently have no treatment) makes it urgent. Such repurposing will not only provide the possibility of rapid therapeutic advances but also will obviate the need for new ME development, which is a long and expensive process. Ultimately, application of the NPC to a large number of diseases will help determine the proportion of human diseases that are ameliorable by a drug in the current pharmacopeia; this question has both theoretical and practical importance, informing questions of common disease mechanisms and helping determine the scope of the problem of therapeutic development for the thousands of diseases currently without treatment.

Virtual screening of the NPC Informatics Resource can be performed by any investigator worldwide with an internet connection, and we encourage researchers to do so. To enable the research community and build the knowledge base of drug activities, we encourage researchers to inform us of their successes (and failures) using the Resource and to contribute their results to PubChem. Laboratory-based screening of the NPC Screening Resource is done at the NCGC via collaboration with any researcher who has a disease-relevant assay. The screening requirements for the NPC are much less demanding than a typical HTS campaign given the small number of compounds (3500 as compared to >350,000 for a typical HTS); in our experience, the assays that produce results most directly applicable to clinical applications use primary patient cells. We encourage any researcher to contact us with their interest. Given the expense of building and maintaining the NPC Screening Resource, we cannot send copies of the collection to collaborators (100 screens can be performed at the NCGC with the amount of compound required to send to one collaborator), but we routinely have collaborators bring their assays to the NCGC, or send them to us, for collaborative screening. A solicitation for development and screening of RND assays for repurposing applications has been released by the TRND program recently (71). For those researchers who prefer to reproduce all or part of the NPC Screening Resource in their own laboratories, information on suppliers of all compounds can be found in table S4 and on our Web site (26).

Toxicology. Although unanticipated biological activities of known drugs might be therapeutically beneficial for repurposing, those activities may also be responsible for unanticipated toxicological effects. Drug toxicity is a major reason that new drug development programs fail (72), and approved drugs are regularly removed from the market because of adverse effects; frequently, the mechanism by which the toxicity occurs is not known. To improve the reliability and mechanistic understanding of chemical toxicity, the NPC will be screened across a very broad range of pathways and cellular phenotypes relevant to toxicity as part of the Tox21 program, a collaboration between the NCGC, the National Toxicology Program, the U.S. Environmental Protection Agency, and FDA (73).

Chemical genomics. Ultimately, improvements in the efficiency of drug development and application to disease will rely on improved understanding, and therefore predictability, of the general principles by which small molecules interact with their biological targets. This long-term goal will be greatly advanced by the broad and rigorous profiling to which the NPC will be subjected. All data generated by the NPC will be placed into the NCGC’s publicly available relational browser (NPC browser v2.0.0) (26), which will allow relationships between targets, pathways, diseases, and drugs to be queried in a user-defined fashion. Improvements will be made regularly in data richness and analysis capabilities.


The creation of the NPC as a definitive informatics and screening resource is an important milestone but is only the first step in its use for repurposing and chemical genomics. We hope that the enumeration of all drugs registered and/or approved for human and veterinary use, and the creation of a laboratory resource for their screening, will allow efforts to turn to the more important questions of the Resources’ scientific and medical applications. The NPC will only achieve its potential as a community resource because the scientific and medical problems to which it can be applied will require the full breadth of target, pathway, and disease expertise. The NPC is intended as a collaborative instrument, and we encourage researchers to use the NPC interactively via our Web site and screening projects. All NCGC programs are partnerships, and the Center currently has over 200 collaborations with investigators worldwide.

Having provided what to our knowledge is a definitive listing of drugs intended or approved for human use, we hope that the chemical genomics and drug development communities will use the NPC to realize the full potential of these drugs for human health, addressing the many devastating and untreatable diseases for which therapeutics are so urgently needed.

Supporting Online Material

Table S1. Drug counts by regulatory agency/authority.

Table S2. Example mistakes found in data sources.

Table S3. Suppliers of known bioactive collections.

Table S4. List of compound suppliers.


  • Citation: R. Huang, N. Southall, Y. Wang, A. Yasgar, P. Shinn, A. Jadhav, D.-T. Nguyen, C. P. Austin, The NCGC Pharmaceutical Collection: A Comprehensive Resource of Clinically Approved Drugs Enabling Repurposing and Chemical Genomics. Sci. Transl. Med. 3, 80ps16 (2011).

References and Notes

  76. Acknowledgments: We thank in particular P. Loebach at FDA, Colonel C. Ohrt at Walter Reed Army Medical Center, H. Singh at NIDA, J. Heemskerk at NINDS, D. Sullivan at Johns Hopkins University, S. White at NCI, D. Livingston at Galapagos, G. Potti and R. DeChristoforo at the NIH Clinical Center pharmacy for help with source lists of approved drugs and discussions on procurement, M. Philippi and M. McClelland for help with procurement, W. Leister for quality control, C. Thomas, D. Maloney, and W. Huang for compound synthesis consultation, and D. Leja for illustration. Funding: This work was supported by the Intramural Program of the National Human Genome Research Institute, NIH. Author contributions: R.H. and N.S. coordinated the project, sourced and compiled drug lists to construct the NPC, helped to build the NPC database and browser, helped with the NPC procurement, and wrote the manuscript; Y.W. built the NPC database and browser; A.Y. and P.S. helped find drug sources, procured compounds for the NPC, and helped write the manuscript; A.J. conceived the design and user interface of the NPC browser; D.-T.N. helped build the NPC database and browser; and C.P.A. conceived and directed the project and wrote the manuscript. Competing interests: The authors declare that they have no competing interests.