Review: Personalized Medicine

NEW: Network-Enabled Wisdom in Biology, Medicine, and Health Care


Science Translational Medicine  04 Jan 2012:
Vol. 4, Issue 115, pp. 115rv1
DOI: 10.1126/scitranslmed.3002132


  • Fig. 1

    Network news. Biological networks are sensors and mediators of the combined effects of environmental and genetic risk factors for common complex diseases (CCDs). Left: An example of an arterial wall gene co-expression network inferred from genome-wide mRNA profiles isolated from three arterial wall samples from the same patient (three RNA samples per individual). An edge (that is, an interaction between nodes) is included in this network only if it is supported by the data from at least two arterial wall samples. As in this example, biological networks are sparse, following a power-law distribution in which most nodes (that is, network components, which can be genes, proteins, or metabolites; shown in red) have few interactions; the few nodes that have many interactions are called “hubs” (yellow nodes in this example; >44 edges per node) (2). Networks can be inferred from various kinds of genome-wide data sets with the use of computational inference algorithms. One type is the gene co-expression network exemplified here. Co-expression networks do not reveal the direction of edges, but the length of an edge is related to the strength of the association between nodes; the longer the edge, the weaker the association. Bayesian network reconstruction is more sophisticated: it applies algorithms based on probabilities and conditional dependencies and yields networks whose edges also carry information about the direction and type of regulation. In this way, Bayesian network reconstruction on the combined data sets of genome-wide DNA sequence variations and gene expression, generated from biological samples from a single individual, can be used to distinguish networks that cause disease from those that constitute a reaction to disease (15). The reactive or causal role of a disease-related genome-wide gene co-expression network can be investigated by analyzing GWAS data sets to determine enrichment for inherited risk.
In the network diagram shown, node size and number indicate the number of neighboring nodes for each gene, and edge length reflects the strength of the Pearson correlation coefficient between nodes. Visualized using Cytoscape. Right: Principal steps for genetic enrichment analysis using GWAS data sets. A network defines a list of functionally associated genes; alternatively, this list can be defined by co-expression clusters of genes (10). Next, corresponding DNA variants that affect expression of the listed genes (eSNPs) are defined by seeking SNP allele frequencies that correlate with mRNA concentrations. The list of eSNPs is then matched to the GWAS SNP microarray platform using the HapMap or the 1000 Genomes platforms. The expanded set of SNPs is then examined for enrichment in disease risk either by determining the relative number of disease associations [false discovery rate (FDR) = 0.05] or by examining whether the expanded set is shifted toward higher significance [in the figure to the right (increasingly red)] relative to 10,000 sets of the same number of randomly selected SNPs.
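
The edge criterion and hub definition in the left panel can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes, hypothetically, that each of the three arterial wall sampling sites provides its own expression table (gene name mapped to values across individuals), that an edge is "supported" by a table when the absolute Pearson correlation reaches a cutoff (`min_abs_r`, an illustrative value), and that hubs are nodes whose degree exceeds a chosen threshold (the caption's example uses >44 edges). All function and parameter names are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def edges_for_sample(expr, min_abs_r):
    """Undirected gene pairs whose |r| meets the cutoff in one expression table."""
    return {
        frozenset((g1, g2))
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= min_abs_r
    }


def consensus_network(samples, min_abs_r=0.8, min_support=2):
    """Keep an edge only if at least `min_support` tables support it."""
    support = Counter()
    for expr in samples:
        support.update(edges_for_sample(expr, min_abs_r))
    return {edge for edge, count in support.items() if count >= min_support}


def degrees(edges):
    """Number of edges attached to each node."""
    deg = Counter()
    for edge in edges:
        for gene in edge:
            deg[gene] += 1
    return deg


def hubs(edges, min_degree):
    """Nodes with strictly more than `min_degree` edges, sorted by name."""
    return sorted(g for g, d in degrees(edges).items() if d > min_degree)
```

Calling `consensus_network([site1, site2, site3])` and then `hubs(network, min_degree=44)` would mirror the thresholds quoted in the caption. Edges here are undirected, consistent with the point that co-expression networks do not reveal edge direction.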

  • Fig. 2

    Tomorrow’s human biology. Molecular networks play a fundamental role in the future of biomedicine. (A) Basic concept of NEW in biology. Advances in mapping DNA loci related to human diseases and genome-wide profiling of mRNA transcript abundances have occurred on an unprecedented scale (11, 22, 32–34). Of particular interest is the identification of RNA molecules that mediate the flow of information from DNA to disease, because RNA is transcribed directly from a DNA template and is thus the most proximal non-DNA species of all molecules in the cell. Bayesian network reconstruction (11, 12) is a powerful approach for simultaneously considering thousands of molecular or clinical variables and for identifying patterns of causal relationships between these variables in a completely data-driven fashion. We developed a way to overcome the chief limitation of this approach, namely deriving predictive models from correlation data (11, 12, 35), by leveraging DNA variation as a systematic source of perturbation (32). The resulting probabilistic causal networks are critical for understanding the behavior of any one gene in the context of human disease, because individual genes operate in molecular networks that define disease-associated biological and pathological events. (B) A risk triangle for any given CCD that represents inherited (that is, genetic) risk factors, from the common ones (which are context-independent) to the increasingly individual ones (which are context-dependent). The genetics of gene expression cohorts (GGE) will help us interpret data from GWAS by identifying genes whose corresponding RNA levels associate with genetic loci that also associate with disease (22, 23, 32, 36–38); gene expression profiles can also be used to infer causal relationships between molecular traits (up to networks) and disease states (30, 39–41). The triangle represents the inherited risk for a given CCD in a given individual.
CCD risk is shared with the wider population at the base of the triangle, becomes increasingly individual (that is, less represented in the wider population), and is purely individual at its peak. Left: Arrows indicate the roles and types of environmental factors that act in concert with inherited risk in the triangle. Middle: Ways of representing risk at different levels. Right: Types of genomic data needed to map CCD risk.

  • Fig. 3

    Context-dependent inherited risk and the importance of intermediate phenotypes in clinical research. (A) The diagram plots sets of genetic risk variants according to their dependencies on contexts in the macroenvironment (such as life-style factors and exposures to environmental toxins) that, over time, transform microenvironments (in cell and tissue types) to activate DNA variants that then exert their risk-promoting effects on certain CCDs. Some genetic risk variants are environment-independent, and their disease associations may be detected early in life (light purple shading, left and bottom parts of the graph); the effects of other genetic risk variants are age-related (such as epigenetic changes), because exposures to macroenvironments alter microenvironments over time (dark purple background); environment-dependent DNA risk variants become increasingly important for CCD development later in life. Color-coded key for CCDs that are affected by DNA risk variants is shown below the graph at the right. SLE, systemic lupus erythematosus; RA, rheumatoid arthritis; IBD, inflammatory bowel diseases. (B) The importance of intermediate phenotypes in clinical studies of CCDs. Left: Traditional GWAS are based on the idea that genetic variations follow Mendelian inheritance and are relatively infrequent and context-independent, even for complex biological events and diseases. Most CCD-linked variants will not be discovered in this way, because the genetic perturbation (top circle, gray center) is too weak to be sensed by the disease phenotype (top circle, blue outer area). Some DNA risk variants that are context-independent (those that are common in the population) can be revealed with a GWAS design alone [blue bottom circle, large red diamonds shown “above the surface” (horizontal line)]; however, such studies do not explain the full variation in disease phenotypes (blue bottom circle, smaller red diamonds below the surface). 
Middle: In genetics of gene expression (GGE) studies (top circle), capturing mRNA abundance as an intermediate phenotype (a measure of gene expression, for example) (top circle, intermediate purple area) from patients and control individuals allows additional DNA variants to be identified from GWAS data sets (top circle, gray center), in particular those that are context-dependent, thereby explaining more of the variation in disease phenotypes. This is achieved because intermediate gene expression data provide a more proximal sensor of DNA variation than does the clinical phenotype alone (top circle, outer blue area; compare with GWAS design alone). A GGE design thus allows for the inference of disease networks [in bottom circle, nodes (purple)] that harbor several DNA risk variants (shown as red diamonds in the network) linked to disease, where these networks act to drive disease phenotypes (blue background). Right: The top circle depicts genetics of gene (gray center), protein (intermediate purple area), and metabolite (intermediate dark turquoise area) expression (GGPME) studies, which provide an even richer collection of proximal sensors of DNA variation that inform the clinical phenotype (top circle, outer blue area). Bottom circle: The identification of several layers of genome-wide measurements that sense the flow of DNA information allows inference of complex full disease networks (in bottom circle) with all disease-linked DNA risk variants (red diamonds), in contrast to networks whose effects are reflected in changes in mRNA concentrations alone. Dark purple, light purple, and dark turquoise network nodes (circles) are derived from genome-wide RNA, protein, and metabolite measurements, respectively; the associated phenotypes are depicted by the blue background.
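
The GGE idea of using mRNA abundance as a proximal sensor of DNA variation can be sketched as a simple association test between a SNP and a transcript. The following is a minimal illustration under stated assumptions, not the method used in the cited studies: genotypes are encoded as minor-allele dosage (0, 1, or 2) per individual, association is measured by the absolute Pearson correlation, and significance is estimated by shuffling expression values across individuals. The names `esnp_permutation_p`, `n_perm`, and `seed` are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def esnp_permutation_p(dosage, expression, n_perm=2000, seed=0):
    """Return (|r|, empirical p) for a SNP dosage vector versus mRNA abundance.

    The p-value is the fraction of shuffles whose |r| is at least as large
    as the observed |r|, with a +1 correction so it is never exactly zero.
    """
    if len(dosage) != len(expression):
        raise ValueError("dosage and expression must cover the same individuals")
    rng = random.Random(seed)
    observed = abs(pearson(dosage, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-expression link
        if abs(pearson(dosage, shuffled)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A SNP whose empirical p-value falls below a chosen cutoff would be treated as a candidate eSNP for that transcript. A genome-wide analysis would also need multiple-testing control and covariate adjustment, which this sketch omits.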

  • Fig. 4

    Tissue-specific and cross-tissue molecular networks. Tissue-specific networks: (A) red nodes, carotid or coronary lesions, atherosclerotic arterial wall, control arterial wall; (B) yellow nodes, subcutaneous or omental visceral fat; (C) pink nodes, skeletal muscle; (D) brown nodes, liver; (E) blue nodes, blood cells (for example, leukocytes, such as monocytes/macrophages). Orange nodes are part of networks shared across tissues [in this example, cross-tissue communication is shown for arterial wall (red) and fat (yellow) samples, but such crosstalk occurs among many tissues]. Sampling of patient tissues and, if possible, control individuals is central to a systems approach to CCDs. Here, the focus is on cardiovascular and metabolic diseases: Samples from several tissue and organ locations are necessary to study control individuals and cohorts of patients with cardiovascular and related metabolic diseases (such as obesity, diabetes, and dyslipidemia). The tissue samples are then used to isolate DNA, RNA, proteins, and possibly metabolites for genome-wide data generation; strict protocols for tissue isolation and immediate processing are crucial for data quality and meaningful downstream analyses as is careful clinical characterization of the CCD phenotypes. Molecular intermediate phenotypes from several disease-relevant tissues are then used for disease-network inference. Different organs and tissues have distinct mRNA footprints, and genome-wide RNA abundance measurements in each tissue allow detection of DNA variants that affect gene expression (that is, general and tissue-specific eSNPs). Tissue-specific causal networks can be inferred from computer-supported integrative analysis of omics data sets, including DNA-variation data. The disease impact of inferred networks is also determined by examining the network’s relative enrichment with inherited risk for CCDs using existing GWAS cohorts (Fig. 1). 
Some genes specialize in cross-tissue communication (27), so that some parts of networks (in this example, shown as orange nodes) are shared among different tissues and likely are responsible for related molecular activities. Genes in cross-tissue networks do not appear to belong to tissue-specific networks (and vice versa). These networks are believed to be particularly important for cardiovascular disease, cancers, and metabolic diseases, which involve many organs, particularly in late disease stages. In the future clinic, where NEW strategies rule, disease networks (tissue-specific and cross-tissue) will be used for early disease detection and for monitoring effects of preventive therapies.
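
The enrichment check mentioned in this caption (and outlined in Fig. 1) compares a network-derived SNP set against randomly drawn SNP sets of the same size. A minimal Python sketch of that comparison follows. It assumes GWAS results are available as a mapping from SNP identifier to association p-value, and it uses the mean of -log10(p) as the set score; the score, the function name `enrichment_p`, and its parameters are illustrative choices rather than anything specified in the source. The step that expands eSNPs to the GWAS platform is not covered.

```python
import math
import random


def enrichment_p(gwas_p, snp_set, n_random=10_000, seed=0):
    """Return (observed score, empirical p) for a SNP set against random sets.

    gwas_p   : dict mapping SNP id -> GWAS association p-value (0 < p <= 1)
    snp_set  : iterable of SNP ids derived from a network or gene cluster
    n_random : number of random same-size SNP sets to draw
    """
    def score(snps):
        # Higher score means the set is shifted toward stronger associations.
        return sum(-math.log10(gwas_p[s]) for s in snps) / len(snps)

    members = sorted(s for s in set(snp_set) if s in gwas_p)
    if not members:
        raise ValueError("none of the SNPs in the set are present in the GWAS data")

    rng = random.Random(seed)
    universe = sorted(gwas_p)
    observed = score(members)
    hits = 0
    for _ in range(n_random):
        if score(rng.sample(universe, len(members))) >= observed:
            hits += 1
    # +1 correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_random + 1)
```

A small empirical p-value suggests that the network's SNPs carry more disease association signal than expected by chance. Random sets drawn this way ignore linkage disequilibrium and allele frequency, so a fuller analysis would match random sets on those properties.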