Perspective: Computational Biology

Computational Genetic Mapping in Mice: The Ship Has Sailed

Science Translational Medicine  21 Oct 2009:
Vol. 1, Issue 3, pp. 3ps4
DOI: 10.1126/scitranslmed.3000377

Abstract

Computational haplotype-based genetic mapping can be used to discover new biological mechanisms, disease-related pathways, and unexpected uses for existing drugs. Here we discuss the benefits and limitations of this methodology, its impact on translational medicine, and its future course.

Multiple genetic factors often influence the risk for developing common diseases or an individual’s response to a particular drug, and the task of identifying such factors can be complex and labor-intensive. Computational (in silico) genetic mapping was initially used to analyze physiological or disease-related traits in inbred mouse strains (1); its use was subsequently expanded by organizing the murine genome into naturally occurring haplotype blocks (2–5). Haplotype blocks are regions on a chromosome in which the alleles are inherited as a unit, and the inbred mouse strains have only a few common haplotypes within a given region. We have used computational haplotype-based genetic mapping to navigate uncharted waters, discovering new biological mechanisms, disease-related pathways, and unexpected uses for existing drugs along the way. However, we also have encountered several substantial limitations associated with computational genetic mapping in mice. What are the benefits and limitations of this methodology, its impact on translational medicine, and its future course?

COMPUTATIONAL GENETIC MAPPING: METHODS AND UTILITY

Computational genetic mapping enables researchers to rapidly identify a causative genetic factor by correlating a pattern of observable physiological or pathological differences among selected inbred mouse strains with a pattern of genetic variation. To do this, a researcher must first measure a property of interest in 10 or more existing inbred mouse strains. Genetic factors are then predicted computationally: analysis of variance is used to identify genomic regions where the pattern of genetic variation correlates with the distribution of trait values among the inbred strains studied (2–4). The performance and precision of this computational analysis method require an accurate high-resolution representation of the pattern of genetic variation among the inbred strains. To produce this representation, we created a database consisting of a dense set of single-nucleotide polymorphisms (SNPs) across 19 inbred mouse strains and constructed a map of the haplotypic block structure of the mouse genome (3). There are many different ways to organize the pattern of genetic variation within a population. We chose to construct the haplotype map of the genome of the inbred strains by allowing the number of haplotypes in a region to vary, rather than forcing regions with low linkage disequilibrium (that is, regions in which SNP alleles occur in frequencies that are expected from a random association) into haplotype blocks (3). Although the size of the haplotype blocks varies widely, the average size (approximately 30 kb) is about the same as that of a single gene, which facilitates precise computational genetic analyses.
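The correlation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: the strain names, trait values, and block labels are hypothetical, each haplotype block is reduced to one haplotype label per strain, and the score is a plain one-way analysis-of-variance F statistic with no correction for population structure or multiple testing.

```python
from collections import defaultdict


def anova_f(trait_by_strain, haplotype_by_strain):
    """One-way ANOVA F statistic for trait values grouped by haplotype."""
    groups = defaultdict(list)
    for strain, value in trait_by_strain.items():
        groups[haplotype_by_strain[strain]].append(value)

    k = len(groups)                             # number of haplotypes in the block
    n = sum(len(v) for v in groups.values())    # number of strains
    if k < 2 or n <= k:
        raise ValueError("need at least two haplotypes and more strains than haplotypes")

    grand_mean = sum(trait_by_strain.values()) / n
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in values)

    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_blocks(trait_by_strain, blocks):
    """Score every block and return (block name, F) pairs, best first."""
    scored = [(name, anova_f(trait_by_strain, haps)) for name, haps in blocks.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: ten strains, one measured trait, two haplotype blocks.
STRAINS = [f"S{i}" for i in range(1, 11)]
TRAIT = dict(zip(STRAINS, [1.0, 1.2, 0.9, 1.1, 1.0, 3.0, 3.2, 2.9, 3.1, 3.0]))
BLOCKS = {
    # Haplotype pattern that tracks the trait (low strains h1, high strains h2).
    "block_A": dict(zip(STRAINS, ["h1"] * 5 + ["h2"] * 5)),
    # Haplotype pattern unrelated to the trait.
    "block_B": dict(zip(STRAINS, ["h1", "h2"] * 5)),
}

if __name__ == "__main__":
    for name, f_stat in rank_blocks(TRAIT, BLOCKS):
        print(f"{name}: F = {f_stat:.2f}")
```

In this toy example the block whose haplotype pattern matches the trait distribution receives a far larger F statistic than the unrelated block, which is the sense in which a genomic region "correlates" with the trait.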

A recently performed computational pharmacogenetic analysis illustrates this process (6). In this study, 15 inbred strains of mice were given a test drug (warfarin), and the rate of warfarin metabolism was analyzed in each strain. The data were then computationally analyzed to identify candidate genes whose pattern of genetic differences correlated with interstrain differences in the rate of drug metabolism (6). A similar pharmacogenetic analysis had been performed previously using conventional quantitative trait locus (QTL) analysis (7, 8). This method identifies regions (located between genotyping markers) that are linked with phenotypic traits in intercross progeny, based on recombination between the parental chromosomes of two different inbred strains. Recombinant intercross progeny of two parental strains that differed in the rate of metabolism of this drug were produced, the rate of drug metabolism was measured in the intercross progeny, and the genome of each of the recombinant progeny was scanned to identify genomic regions that were derived from each of the parental strains. The rate of drug metabolism was then correlated with the presence of genome segments derived from the parental strains. The QTL mapping study required several years to complete and identified a very large chromosomal region that may regulate the activity of the relevant enzyme. In contrast, haplotype-based computational mapping, which does not require the generation of intercross progeny and analyzes smaller genomic segments, enabled individual candidate genes to be identified in less than 1 week (6).

This example shows the fundamental differences between genetic mapping using QTL approaches and computational genetic mapping (see Table 1 for comparison).

Table 1.

Comparison of QTL and computational genetic mapping methods.

Since the inception of haplotype-based computational genetic analysis in 2004, there have been multiple successful demonstrations of its power, including (i) identification of a functional genomic element that regulates the expression of H2-Eα, a gene encoding a histocompatibility antigen (2); (ii) identification of the genetic factors that affect the metabolism of the anticoagulant warfarin (6) and two other prescribed medications (9, 10); (iii) characterization of the genetic basis for the susceptibility of inbred mouse strains (and subsequently, human hematopoietic stem cell transplant recipients) to invasive aspergillosis, a serious lung infection (11); (iv) determination of the genetic basis of differences in responses to analgesic medications (12), inflammatory pain responses (13), and opioid addiction–related traits (14, 15); and (v) a recent determination that interstrain differences in susceptibility to opioid dependence correlate with genetic variation within a gene encoding the 5-hydroxytryptamine (serotonin) receptor 3A (5-HT3) (16). This last discovery was followed quickly by the demonstration that a commonly available 5-HT3 receptor antagonist used to treat nausea inhibited the development of morphine dependence and the manifestation of multiple withdrawal symptoms in mice and could also reduce experimentally induced opioid withdrawal symptoms in humans.

CONCERNS ABOUT COMPUTATIONAL MAPPING

Despite these successes, the response to computational genetic mapping has been mixed. Several publications have included criticisms suggesting that computational mapping is a ship that will not float, much less navigate uncharted waters. Authors have (i) “expressed serious concern about the validity of this method as a general approach for dissecting the genetic basis of complex traits” (17); (ii) indicated that computational mapping methods will fail because “the definition of haplotype blocks is not robust” (18); (iii) dismissed this method because it cannot analyze “the small effects that characterize behavioral genes” (19); and (iv) dismissed this method because of a combination of these concerns (20–22).

We have demonstrated repeatedly that computational haplotype-based mapping can successfully identify the genetic basis for interstrain differences in biomedical traits, which refutes the claims to the contrary. The concerns that it will generate “spurious associations” with noncausative genomic regions (that is, false positives) (17) and that it lacks sufficient power to detect genetic loci that make a small overall contribution to a genetic trait (17, 19) are valid, but they reflect a misunderstanding of the way in which computational genetic mapping is applied. One can attempt to eliminate false positive predictions by setting a very high bar for the statistical significance of an identified genomic region, but this approach raises the risk of a false negative (spuriously rejecting a true causative genetic factor). Because we cannot compensate for a false negative, we have chosen to set our significance threshold at a level that minimizes the chance of obtaining false negative results. We then eliminate the resulting false positives using “orthogonal criteria.” Orthogonal criteria include other data elements, such as gene expression, proteomic, or metabolomic data, as well as curated biological information (for example, see www.geneontology.org) that can readily separate likely genetic candidates from false positives. Orthogonal criteria are a fundamental component of any type of biomedical discovery. Pharmacogenetic analyses (6, 9) have shown that applying independent criteria eliminates many false positive predictions (Fig. 1).

Fig. 1.

Filtering gene candidates. The use of orthogonal information and knowledge-based filtering can reduce the number of candidate genes. Haplotype-based computational genetic analysis identified 100 genes whose patterns of genetic variation correlated with measured interstrain differences in the rate of metabolism of a test drug. Simple methods for integrating orthogonal information can be applied to assess gene candidates. Because this drug is metabolized in the liver, the 60 genes not expressed in the liver were eliminated, and the three remaining genes with oxidoreductase activity were selected for further analysis. Thus, the use of readily available gene expression data and knowledge-based filtering narrowed a list of 100 candidate genes to 3 likely targets for further evaluation.

CREDIT: C. Bickel/Science
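The two-stage filter summarized in Fig. 1 can be written as a short sketch. Everything here is illustrative: the `Candidate` record, the gene names, and the toy annotation data are assumptions made up for the example, and a real analysis would draw expression calls and functional annotations from curated resources rather than from hard-coded flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one gene returned by the mapping step."""
    gene: str
    expressed_in_liver: bool
    annotations: frozenset


def filter_candidates(candidates, required_annotation):
    """Apply the tissue-expression filter, then the functional-annotation filter."""
    in_tissue = [c for c in candidates if c.expressed_in_liver]
    selected = [c for c in in_tissue if required_annotation in c.annotations]
    return in_tissue, selected


def toy_candidates():
    """Build 100 made-up candidates: 60 not expressed in liver, 40 expressed."""
    oxidoreductases = {2, 7, 60, 75, 99}   # two of these are not liver-expressed
    candidates = []
    for i in range(100):
        terms = {"oxidoreductase activity"} if i in oxidoreductases else {"other function"}
        candidates.append(Candidate(f"gene{i:03d}", i >= 60, frozenset(terms)))
    return candidates


if __name__ == "__main__":
    in_tissue, selected = filter_candidates(toy_candidates(), "oxidoreductase activity")
    print(len(in_tissue), "candidates remain after the expression filter")
    print("selected:", [c.gene for c in selected])
```

Applying the expression filter first means that a gene with the right functional annotation but no expression in the relevant tissue is still discarded, which matches the order of the steps in the figure.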

Computational haplotype-based genetic mapping has limited power to identify genetic loci that make a small contribution to the trait being measured (that is, have a small genetic-effect size) (17). We have shown in simulated data sets why the computational genetic methods fail to identify genetic factors for certain mouse genetic models of biomedical traits (5). A single genetic locus must have an effect size of >0.4 for a computational mapping study (using data obtained from ≤15 inbred mouse strains) to have an 80% chance of detecting a causal locus (5). From these simulations, we predict that if data obtained from 40 inbred strains were computationally analyzed, genetic loci with an effect as low as 0.15 could be identified (5).
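The dependence of detection power on effect size and on the number of strains can be explored with a small Monte Carlo sketch. It is not the simulation reported in (5) and is not expected to reproduce the numbers quoted above. The assumptions are ours: a single locus with two haplotypes split roughly evenly among the strains, an effect size defined as the approximate fraction of trait variance explained by the locus, Gaussian noise, and a single-test significance level of 0.05 calibrated from simulated null data (a genome-wide scan would need a stricter threshold).

```python
import random


def f_two_groups(a, b):
    """One-way ANOVA F statistic for two groups of trait values."""
    n = len(a) + len(b)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    grand = (sum(a) + sum(b)) / n
    ss_between = len(a) * (mean_a - grand) ** 2 + len(b) * (mean_b - grand) ** 2
    ss_within = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    return ss_between / (ss_within / (n - 2))


def simulate_f(n_strains, effect_size, rng):
    """Simulate one study and return its F statistic.

    Haplotype means sit at -d and +d with unit noise variance, so the locus
    explains about d^2 / (d^2 + 1) of the trait variance (exact when the two
    haplotype groups are the same size).
    """
    d = (effect_size / (1.0 - effect_size)) ** 0.5
    half = n_strains // 2
    group_a = [rng.gauss(-d, 1.0) for _ in range(half)]
    group_b = [rng.gauss(d, 1.0) for _ in range(n_strains - half)]
    return f_two_groups(group_a, group_b)


def estimate_power(n_strains, effect_size, n_sims=1000, alpha=0.05, seed=0):
    """Fraction of simulated studies whose F exceeds the empirical null cutoff."""
    rng = random.Random(seed)
    null = sorted(simulate_f(n_strains, 0.0, rng) for _ in range(n_sims))
    cutoff = null[min(int((1.0 - alpha) * n_sims), n_sims - 1)]
    hits = sum(simulate_f(n_strains, effect_size, rng) > cutoff for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    for n_strains in (15, 40):
        for effect in (0.15, 0.4):
            power = estimate_power(n_strains, effect)
            print(f"{n_strains} strains, effect size {effect}: power ~ {power:.2f}")
```

Under these assumptions, the qualitative trend is the one described in the text: for a fixed effect size, power rises as more strains are analyzed, and smaller effects become detectable only with larger strain panels.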

However, many complex physiological processes can be subdivided into discrete components, each having a less complex genetic architecture than the parent process, which enables the genetic basis for inter-individual differences to be identified computationally. For example, recent pharmacogenetic analyses (6, 9) examined the time course of a parent drug and its metabolites (Fig. 2). Computational genetic methods could not analyze the rate of loss of the parent drug from plasma, because the effect size for each genetic component was too small. However, analysis of the rate of formation of individual drug metabolites enabled pharmacogenetic factors to be easily identified by computational genetic mapping (6, 9). By reducing the biological complexity of the problem, we were able to overcome a major limitation of computational genetic analysis and thus harness its power.

Fig. 2.

Pharmacogenetic analysis. (A) The plasma concentrations of a test drug were measured over time after administration to multiple inbred mouse strains. Each colored line represents the data for a different strain. Drugs are metabolized through multiple pathways, and distinct enzymes catalyze individual steps within each pathway to generate the many metabolites that are derived from the test drug. Although the maximal drug concentration varied, the clearance of the test drug was similar in all inbred strains. (B) The rate of formation of individual drug metabolites was analyzed with reversed-phase high-performance liquid chromatography radiometric methods. Each chromatographic peak represents one of the nine different drug metabolites in plasma. (C) Although they had a similar rate of parental drug clearance from plasma, there were substantial interstrain differences in the rates of production of several different drug metabolites. The rate of production of one metabolite (M8) is plotted for each strain in the bottom panel. (D) Computational genetic analysis was unable to analyze the overall rate of drug clearance (not shown) but could identify the genetic factor (Cyp2c29) affecting the rate of M8 production among the inbred strains. This example was derived from data presented in (6). CPM, counts per minute; Chr., chromosome; SNPs, number of single-nucleotide polymorphisms; Liver exp., liver expression (the drug is metabolized in liver).

CREDIT: C. Bickel/Science

Nevertheless, we have encountered rough seas when computational genetic mapping was used to analyze mouse models for anxiety, lung fibrosis, the mechanism of anesthetic action, and susceptibility to influenza. In these situations, computational genetic analysis failed to identify the underlying biological target. Clearly, this approach cannot be used successfully in all situations.

COMPUTATIONAL MAPPING VERSUS RECOMBINANT INBRED STRAINS

To increase the resolution of QTL mapping, large arrays of recombinant inbred mouse strains (~1000) are being produced (19, 23), which could serve as a powerful genetic mapping resource. Despite the large number of recombinant mice, this panel only contains the genetic variation present within the six to eight founder strains. It is possible that their altered genetic makeup could enable some traits that are not found in the parental strains to be manifest within the recombinant inbred strains. However, use of this panel might provide a limited ability to analyze many disease traits whose underlying causative genes are not variable within the limited set of founder strains. For example, SJL mice are resistant to acetaminophen-induced liver toxicity (24). Because this strain was not among the founder strains, this recombinant inbred strain panel may offer little insight into this important trait. The decision by a U.S. Food and Drug Administration advisory committee in July of 2009 to decrease the recommended dose of acetaminophen and to eliminate many prescription products that combine acetaminophen with other medications indicates the extent of the public health problem caused by acetaminophen-induced liver toxicity.

LOOKING TOWARD 21ST-CENTURY BIOSCIENCE

Two trends in contemporary bioscience research will increase the utility of haplotype-based computational genetic mapping. First, this method is very well suited for analyzing the measurements produced using contemporary “omics” methodologies: transcriptomics, proteomics, and metabolomics. In addition to the information about gene expression that can be obtained by microarray-based analysis, the concentrations of thousands of metabolites or proteins within a tissue can now be measured simultaneously. For example, computational genetic analysis of gene expression data led to the identification of a cis-acting enhancer element that modulated H2-Eα messenger RNA expression in the lung (2). Similarly, the recent identification of a diet-dependent genetic factor that affects susceptibility to acetaminophen-induced liver toxicity in mice was enabled by an integrative analysis of genetic, transcriptional, and metabolomic data (25). Emerging databases of tissue-specific proteomic and metabolomic information provide orthogonal information to help an investigator eliminate false positive results obtained with haplotype-based computational genetic mapping.

Second, this method can be used for “endophenotype” analysis. Many common diseases are collections of signs and symptoms. For example, rheumatoid arthritis, systemic lupus, and major depression are diagnosed by the presence of 4 of 7, 4 of 11, and 5 of 9 different clinical criteria, respectively. These definitions unite heterogeneous patient groups under a single disease classification. An endophenotype is a physiological, biochemical, anatomical, or cognitive measurement that represents some aspect of a disease (or physiological) condition [reviewed in (26)]. Computational analysis of endophenotypes, rather than complex diseases, has the potential to assist in the discovery of disease mechanisms. For example, we have used computational analysis to analyze drug-addiction endophenotypes (16) and a complex pharmacodynamic response (12). The analysis of phenotypes with a less complex underlying genetic architecture is one method that can be used to overcome the inability of computational genetic mapping to analyze complex genetic traits.

CONCLUSIONS

Computational genetic analysis could have a substantial role in translational medicine studies in the near future. The process leading to our discovery of the 5-HT3 receptor’s role in opioid withdrawal (16) provides a basic outline for the translational process: (i) Human biomedical phenotypes of clinical interest are translated back into a mouse model that recapitulates the clinical phenotype. Analyses in mice eliminate the confounding variables that occur in human clinical cohorts (such as diet, other drugs, and age differences). This also enables the phenotype to be simplified through analysis of a desired individual component of the disease. (ii) The phenotype is analyzed across multiple inbred strains. If significant interstrain differences in the response are measured, indicating that genetic factors affect this response, the data can be analyzed with haplotype-based computational genetic analysis. (iii) Tissue can be obtained from the target organ of the inbred strains and used to analyze changes in metabolomic, transcriptional, or proteomic responses in the inbred strains. (iv) The resulting orthogonal data can be used to filter the candidate genes that were identified by computational genetic mapping. By this method, one or a few candidate genetic differences will be selected for further study. (v) The involvement of the human homologs of the selected gene candidates (and related pathways) can then be analyzed in human clinical cohorts through any of several different methods (such as a genetic analysis focusing on selected candidate genes in patient and control cohorts or, as in the 5-HT3 receptor example, an assessment of a therapeutic response).

In silico murine genetic mapping has set sail, proven seaworthy, and navigated previously uncharted waters with record efficiency. It can analyze many biomedical traits at far higher speeds and much lower costs than conventional mouse genetic analysis. Advances in sequencing technology will enable the production of haplotype maps of additional inbred mouse strains and other model organisms. The improved power that results from the additional genetic information will ensure that computational genetic mapping has an increasingly important role in analyzing fundamental biological mechanisms of health and disease.

Footnotes

  • Citation: M. Zheng, S. Shafer, G. Liao, H.-H. Liu, G. Peltz, Computational genetic mapping in mice: The ship has sailed. Sci. Transl. Med. 1, 3ps4 (2009).
