Research Article: Cancer

Genome-Wide Mutational Signatures of Aristolochic Acid and Its Application as a Screening Tool


Science Translational Medicine  07 Aug 2013:
Vol. 5, Issue 197, pp. 197ra101
DOI: 10.1126/scitranslmed.3006086


Aristolochic acid (AA), a natural product of Aristolochia plants found in herbal remedies and health supplements, is a group 1 carcinogen that can cause nephrotoxicity and upper urinary tract urothelial cell carcinoma (UTUC). Whole-genome and exome analysis of nine AA-associated UTUCs revealed a strikingly high somatic mutation rate (150 mutations/Mb), exceeding smoking-associated lung cancer (8 mutations/Mb) and ultraviolet radiation–associated melanoma (111 mutations/Mb). The AA-UTUC mutational signature was characterized by A:T to T:A transversions at the sequence motif A[C|T]AGG, located primarily on nontranscribed strands. AA-induced mutations were also significantly enriched at splice sites, suggesting a role for splice-site mutations in UTUC pathogenesis. RNA sequencing of AA-UTUC confirmed a general up-regulation of nonsense-mediated decay machinery components and aberrant splicing events associated with splice-site mutations. We observed a high frequency of somatic mutations in chromatin modifiers, particularly KDM6A, in AA-UTUC, demonstrated the sufficiency of AA to induce renal dysplasia in mice, and reproduced the AA mutational signature in experimentally treated human renal tubular cells. Finally, exploring other malignancies that were not known to be associated with AA, we screened 93 hepatocellular carcinoma genomes/exomes and identified AA-like mutational signatures in 11. Our study highlights an unusual genome-wide AA mutational signature and the potential use of mutation signatures as “molecular fingerprints” for interrogating high-throughput cancer genome data to infer previous carcinogen exposures.


Unraveling the specific mutational effects of environmental carcinogens on the human genome is a cornerstone of cancer prevention research. Cancer genome analysis has revealed that different carcinogens, including pathogens (1, 2), cigarette smoke (3), and ultraviolet (UV) radiation (4), are often associated with distinct mutational patterns (“mutational signatures”), acting as telltale genetic fingerprints of previous exposures. Analysis of the spectra and patterns of somatic mutations in cancer genomes can also lead to a greater understanding of the mutational events triggering clonal outgrowth.

Aristolochic acid (AA) is a natural compound found in many plants of the Aristolochia genus. Aristolochia plants are commonly used in traditional herbal preparations as health supplements and remedies for various health problems including weight loss, menstrual symptoms, and rheumatism (5, 6). In the 1990s, epidemiological studies showed that AA exposure was associated with a high risk of nephrotoxicity and upper urinary tract urothelial cell carcinoma (UTUC) (7–10), attributed to the ability of AA to bind DNA and form DNA adducts (11). These findings consequently led to bans on the use of Aristolochia-containing herbal preparations in Europe and North America since 2001 and in Asia since 2003 (9). Currently, AA is classified as a group 1 human carcinogen in the IARC Monograph (12). In Asia, AA-associated DNA adducts can be detected in the renal cortex of more than 50% of UTUC patients in Taiwan (13), and the incidence of UTUC in Taiwan (30% of urothelial cancer) is strikingly higher than that in the West (3% of urothelial cancer) (14), consistent with AA playing a major carcinogenic role in Asian UTUC. However, the detailed mechanisms by which AA-induced mutations contribute to UTUC tumorigenesis remain largely unknown, and the involvement of AA in other cancer types has also not been extensively studied. Indeed, whereas mutagens such as AA clearly act on the entire genome, to date only mutations and mutational signatures in TP53 have been investigated in AA-UTUC (13, 15–17). Thus, a genome-wide approach is needed to understand the full implications of AA-induced mutagenesis and how it contributes to UTUC.

Here, by whole-genome and exome sequencing, we studied a panel of UTUC patients who were exposed to AA. We identified (i) a strikingly high prevalence of somatic mutations, (ii) a specific AA mutational signature, and (iii) recurrent mutations in UTUC, notably involving chromatin-modifying genes. We recapitulated AA toxicity in vivo and the AA mutational signature in in vitro AA-treated cells. Finally, we used the AA mutational signature to screen for and detect AA-like mutational patterns in hepatocellular carcinoma (HCC), which suggests an underappreciated carcinogenic role for AA (or an AA-like compound) in a subset of liver cancers.


AA is a highly mutagenic group 1 carcinogen

To investigate the molecular effects of AA on a genomic scale, we performed whole-genome sequencing (WGS) of an AA-UTUC (tumor 9T) and nonmalignant kidney tissue from the same patient (table S1). Clinical exposure to AA was inferred from patient medical records and case histories, including female gender, history of herbal remedy use, and compromised renal function (18, 19). Female gender is relevant because it appears that, in Taiwan, AA-containing preparations are mostly used by women for weight loss (16). The UTUC and matched normal genomes were sequenced to 33X mean haploid genome coverage (table S2). Genome Analysis Toolkit (GATK) software was used to identify single-nucleotide variants as previously described (1, 2). To identify somatic mutations, we excluded from analysis germ-line variants catalogued in either the dbSNP135 or 1000 Genomes Project database and then subtracted the sequence variants of the normal genome from those of the matched tumor genome. We identified 438,872 somatically acquired single-nucleotide substitutions across the AA-UTUC genome, including 201,192 substitutions within genes and introns and 237,680 intergenic variants (table S3). The accuracy of the WGS data was assessed by Sanger sequencing of 250 randomly selected somatic variants (50 coding, 100 intragenic, 100 intergenic). Of these, ~98% were confirmed as genuine somatic mutations (tables S4 to S6).
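The subtraction step described above can be sketched as a set difference. This is an illustration only, not the GATK-based pipeline used in the study: the `Variant` type, the function name `somatic_candidates`, and the toy variants are all hypothetical.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a GATK or VCF type)."""
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


def somatic_candidates(tumor: Set[Variant],
                       normal: Set[Variant],
                       known_germline: Set[Variant]) -> Set[Variant]:
    """Keep tumor variants absent from the matched normal and from
    germ-line catalogues (standing in for dbSNP / 1000 Genomes)."""
    return tumor - normal - known_germline


# Toy data, invented for illustration.
tumor = {
    Variant("chr1", 100, "A", "T"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr2", 50, "A", "T"),
}
normal = {Variant("chr1", 200, "C", "T")}          # also seen in normal tissue
known_germline = {Variant("chr2", 50, "A", "T")}   # catalogued polymorphism

result = somatic_candidates(tumor, normal, known_germline)
```

A real pipeline would also apply coverage and quality filters before the subtraction; those are omitted here.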

The WGS analysis revealed an average mutation rate of 150 mutations/Mb of DNA in AA-UTUC, which is higher than previously reported mutation rates for smoking-associated lung cancer (8 mutations/Mb) (3) or UV radiation–associated melanoma (111 mutations/Mb) (4). Thus, among documented group 1 carcinogens, AA may exert one of the strongest known mutational pressures on the human genome (Fig. 1A).
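As a rough check on the scale of this figure, the rate can be computed by dividing the mutation count by the number of megabases examined. The denominator below (about 2.9 Gb of callable sequence) is an assumption made for illustration; the text does not state the exact value used.

```python
def mutations_per_mb(n_mutations: int, callable_bases: int) -> float:
    """Somatic mutation rate expressed per megabase of callable sequence."""
    return n_mutations / (callable_bases / 1_000_000)


# 438,872 substitutions are reported for tumor 9T.
# ASSUMPTION: ~2.9 Gb of callable genome; not a value given in the article.
rate = mutations_per_mb(438_872, 2_900_000_000)
print(f"{rate:.0f} mutations/Mb")
```

Under that assumed denominator the result is close to the reported 150 mutations/Mb.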

Fig. 1 Mutation counts in AA-UTUCs and other group 1 carcinogen-associated cancers.

(A) Total numbers of single-nucleotide somatic mutations in the genomes of AA-UTUC, UV-associated melanoma, and tobacco-associated lung cancer. (B) Superimposed individual tumor data points for the total numbers of nonsynonymous single-nucleotide mutations in AA-UTUC, UV-associated melanoma, tobacco-associated lung cancer, Opisthorchis viverrini (OV)–associated cholangiocarcinoma (CCA), and H. pylori–associated gastric cancer.

To confirm this strikingly high AA mutational burden, we sequenced the exomes of an additional eight matched pairs of AA-UTUC. We generated an average of ~8.8 Gb of reads for each sample, yielding an average coverage of 35X in targeted regions, with 60.3% of targeted bases represented by at least 20 sequence reads (table S7). Somatic mutations were identified using bioinformatic analysis similar to the WGS analysis. We identified 9933 exonic somatic substitutions in 6415 different genes in the nine AA-UTUCs, including 8144 missense, 853 nonsense alterations, and 936 splice-site mutations. A thousand of these exonic mutations (at least 100 candidate somatic coding mutations per tumor) were selected for Sanger sequencing verification. Thirty mutations could not be tested because of polymerase chain reaction (PCR) failure, and of the remaining 970 mutations, 946 were confirmed by Sanger sequencing (97.5%, table S8). On average, 1118 nonsynonymous somatic mutations were identified for each AA-UTUC exome (904 missense, 94 nonsense, 104 splice site). We then compared the AA-UTUC mutation rate against those observed in other cancers, including (i) UV-exposed melanomas (n = 7) (20), (ii) smoking-associated lung cancers (n = 12) (21), (iii) liver fluke–induced bile duct cancers (n = 8) (2), and (iv) Helicobacter pylori–associated gastric cancers (n = 15) (1). We chose these four cancer types because they are also caused by group 1 carcinogens. Again, AA-UTUC exhibited the highest rate of exonic mutations (Fig. 1B and fig. S1). None of the AA-UTUC cases exhibited evidence of microsatellite instability (MSI; table S1), indicating that the high mutational load in AA-UTUC is not due to a classical MSI mechanism.

Numerous genes were affected by somatic mutations in two or more of the nine tumors, an unsurprising result given the exceedingly high AA-UTUC mutation rate. Frequently mutated genes in AA-UTUC included TP53 (mutated in five of nine tumors) (13, 15) and the histone H3K27 demethylase KDM6A (eight of nine tumors). We also observed frequent mutations in other chromatin modifier genes such as ARID1A and SETX (Table 1). We did not observe mutations in the oncogenes HRAS or FGFR3, previously described to be mutated in AA-UTUC, which may be due to the relatively low mutation rates reported for these genes [4.7 and 4%, respectively (13)].

Table 1 The top 15 recurrent mutated genes in AA-UTUC.

AA induces a characteristic mutational signature in cancer

The mutational effects of AA are thought to involve AA metabolites that bind to the amino groups of purine bases (A and G) to form DNA adducts (22, 23). In AA-UTUC, we observed a strikingly high proportion of somatic A:T to T:A mutations, a mutation class that constitutes only a minor proportion in other cancers, including H. pylori–associated gastric cancer and liver fluke–induced bile duct cancer (Fig. 2A) (1, 2, 4, 24–26). We did not observe high rates of mutations at guanine bases, suggesting that DNA adducts formed by AA primarily affect adenine sites in vivo.

Fig. 2 AA-UTUC mutational signatures.

(A) Numbers of mutations in each of six possible mutation classes in the whole genome and exomes of AA-UTUCs and exomes of non–AA-associated cancers (H. pylori–associated gastric cancer and O. viverrini–associated CCA). (B) Sequence contexts of A>T, CAG>CTG, and TAG>TTG mutations in the whole genome and exomes. Mutation rates are expressed as fractions of the counts of the given triplet or quintuplet. For example, the rate of TC[A>T]GA mutations is the rate per million TCAGA quintuplets. (C) Strand bias: There are about twice as many A>T mutations on the sense (s; nontranscribed) strand as on the antisense (a; transcribed) strand in AA-UTUC.

To better understand AA mutagenesis, we examined the sequence contexts of the A:T to T:A mutations. Collectively analyzing 267,192 A-to-T transversions in the genome sequence of tumor 9T, we observed a dramatic overrepresentation of cytosines and thymines immediately 5′ to mutated adenines (that is, [C|T]A) and overrepresentation of guanines 3′ to mutated adenines (that is, AG) (Fig. 2B). Indeed, TAG and CAG were the most prevalent among the 16 possible A-to-T transversion trinucleotide groups (Fig. 2B, fig. S2, A and B, and tables S9 and S10). In contrast, enrichment of the [C|T]AG motif at A-to-T mutations was not observed in non–AA-associated cancers (Fig. 2B), indicating that this motif is specific to AA-induced mutagenesis. Our data also suggest that, besides the immediate flanking bases (that is, +/−1), nucleotides at the +/−2 positions may also influence AA mutational selectivity. Specifically, we observed overrepresentation of adenine 5′ to TA and CA (that is, ATA or ACA), whereas guanine was the dominant base 3′ to AG (that is, AGG) (Fig. 2B and tables S11 to S14). Together, these results point to A[C|T]AGG as the consensus mutational hotspot for AA-induced A:T to T:A mutations. The same sequence motif was identified within the nine AA-UTUC exomes (Fig. 2B and fig. S3).
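A minimal sketch of how such sequence contexts can be tallied is given below. It assumes that a T>A change on the reference strand is treated as an A>T change on the opposite strand, so its context is reverse-complemented before counting. The function names, the toy reference string, and the mutation list are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def adenine_context(reference: str, pos: int, ref: str, alt: str, flank: int = 1):
    """Return the sequence window centred on the mutated adenine for an
    A:T to T:A transversion, or None for any other substitution.
    `pos` is 0-based; flank=1 gives triplets, flank=2 gives quintuplets."""
    start, end = pos - flank, pos + flank + 1
    if start < 0 or end > len(reference) or reference[pos] != ref:
        return None
    window = reference[start:end]
    if (ref, alt) == ("A", "T"):
        return window
    if (ref, alt) == ("T", "A"):
        # The adenine sits on the opposite strand: report that strand's context.
        return reverse_complement(window)
    return None


# Toy reference and mutations, invented for illustration.
reference = "ACAGGTCCTGTA"
mutations = [(2, "A", "T"), (8, "T", "A"), (5, "T", "C")]

tri = Counter(c for c in (adenine_context(reference, *m) for m in mutations) if c)
penta = Counter(c for c in (adenine_context(reference, *m, flank=2) for m in mutations) if c)
```

To turn such counts into rates like those plotted in Fig. 2B, each count would be divided by the number of occurrences of the same triplet or quintuplet in the sequenced region.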

When we compared the total number of somatic mutations occurring on nontranscribed versus transcribed strands, we discovered over twice as many A-to-T transversions on the nontranscribed strand (Fig. 2C). This strand bias suggests that AA-UTUC is similar to some other cancer types (20, 27) where lesions occurring on transcribed strands may be identified and corrected by transcription coupled repair.
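The strand assignment behind this comparison can be sketched as follows. The sketch assumes that each mutation is reported on the reference (+) strand together with the strand of the gene it falls in; the function names and toy records are hypothetical.

```python
from collections import Counter


def adenine_strand(ref: str, alt: str, gene_strand: str):
    """Classify an A:T to T:A transversion by whether the mutated adenine
    lies on the nontranscribed (sense) or transcribed (antisense) strand.
    `gene_strand` is '+' or '-'; returns None for other substitutions."""
    if (ref, alt) == ("A", "T"):
        adenine_on = "+"      # adenine is on the reference strand
    elif (ref, alt) == ("T", "A"):
        adenine_on = "-"      # adenine is on the opposite strand
    else:
        return None
    # The sense (nontranscribed) strand is the strand the gene is annotated on.
    return "nontranscribed" if adenine_on == gene_strand else "transcribed"


def strand_counts(mutations):
    counts = Counter()
    for ref, alt, gene_strand in mutations:
        label = adenine_strand(ref, alt, gene_strand)
        if label:
            counts[label] += 1
    return counts


# Toy records (ref, alt, gene strand), invented for illustration.
toy = [("A", "T", "+"), ("T", "A", "-"), ("A", "T", "+"),
       ("T", "A", "+"), ("C", "T", "+")]
counts = strand_counts(toy)
bias = counts["nontranscribed"] / counts["transcribed"]
```

A ratio well above 1, as reported in Fig. 2C, is the pattern expected if lesions on the transcribed strand are preferentially repaired.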

Finally, 7.4% of A:T to T:A transversions across the nine AA-UTUC exomes occurred at CAG trinucleotides at 3′ splice sites. The AA-UTUC mutations at 3′ splice-site CAGs occurred at significantly higher prevalence than expected by chance (Fig. 3A and table S15; P = 6.6 × 10−9 to 0.015 by hypergeometric tests). Similar 3′ splice-site enrichments were not observed in non–AA-associated cancers (Fig. 3A and table S15; P = 0.21 and 0.76, respectively, by hypergeometric tests). These findings suggest that splice-site mutations may contribute to the pathogenesis of AA-UTUC.
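The enrichment test can be sketched as an upper-tail hypergeometric probability: given N CAG trinucleotides in the sequenced region, of which K lie at 3′ splice sites, how likely is it that at least k of n mutated CAGs fall at splice sites by chance? The counts used below are invented for illustration; the actual values are in table S15 and are not reproduced here.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n items without replacement from a population
    of N items containing K 'successes' (here, splice-site CAGs)."""
    total = comb(N, n)
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / total


# Hypothetical counts, not taken from the study.
N_all_cag = 1000      # all CAG triplets considered
K_splice_cag = 50     # CAGs located at 3' splice sites
n_mutated = 20        # mutated CAGs
k_mutated_splice = 5  # mutated CAGs at 3' splice sites

p_value = hypergeom_upper_tail(k_mutated_splice, N_all_cag, K_splice_cag, n_mutated)
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for genome-scale counts a statistics library would be a more practical choice.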

Fig. 3 Altered RNA splicing in AA-UTUCs.

(A) AA-UTUC and AA-treated cells have more mutated CAGs at 3′ splice sites than expected by chance (P values in table S15). The height of the black bar labeled “All CAGs” shows the percentage of all CAGs (both mutated and wild-type) occurring at 3′ splice sites. (B) mRNA levels of NMD machinery genes in AA-UTUC and non–AA-UTUC quantified by quantitative RT-PCR. Results were analyzed by one-sided Wilcoxon rank-sum tests for each gene. The P values are identical because, for every gene, the mRNA levels in all the AA-UTUCs rank higher than in all the non–AA-UTUCs. (C) A heterozygous splice-site mutation resulted in skipping of MBOAT7 exon 3 in AA-UTUC. Bridging reads, confirming the exon skipping, are shown. (D) Verification of MBOAT7 exon skipping by RT-PCR. Arrows schematically indicate primer locations.

Aberrant mRNA splicing is expected to result in defective transcripts and activation of the nonsense-mediated decay (NMD) machinery (28, 29). Indeed, quantitative reverse transcription PCR (RT-PCR) showed significantly increased expression of four NMD genes in the nine AA-UTUC (P = 0.00202 for each gene, one-sided Wilcoxon rank-sum tests; Fig. 3B). In addition, RNA-seq analysis of one tumor-normal pair revealed up-regulation of 13 of 15 NMD pathway genes previously reported to be up-regulated in response to aberrant splicing (26) (P = 0.002, Wilcoxon signed-rank test; fig. S4 and table S16). In this same tumor, RNA-seq analysis of genes with CAG to CTG mutations at 3′ splice sites also revealed multiple examples of atypical splicing, including intron inclusion and exon skipping. For example, somatic splice-site mutations in both MBOAT7 and RFC2 caused production of transcripts skipping the exon 3′ to the mutation (Fig. 3, C and D, and fig. S5). Indeed, of 11 genes with 3′ splice-site mutations and with adequate read depth, 10 showed altered splicing (table S17). By contrast, we observed no altered splicing of 20 genes at similar read coverage with wild-type splice sites (table S18). This difference is significant at P < 3 × 10−7 (Fisher’s exact test, two-sided; fig. S6, A to C). Although these results remain to be confirmed in a larger series, they provide additional support for an important role for splice-site mutations in AA-UTUC.
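The Fisher's exact test on the splicing comparison can be reproduced from the counts given above (10 of 11 genes with 3′ splice-site mutations showed altered splicing, versus 0 of 20 genes with wild-type splice sites). The implementation below is a minimal standard-library sketch of the two-sided test, not the software used in the study; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Rows: mutated vs. wild-type splice site; columns: altered vs. normal splicing.
p = fisher_exact_two_sided(10, 1, 0, 20)
print(f"P = {p:.2e}")
```

The value obtained is consistent with the reported bound of P < 3 × 10−7.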

AA is sufficient to recapitulate Aristolochia-induced mutagenicity and nephrotoxicity

Human consumption of AA typically occurs via herbal remedies (9) containing AA and mixtures of other natural products. We sought to experimentally verify that AA alone, as a purified isolated compound, is sufficient to cause both renal pathology and the mutational signature observed in AA-UTUC primary tumors. We first established in vitro models by chronic treatment of human renal proximal tubular cells (HK2) with sublethal levels of AA (10 μM) for 6 months. Eighty-five percent of HK2 cells underwent apoptosis and/or necrosis upon AA treatment, but subsequently, single clones developed and proliferated. We randomly selected two independent AA-treated HK2 clones and their untreated parental clones for exome sequencing (31X to 34X coverage) and analysis (table S19). Sanger validation of somatic variants yielded a specificity of 95.7% (89 of 93 mutations; table S20). We identified 219 and 162 somatic substitutions in the two clones (293 missense, 31 nonsense, and 57 splice-site mutations). The mutational signature in the in vitro AA-treated HK2 clones was similar to that observed in primary AA-UTUCs, including a predominance of A:T to T:A transversions on nontranscribed strands (Fig. 4A), a [C|T]AG mutational motif (Fig. 4B), and an enrichment of splice-site mutations (Fig. 3A). These in vitro results demonstrated that AA alone is sufficient to recapitulate the repertoire of genomic hallmarks observed in human AA-UTUC.

Fig. 4 In vitro and in vivo models of AA-induced damage.

(A) After AA treatment for 6 months, HK2 cells developed clones with numerous mutations that, like AA-UTUC tumors, show a predominance of A>T substitutions and strand bias. (B) Sequence contexts for A>T mutations in the exome of the AA-exposed HK2 clone. Rates of mutations are expressed as fractions of the exome-wide counts of occurrence of the given triplet. (C) Representative nontreated and AA-treated mouse kidneys. (D) Nontreated versus AA-treated mouse kidney (hematoxylin and eosin staining, ×400; scale bars, 25 μm). At day 10, there was a dramatic accumulation of proteinaceous fluid within the tubules of the AA-treated kidney (*). At days 30 and 90, the tubular epithelial cells were necrotic and had collapsed, leaving the tubular basement membrane naked (#). There were few signs of regeneration. The glomeruli demonstrated ischemic shrinkage (arrowhead). At day 90, local inflammatory infiltration (arrow) was evident in the interstitium, and urothelial dysplasia had developed.

We also studied AA toxicity in vivo. Three-day oral dosing of C57BL/6 mice with AA (50 mg/kg) induced a range of renal pathologies over a 90-day posttreatment period. These renal abnormalities were highly similar to previously reported Aristolochia-associated pathologies in humans (9, 30), including organ atrophy, tubular necrosis, and lymphocytic infiltrates in the deep cortex, outer medulla, and medullary rays (Fig. 4, C and D, and fig. S7, A to D). We also observed accumulation of proteinaceous fluid, cellular debris, and desquamation of tubular epithelial cells into the proximal and distal tubules (Fig. 4D). Urothelial dysplasia within the renal pelvis was evident in AA-treated mice at days 30 and 90 (Fig. 4D), indicating that exposure to AA can cause abnormal cell growth. It is likely that UTUC would be seen in this model with prolonged exposure to AA and longer follow-up.

Genomic screening reveals AA-like mutational signatures in HCC

AA’s contributions to UTUC have been documented, but little is known about AA’s possible roles in other cancer types. We surmised that the AA mutational signature could be used as a screening tool to study the potential involvement of AA in other malignancies, particularly in other organs affected by AA toxicity. As a proof of concept, we studied liver cancer (HCC). We screened genome or exome sequence data from 93 hepatitis B virus (HBV)–positive HCCs (88 previously published) (31). We detected 11 cases exhibiting strong AA (or AA-like) mutational signatures using the following criteria: (i) A>T is the most frequent mutation category or there are ≥10 exonic A:T to T:A transversions; (ii) ≥33% of the A:T to T:A transversions are associated with a [C|T]AG sequence context; and (iii) a strand bias of ≥1.25:1. An additional 19 cases exhibited a weak signature, defined as 30% of the A:T to T:A transversions being either CAG or TAG (Fig. 5 and figs. S8 and S9). The remaining 63 HCCs did not exhibit these AA mutational features (Fig. 5). These results suggest a probable exposure of 11 patients to AA (or an AA-like compound) during HCC carcinogenesis. Although definitive evidence of AA exposure in these patients is currently unavailable, these results suggest a potential role for AA in a subset of HCCs.
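The screening criteria above can be expressed as a small classifier. This is a sketch under stated assumptions rather than the authors' implementation: it assumes all three criteria must hold for a strong call, that the strand bias is the ratio of nontranscribed to transcribed A>T counts, and that the weak category means at least 30% of A>T transversions in a CAG or TAG context. The `TumorSummary` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TumorSummary:
    """Hypothetical per-tumor summary of exonic somatic substitutions."""
    class_counts: Dict[str, int]   # counts for the six classes, e.g. "A>T"
    a_to_t_in_ctag: int            # A>T transversions in a CAG or TAG context
    nontranscribed_a_to_t: int     # A>T with the adenine on the sense strand
    transcribed_a_to_t: int        # A>T with the adenine on the antisense strand


def classify_aa_signature(t: TumorSummary) -> str:
    n_at = t.class_counts.get("A>T", 0)
    if n_at == 0:
        return "none"
    # (i) A>T is the most frequent class, or at least 10 exonic A>T changes.
    crit1 = n_at == max(t.class_counts.values()) or n_at >= 10
    # (ii) at least 33% of A>T changes lie in a [C|T]AG context.
    frac_ctag = t.a_to_t_in_ctag / n_at
    crit2 = frac_ctag >= 0.33
    # (iii) strand bias of at least 1.25:1 (written without division).
    crit3 = (t.nontranscribed_a_to_t > 0
             and t.nontranscribed_a_to_t >= 1.25 * t.transcribed_a_to_t)
    if crit1 and crit2 and crit3:      # ASSUMPTION: all three are required
        return "strong"
    if frac_ctag >= 0.30:              # ASSUMPTION: "30%" read as "at least 30%"
        return "weak"
    return "none"


# Toy tumors, invented for illustration.
strong = TumorSummary({"A>T": 40, "C>T": 30, "C>A": 10}, 20, 26, 14)
weak = TumorSummary({"A>T": 12, "C>T": 30, "C>A": 8}, 4, 6, 6)
absent = TumorSummary({"A>T": 5, "C>T": 40, "C>A": 9}, 1, 3, 2)
labels = [classify_aa_signature(t) for t in (strong, weak, absent)]
```

Thresholds are kept as literals to mirror the text; in practice they would be exposed as parameters so that the sensitivity of the screen could be explored.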

Fig. 5 AA mutational signatures in HCC.

(A) Numbers of mutations in each of six possible mutation classes in 11 probable AA-exposed HCCs and 63 non–AA-HCC exomes. HCCs exhibiting a weak AA signature are presented in fig. S9. (B) Sequence contexts for A>T mutations in the exomes of the probable AA-exposed and non–AA-HCCs. Mutation rates are expressed as fractions of the counts of the given triplet. For example, the rate of C[A>T]G mutations is the rate per million CAG triplets. (C) Strand bias: There are about twice as many A>T mutations on the sense (s; nontranscribed) strand as on the antisense (a; transcribed) strand of the probable AA-exposed HCCs.


Exposures to exogenous carcinogens are known to leave their imprints as mutational patterns in cancer genomes. Characterizing these signatures not only sheds light on the molecular mechanism of carcinogenesis but may also provide a genomic tool to detect a carcinogen’s involvement in cancers not previously known to be linked to the carcinogen. To date, mutational signatures in cancer have been reported for some group 1 carcinogens, including UV light, liver flukes, tobacco, and H. pylori (1–4, 27, 32). Here, we identified a highly distinctive genome-wide mutational signature associated with AA, a group 1 carcinogen associated with UTUC. Compared with other group 1 carcinogens studied to date on a genome-wide scale, AA causes the highest rate of mutations. The high mutation load observed in AA-UTUC is comparable to those observed in “hypermutated” cancers caused by mismatch repair defects (33) but lower than those observed in “ultramutated” cancers caused by inactivation of the DNA polymerases POLE or POLD1 (34) (table S21). The high level of de novo mutations identified in AA-UTUC may explain the epidemiological observation that AA-UTUC patients tend to be younger and have higher contralateral recurrence rates (16) than non–AA-UTUC patients.

Beyond the sheer abundance of mutations, AA mutational events were characterized by a predominance of A:T to T:A transversions in an A[C|T]AGG context. Furthermore, AA mutations affected CAG trinucleotides at 3′ splice sites more often than expected by chance, suggesting a role for splice-site mutations in UTUC pathogenesis. This is supported by our demonstration of up-regulation of the NMD machinery and altered splicing patterns in AA-UTUC. This process, coupled with specific pro-oncogenic mutations in cancer-related driver genes (Table 1), is likely to contribute to AA-UTUC carcinogenesis. In the nine AA-UTUCs, we identified numerous genes mutated at frequencies comparable to TP53, the gene previously reported to be highly mutated in AA-UTUC (15). The most frequently mutated gene in our cohort of AA-UTUCs was KDM6A, a histone demethylase gene. KDM6A mutations have been previously observed in different cancer types, but most notably in bladder cancer, a malignancy of transitional cells that are also predominantly found in AA-UTUC (25). Besides KDM6A, we also observed mutations in other chromatin modifying genes in AA-UTUC (Table 1). To date, very little is known about how these genes, which have been recently found to be frequently mutated in cancer, contribute to tumorigenesis (35). Thus, further functional studies of these genes to elucidate their tumorigenic roles are warranted.

Several cellular pathways exist for repairing DNA lesions caused by carcinogens (36). Of these, nucleotide excision repair (NER) serves as the predominant DNA repair pathway for repairing bulky DNA adducts, involving both global genomic NER (GG-NER) and transcription-coupled NER (TC-NER) (37, 38). The strand bias that we observed in this study suggests that AA-adducts occurring on the transcribed strand may be removed and corrected by TC-NER, but not by GG-NER. AA-associated adducts have been shown to persist in human tissues for many years (39). This is in accordance with recent evidence demonstrating that AA-associated adducts can escape GG-NER because they are not detected by the XPC/RAD23B recognition complex, the initial activating step for GG-NER (40). Our data suggest that whereas TC-NER may function in some part to correct a proportion of AA-adducts, it appears to be overwhelmed by the sheer abundance of somatic mutations caused by AA exposure.

Kidneys and livers play essential roles in drug and toxin metabolism. In the case of AA, cytosolic NAD(P)H:quinone oxidoreductases expressed by these two organs can activate AA by nitroreduction to N-hydroxyaristolactam, forming a cyclic N-acylnitrenium ion as the ultimate carcinogenic species binding DNA (41). AA-DNA adducts have been detected in the stomach, kidney, urinary tract, bladder, and liver (11), suggesting a plausible carcinogenic role of AA in these organs. Given the highly distinctive genome-wide mutational signature of AA, we tested whether the AA signature could be used as a “molecular fingerprint” to screen for the potential involvement of AA in other cancers besides UTUC. Specifically, we interrogated cancer genome data for 93 HCCs, and found evidence for AA-like mutational signatures in 11 cases. In these HCCs, it is possible that AA-induced mutations may have collaborated with other oncogenic insults, such as HBV infection, to cause the development of cancer—whether these interactions are additive or synergistic warrants further investigation.

Our study has certain limitations. First, the comparisons of mutation rates in UV-, tobacco-, and AA-associated cancers were performed by comparing our rates for AA-associated cancer to published rates for the other cancers (3, 4), rather than reanalyzing the raw sequencing reads using identical variant detection pipelines. Nevertheless, our sequencing data and analysis pipeline are highly similar to those used by these other studies, and for the same tumor type, our pipeline’s mutation identification rate is comparable to those independently published by other groups (1, 42). The high Sanger validation rate of our study (97 to 98% based on 250 WGS-called mutations and 970 exome-called mutations) also indicates that our pipeline does not have a high false-positive rate. Second, the high mutation rates of AA-UTUC and the limited number of samples make it highly challenging to define a list of statistically significant mutated driver genes in this cancer. Nevertheless, we presented Table 1 showing the most frequently mutated genes that need to be further studied and evaluated. Third, although AA-like mutational signatures were observed in some HCCs, exposure of these patients to AA is probable but not proven. For HCC patients, herbal supplement consumption is rarely recorded in the clinical record, and AA dosages used in different herbal remedies are effectively unquantifiable. As such, areas for future studies include obtaining definitive evidence for a causative role of AA in these particular cases, and correlating these cases to other clinical factors such as disease outcome and treatment response.

Recent studies have proposed a strong dose-effect relationship between AA exposure, nephrotoxicity, and eventual cancer over a long latency period (43). In humans, the most likely cause for these exposures is prolonged herbal intake resulting in chronic carcinogenic exposure to AA. Because little is known about the specific concentrations of AA in different herbal remedies, a more reliable method of determining extent of AA exposure in individuals involves the direct detection of AA-DNA adducts in target tissues. Different techniques including 32P-postlabeling (44) and high-performance liquid chromatography coupled with mass spectrometry (45) have been used to detect such adducts. However, these approaches have not been developed systematically as a routine diagnostic tool for detecting AA exposure because the sensitivity of these assays is highly dependent on the specific dosage, duration of exposure, and persistence of AA-DNA adducts at the time of measurement. By contrast, although cancer genome sequencing does not detect AA-DNA adducts per se, such genomic information carries the potential of reflecting the long-term cumulative mutational consequences of AA exposure. Essentially, our findings provide a proof of concept that it is technically feasible to use mutational signatures associated with a carcinogen to screen for its involvement in cancers not previously known to be caused by that carcinogen. This general approach could be further extended to all group 1 carcinogens to investigate their involvement in any cancer type for which high-throughput sequencing data are available. Moreover, it may also be possible to extend such “mutational fingerprinting” to noncancer tissues such as blood cells, which are easily accessible for screening. This general strategy may thus play an important role in the field of cancer prevention as a genomic strategy to detect early carcinogen exposures.

Nearly 80% of the world population relies on traditional medicines for their primary health care needs (46). Despite public health warnings regarding the safety of botanical products and dietary supplements containing AA and bans on the use of such products in several countries, almost 20 years later potential sources of AA remain available. For example, certain AA-containing products are still permitted under the supervision of practitioners of Chinese medicine (43), and products containing AA are still easily available worldwide over the Internet (47, 48). Other sources of AA include contamination of wheat flour by the seeds of Aristolochia species, as reported in the Balkans (49). Frequent intake of AAs, sometimes at very high concentrations, has been reported in studies from China, Taiwan, Hong Kong, Japan, and some Western countries (50). Thus, the strikingly high mutational burden and toxicity of AA, as demonstrated in this study and another independent report (51), highlight the importance of knowing the contents of herbal products and of greater public awareness that not all derivatives from “natural” plants are safe.

Materials and Methods

Study design

Supplementary Materials and Methods provides details of the study design.

Clinical samples and information

Tissue samples and clinical information on the subjects with AA-UTUC were obtained from the Chang Gung Memorial Hospital in Taiwan. Written consent was obtained from each subject, and the research protocol was approved by the Human Research Ethics Committee of the Chang Gung Memorial Hospital.


The genomes of the AA-UTUC (tumor 9T) and the matched nonmalignant tissue sample were sequenced on an Illumina HiSeq 2000 as paired-end 76-bp reads. The exomes of nine AA-UTUCs and matched normal adjacent tissues, the HK2 cell line (from American Type Culture Collection), and two AA-treated clones of HK2 cells were sequenced on an Illumina GAIIx sequencer (2 × 76-bp reads). Read pairs were aligned to the reference human genome (hg19) using Burrows-Wheeler Aligner. Somatic single-nucleotide mutations were identified according to their presence in the tumor genome and absence from the corresponding normal genome as described in the text and Supplementary Materials. RNA sequencing was carried out on an Illumina HiSeq 2000 and analyzed as described in Supplementary Materials and Methods.

Analysis of NMD genes

Supplementary Materials and Methods provides details of the analysis of expression levels of nonsense-mediated decay genes.

Induction of AA-associated nephropathy in C57BL/6 mice

C57BL/6 mice were treated with AA (50 mg/kg) for 3 days and sacrificed on days 10, 30, and 90 to monitor AA-induced nephropathy. The procedures for the present study were approved by the Animal Committee under Singapore Health Science (SingHealth), and all animals were treated according to the guidelines for animal experimentation of SingHealth (Institutional Animal Care and Use Committee protocol #2012/SHS/773).

Statistical and bioinformatic analyses

Supplementary Materials and Methods provides details of bioinformatic and statistical analyses.

Supplementary Materials

Materials and Methods

Fig. S1. Frequency of mutations in carcinogen-induced cancers.

Fig. S2. Single-nucleotide somatic mutations in 16 possible sequence contexts for A>T transversions in the AA-UTUC whole genome.

Fig. S3. Single-nucleotide somatic mutations in 16 possible sequence contexts for A>T transversions in the exomes of nine AA-UTUCs.

Fig. S4. Systematic up-regulation of NMD gene transcripts in AA-UTUC compared to adjacent normal tissue.

Fig. S5. A heterozygous 3′ splice-site mutation results in skipping of RFC2 exon 10 in AA-UTUC.

Fig. S6. Strong association between CAG>CTG mutations at 3′ splice sites and altered splicing.

Fig. S7. Further details of the in vivo model of AA-induced damage.

Fig. S8. Superimposed individual tumor data points for the total nonsynonymous single-nucleotide variants and each of the separate mutation types in AA-HCCs and non–AA-HCCs.

Fig. S9. Nineteen HCCs exhibiting a “weak” AA mutational signature.

Fig. S10. Schematic representation of 3′ splice-site CAGs.

Table S1. Clinical characteristics of AA-UTUC patients analyzed by whole-genome and/or exome sequencing.

Table S2. Sequence analysis summary of whole genome–sequenced AA-UTUC (9T).

Table S3. Breakdown of somatic mutations by genomic region.

Table S4. Somatic nonsynonymous substitutions in protein-coding genes of the whole genome–sequenced AA-UTUC.

Table S5. Somatic substitutions in unspliced transcript regions (transcribed, including introns and untranslated regions) of AA-UTUC (9T).

Table S6. Somatic substitutions in the intergenic regions of AA-UTUC (9T).

Table S7. Sequence analysis summary of nine exome-sequenced AA-UTUCs.

Table S8. Somatic nonsynonymous substitutions in protein-coding genes of nine AA-UTUCs.

Table S9. The effect of +/− one base flanking the mutated adenine or thymidine on the number of unspliced transcript (transcribed region, including introns) mutations in AA-UTUC.

Table S10. The effect of +/− one base flanking the mutated adenine or thymidine on the number of intergenic mutations in AA-UTUC.

Table S11. The effect of +/− two bases flanking the mutated adenine or thymidine on the number of unspliced transcript (transcribed region, including introns) mutations in AA-UTUC.

Table S12. The effect of +/− two bases flanking the mutated adenine or thymidine on the number of intergenic mutations in AA-UTUC.

Table S13. The effect of +/− one base flanking the mutated TAG on the rates of unspliced transcript (transcribed regions, including intron) mutations in AA-UTUC.

Table S14. The effect of +/− one base flanking the mutated CAG on the rates of unspliced transcript (transcribed regions, including intron) mutations in AA-UTUC.

Table S15. Hypergeometric analysis for enrichment of CAG splice-site mutations in AA-UTUCs, AA-treated HK2 clones, and non–AA-associated cancers.

Table S16. RPKM gene expression values for 15 NMD pathway genes in the AA-UTUC and matched normal tissue.

Table S17. Identities of 3′ splice sites with CAG>CTG mutations and RPKM > 2.

Table S18. 3′ splice sites without CAG>CTG mutations for evaluating the proportion of unmutated sites associated with aberrant splicing.

Table S19. Sequence analysis summary of two exome-sequenced AA-treated HK2 clones.

Table S20. Somatic nonsynonymous substitutions in protein-coding genes of AA-treated HK2 clones.

Table S21. Comparison of mutation rates in AA-UTUC, carcinogen-induced cancers, mismatch repair–defective colorectal cancers, and POLE/POLD1 mutated colorectal cancers.

Table S22. Primer sequences.

References and Notes

  1. Funding: This work was supported in part by funding from the Singapore National Medical Research Council (NMRC/STAR/0006/2009), the Singapore Millennium Foundation, the Lee Foundation, the Singapore National Cancer Centre Research Fund, The Verdant Foundation, the Duke-NUS Graduate Medical School, the Cancer Science Institute, Singapore, the Chang Gung Memorial Hospital (CMRPG370471-3), the Taiwan National Science Council (98-2314-B-182A-057-MY3), and the Wellcome Trust (grant reference 098051). W.Y. and L.D.L. are the recipients of the NUS Graduate School for Integrative Sciences and Engineering Scholarship, Singapore. S.G.R. received startup funds from the Singapore Ministry of Health and Agency for Science, Technology, and Research via Duke-NUS Graduate Medical School. Author contributions: S.L.P., J.R.M., S.-T.P., S.G.R., P.T., and B.T.T. conceived the study. S.-T.P., S.G.R., P.T., and B.T.T. directed the study. W.-H.W., Y.-H.C., C.-K.C., K.-J.Y., K.-F.N., C.-F.W., and C.-L.H. were involved in the specimen collection, pathological reviews, and clinical data collections. S.L.P., E.Y.S., S.C.C., and R.M.B. were involved in establishment of the in vitro and in vivo models of AA-induced damage. J.R.M., W.Y., K.K.H., W.K.L., P.G., Y.L., I.C., M.R.S., P.A.F., W.-K.S., and S.G.R. performed the bioinformatics data analysis. S.L.P., H.L.H., A.G., S.T.T., D.H., L.D.L., M.-L.N., W.C.-o., and C.K.O. performed sequencing and quantitative RT-PCR experiments. S.L.P., J.R.M., S.-T.P., S.G.R., P.T., and B.T.T. wrote the manuscript, with the assistance and final approval of all authors. Competing interests: The authors declare that they have no competing interests. Data and materials availability: The data for this study have been deposited in the European Nucleotide Archive under accession number PRJEB4138. Correspondence and requests for materials should be addressed to B.T.T., P.T., S.G.R., or S.-T.P.