Research Article: Leukemia

Clonal Evolution of Preleukemic Hematopoietic Stem Cells Precedes Human Acute Myeloid Leukemia

Science Translational Medicine  29 Aug 2012:
Vol. 4, Issue 149, pp. 149ra118
DOI: 10.1126/scitranslmed.3004315


Given that most bone marrow cells are short-lived, the accumulation of multiple leukemogenic mutations in a single clonal lineage has been difficult to explain. We propose that serial acquisition of mutations occurs in self-renewing hematopoietic stem cells (HSCs). We investigated this model through genomic analysis of HSCs from six patients with de novo acute myeloid leukemia (AML). Using exome sequencing, we identified mutations present in individual AML patients harboring the FLT3-ITD (internal tandem duplication) mutation. We then screened the residual HSCs and detected some of these mutations including mutations in the NPM1, TET2, and SMC1A genes. Finally, through single-cell analysis, we determined that a clonal progression of multiple mutations occurred in the HSCs of some AML patients. These preleukemic HSCs suggest the clonal evolution of AML genomes from founder mutations, revealing a potential mechanism contributing to relapse. Such preleukemic HSCs may constitute a cellular reservoir that should be targeted therapeutically for more durable remissions.


Acute myeloid leukemia (AML) is an aggressive malignancy of hematopoietic progenitor cells with a poor clinical outcome (1, 2). A number of cytogenetic and molecular abnormalities have been identified in AML, many of which have been found to be prognostic. For example, internal tandem duplications in the FLT3 tyrosine kinase (FLT3-ITD) occur frequently in AML and define a relatively high-risk subgroup (3). Recently, next-generation DNA sequencing has been used to describe AML genomes (4, 5) and to identify new recurrent mutations (6, 7). Despite these advances, our fundamental understanding of both the genomics of leukemogenesis and the genetic heterogeneity within the precursor cells is incomplete.

Initiating mutations in most AML cases are largely unknown because preleukemic cells are clinically silent and are outcompeted by their malignant descendants (8). Our limited knowledge of initiating mutations comes from infrequent cases of AML arising secondary to antecedent clonal bone marrow disorders or rare instances of inherited syndromes, but this does not include the large majority of de novo AML cases.

Beyond the first mutation, the requirement for multiple coding mutations to generate AML raises the question of how these mutations can accumulate in a single clone given the low rate of spontaneous mutation in hematopoietic cells and the lack of global genome instability in leukemia. We propose a model in which serial mutations and/or epigenetic events must accumulate in self-renewing hematopoietic stem cells (HSCs), unless a mutation confers self-renewal ability on a downstream cell given that mutations occurring in non–self-renewing cells will be lost (9). Early evidence consistent with this model comes from our previous investigation of AML cases associated with the AML1-ETO translocation, where Lin−CD34+CD38−CD90+ HSCs isolated from patients in long-term remission up to 150 months after therapy produced normal myeloid colonies in methylcellulose in vitro clonal progenitor cell assays and not leukemic blasts, yet contained detectable AML1-ETO transcripts (10). This was followed by our studies of chronic myeloid leukemia where t(9;22) resulting in the BCR-ABL translocation occurred in HSCs, yielding chronic phase disease, yet progression to blast crisis was accompanied by additional mutations resulting in the formation of leukemia stem cells (LSCs) at the granulocyte-monocyte progenitor stage (11, 12). Preleukemic HSCs were also observed in one twin pair with discordant TEL-AML1 pediatric acute lymphoblastic leukemia (ALL) (13), and additional studies of myelodysplastic syndrome (MDS) demonstrated sequential acquisition of chromosomal abnormalities in the HSC compartment of some patients (14). These studies investigated subtypes of AML/ALL/MDS defined by karyotypic abnormalities as clonal genomic events and predate recent advances in cancer genome sequencing and mutation analysis. However, in the current era of cancer genome sequencing, the extent to which clonal evolution occurs in a preleukemic HSC population remains unknown.

Here, we investigated this model and the nature of initiating mutations through the genomic and functional analysis of de novo AML and patient-matched residual HSCs at the single-cell level.


Advances in AML stem cell biology have enabled the prospective separation of residual HSCs from AML cells (15–19). We reported the prospective separation of residual HSCs from primary AML cells in diagnostic AML patient samples based on the differential expression of the cell surface proteins CD47 and TIM3 (15, 18). Six of these samples were normal karyotype AML harboring an FLT3-ITD mutation, a large relatively high-risk AML subgroup (clinical features are presented in Materials and Methods). Rare residual HSCs in the Lin−CD34+CD38− fraction were isolated from these samples by fluorescence-activated cell sorting (FACS) on the basis of their lack of expression of the AML marker TIM3 and/or a new marker CD99 (Fig. 1A and fig. S1). The function of these cells was assessed by transplantation into immunodeficient nonobese diabetic/severe combined immunodeficient/interleukin-2 receptor-γ null (NSG) mice, which resulted in long-term engraftment with both CD33+ myeloid and CD19+ lymphoid human cells, thereby establishing these cells as nonleukemic HSCs (Fig. 1B and fig. S2). Molecular analysis determined that these isolated cells, as well as their in vivo progeny, did not contain the FLT3-ITD mutation, further indicating that they were not part of the fully leukemic clone (Fig. 1C and fig. S3). However, even though they appeared to be functionally normal, we hypothesized that these residual HSCs might in fact constitute a reservoir of preleukemic HSCs harboring founder mutations but lacking the complete complement of abnormalities required to generate AML.

Fig. 1

Prospective separation of residual HSCs from leukemia cells. (A) Flow cytometry analysis of samples from AML cases SU008 and SU048 indicating lineage-negative cells (left panels) and Lin−CD34+CD38− subsets (right panels) further analyzed for expression of CD99 and/or TIM3. Leukemia cells (red gate) and putative residual HSCs (blue gate) were each purified via two rounds of FACS (fig. S1). (B) Lin−CD34+CD38−CD99− cells (6000 cells) isolated from case SU008 or Lin−CD34+CD38−TIM3−CD99− cells (450 cells) isolated from case SU048 were transplanted into newborn NSG mice. Twelve weeks later, bone marrow engraftment was analyzed by flow cytometry for the presence of human hematopoietic cells (hCD45+) (left panel), further subdivided into lymphoid (CD19+) and myeloid (CD33+) subsets (right panel). (C) The indicated cells were further analyzed for the presence of the FLT3-ITD mutation by polymerase chain reaction (PCR). Leukemia cells and HSCs from each case corresponded to the red and blue gates in (A), respectively, and the analysis also included bone marrow from mice engrafted with the residual HSCs. WT, wild type. (D) Experimental scheme for identification of preleukemic mutations and clonal evolution in de novo AML.

To identify mutations present in each AML sample, we used targeted exome sequencing (20, 21) of FACS-enriched AML cells and patient-matched CD3+ T cells (see Fig. 1D for experimental scheme). Because most of T cell development occurs in childhood, T cells have previously been used as a germline control for the discovery of somatic mutations in adult hematologic malignancies (22, 23). AML and corresponding T cell exomes were sequenced with paired-end reads at a median depth of 67- to 239-fold coverage (fig. S4). The resulting sequencing data were analyzed for leukemia-specific mutations with single-nucleotide polymorphism (SNP) variant calling algorithms from two independent sources and a separate algorithm for the identification of insertions or deletions (Indels). Analysis of these results identified high-confidence somatic variations that were then validated by Sanger resequencing of genomic DNA (gDNA). In addition, expression of mutant alleles was assessed by transcriptome sequencing of the leukemic cells. This analysis identified 3 to 19 variations in each of these samples that led to protein coding or 5′ or 3′ untranslated region variants (table S1), similar numbers as reported for other AML genomes (4, 5). The 57 mutated genes of varied function and expression included FLT3, IDH1, NPM1, and TET2, which are recurrently mutated in AML (table S1).

Notably, from these six patients, we identified two cases harboring mutations, R96H and R711G, in SMC1A, a gene that has recently been reported to be recurrently mutated in AML at a low frequency (24). Somatic mutations in SMC1A have been described in several cases of colorectal cancer (25). Germline mutations in SMC1A, including one reported case of R711 substitution, are causative in the congenital Cornelia de Lange syndrome (26), indicating the functional role of SMC1A mutations in human disease. We conducted targeted sequencing of this locus in 120 additional AML cases and identified one further SMC1A mutation, identical to the R96H mutation detected in our original cohort. In total, we identified infrequent (3 of 125 cases) but recurrent mutations in SMC1A (fig. S5). SMC1A is now the second cohesin complex component to be recurrently mutated in AML, because SMC3 mutations were reported in 6 of 200 cases elsewhere (27).

AML cells and the population of residual HSCs from each patient were analyzed for leukemia-associated mutations by targeted resequencing to a median depth of 24,470 reads. Consistent with the exome sequencing results, the mutant allele frequency was between 29 and 73% for most mutations in leukemia cells, suggesting that these mutations were heterozygous (Fig. 2 and table S2). The exceptions included two hemizygous mutations on the X chromosome in case SU014 that were present in nearly 100% of leukemic cell reads in this male patient. Sequencing of defined mixtures of normal and leukemic DNA determined that the lower threshold of sensitivity of our deep sequencing assay was less than 1% variant allele (fig. S6). Notably, in five of the six cases (none in SU043), some, but not all, mutations were present in the population of residual HSCs, consistent with the presence of preleukemic cells in this population (Fig. 2 and table S2).
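The relationship between allele frequency and zygosity used above can be illustrated with a short calculation. This is a minimal sketch, not part of the study's analysis pipeline: the function names are hypothetical, and it assumes that every cell carries the same number of copies of the locus (no copy number alteration or loss of heterozygosity).

```python
def expected_vaf(cell_fraction, mutant_copies=1, total_copies=2):
    """Expected mutant allele frequency for a mutation carried by a fraction of cells.

    Assumes all cells, mutant or not, have `total_copies` copies of the locus
    (an illustrative simplification).
    """
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must lie between 0 and 1")
    if total_copies <= 0 or not 0 < mutant_copies <= total_copies:
        raise ValueError("copy numbers must satisfy 0 < mutant_copies <= total_copies")
    return cell_fraction * mutant_copies / total_copies


def cell_fraction_from_vaf(vaf, mutant_copies=1, total_copies=2):
    """Invert expected_vaf under the same assumptions; capped at 1.0."""
    return min(1.0, vaf * total_copies / mutant_copies)


# Heterozygous autosomal mutation present in every cell: about 50% of reads.
print(expected_vaf(1.0))
# Hemizygous X-linked mutation in a male patient, present in every cell: about 100%.
print(expected_vaf(1.0, mutant_copies=1, total_copies=1))
```

Under these assumptions, a heterozygous mutation found in all leukemia cells gives roughly 50% mutant reads, whereas a hemizygous X-chromosome mutation in a male patient approaches 100%, which matches the pattern described for case SU014.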

Fig. 2

Targeted sequencing identifies preleukemic HSCs. FACS-purified leukemia cells and residual HSCs from AML cases SU008, SU014, SU030, SU048, and SU070 were analyzed by targeted deep sequencing for the presence of patient-matched somatic mutations in genes with detectable mRNA expression identified in leukemia cells by exome and transcriptome sequencing. The percentage of mutant allele reads is indicated in each case. The dashed red line indicates the threshold of 1% for variant allele detection as determined by sequencing of defined mixtures of normal and leukemic DNA (fig. S6). Details of sequencing reads are presented in table S2.

To screen out likely passenger mutations, we focused our further investigation on mutations in genes found to be expressed by transcriptome analysis (Table 1 and table S1). Several mutations in genes recurrently mutated in AML were identified in residual HSCs, including the NPM1c mutation in SU014 (6% mutant allele frequency) (Fig. 2 and table S2). Mutations of both alleles of TET2 were identified in residual HSCs from SU048 (40 and 10%) and SU070 (48 and 48%) (fig. S7). In addition, mutations in SMC1A were found in residual HSCs from both SU014 (4%) and SU048 (27%). In case SU008, 7% of sequenced alleles of SKP2, a known oncogene involved in regulating HSC quiescence (7, 28), were mutant. In case SU070, the R339Q mutation in CTCF is present in residual HSCs (35%). Mutation of this amino acid in the third DNA binding zinc finger domain of CTCF has been previously described in Wilms tumor and alters DNA binding specificity of the protein (29). In all five cases, the FLT3-ITD mutation was not detected in the residual HSCs, nor was IDH1 R132H detected in residual HSCs from SU014 (Fig. 2). In each case, the population of residual HSCs had varying allele frequencies for each mutation. However, relating the mutant allele frequency to the percentage of cells that contain each mutation may be complicated by copy number alterations or loss of heterozygosity. To determine the percentage of cells that contain each mutation, we performed single-cell experiments (see below). In summary, these results indicate that in five of the six cases of AML, a fraction of residual HSCs harbor some mutations found in the downstream leukemia cells, identifying some of these cells as preleukemic HSCs.

Table 1

Somatic variations identified in five AML patients. Recurrent mutations are underlined. Mutations in genes with gene expression >0.1 reads per kilobase of transcript per million mapped reads (RPKM) are shown. Sanger resequencing of leukemia DNA and CD3+ T cell DNA confirmed all of the listed variations as somatic mutations. RNA sequencing was performed on bulk AML cells for leukemic samples SU008, SU014, SU030, and SU048 to determine which variations were expressed at a level of RPKM >0.1.

To formally demonstrate the presence of these mutations in functional HSCs, we measured mutant allele frequencies in human CD45+ leukocytes, including both CD33+ myeloid and CD19+ lymphoid cells, isolated from NSG mice transplanted with the FACS-purified residual HSCs from cases SU008, SU048, and SU070. In the case of SU008, human CD45+ cells isolated from the bone marrow of three mice 10 to 12 weeks after transplantation, representing progeny of engrafted HSCs (30), were found to contain the SKP2 mutant allele (Fig. 3). Moreover, in one additional mouse, SKP2, ELP2, and PDZD3 mutations were found in lymphoid and myeloid cells. In SU048, both lymphoid and myeloid cells isolated from five engrafted mice contained the TET2 E1357stop mutation at allele frequencies between 3 and 49% (Fig. 3). A second cluster of five mutations, including mutations in the remaining TET2 allele and SMC1A, was observed in myeloid cells from four engrafted mice and lymphoid cells from two of these mice. In SU070, lymphoid and myeloid cells from three engrafted mice contained a cluster of 14 mutations, including CTCF and both alleles of TET2 (Fig. 3).

Fig. 3

Functional HSCs contain preleukemic mutations. FACS-purified hCD45+, hCD45+CD19+, and/or hCD45+CD33+ cells isolated from the bone marrow of mice engrafted with residual HSCs from AML cases SU008 (n = 4), SU048 (n = 5), and SU070 (n = 3) were analyzed by targeted deep sequencing for patient-matched somatic mutations. The percentage of mutant allele reads is indicated in each case. The dashed red line indicates the threshold of 1% for variant allele detection as determined by sequencing of defined mixtures of normal and leukemic DNA (fig. S6).

Next, we investigated the serial acquisition of mutations, and thereby clonal progression, in preleukemic HSCs by determining the presence of each mutation in single cells of this population. Single residual HSCs were clone-sorted into 96-well plates containing methylcellulose capable of supporting myeloid colony formation. Fourteen days later, gDNA was prepped from each individual colony and analyzed for the presence of each mutation with an allele-specific SNP TaqMan assay. In case SU008, 546 myeloid colonies were grown from single residual HSCs and genotyped for SKP2, ELP2, PDZD3, and CNDP1 mutations (Fig. 4, A and B, and fig. S8). Of these 546 colonies, 489 had none of the mutations, whereas 54 contained the SKP2 mutation alone. Three colonies contained SKP2, ELP2, and PDZD3 mutations but were likely not contaminating leukemic colonies because they lacked the CNDP1 mutation. These mutation frequencies mirror those determined from targeted resequencing of this population (Fig. 2), suggesting that HSCs with or without mutations in SKP2, ELP2, and/or PDZD3 do not significantly differ in their colony formation potential, similar to the case for AML1-ETO preleukemic translocations (10). These results indicate that, in the clonal evolution of SU008, the SKP2 mutation is a founding event, mutation of ELP2 and PDZD3 occurred next within this SKP2 mutant population, and the eventual dominant leukemic clone carried these preleukemic mutations and further evolved through the mutation of ISYNA1 and FLT3 (Fig. 4C). Similarly, in case SU030, 165 myeloid colonies were grown from single residual HSCs and genotyped for KCTD4 and SLC12A1 mutations (fig. S9). Of these 165 colonies, 144 contained none of the mutations, whereas 21 contained KCTD4 mutations alone, indicating that the KCTD4 mutation preceded mutations in SLC12A1 and FLT3-ITD in this AML case.
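The inference of mutation order from colony genotypes can be sketched in a few lines. This is an illustrative reconstruction of the reasoning, not the authors' code: the function names are hypothetical, the colony counts are those reported above for SU008, and the approach assumes the observed genotypes are strictly nested (a linear progression). Mutations that first appear together in one step cannot be ordered relative to each other.

```python
def order_clones(genotype_counts):
    """Sort observed genotypes by number of mutations and check that they nest."""
    clones = sorted(genotype_counts, key=len)
    for earlier, later in zip(clones, clones[1:]):
        if not earlier < later:  # proper-subset test on frozensets
            raise ValueError("genotypes are not nested; no linear order can be inferred")
    return clones


def acquisition_steps(genotype_counts):
    """Mutations newly present at each step of the nested series."""
    clones = order_clones(genotype_counts)
    return [sorted(later - earlier) for earlier, later in zip(clones, clones[1:])]


def clone_fractions(genotype_counts):
    """Fraction of genotyped colonies carrying each genotype."""
    total = sum(genotype_counts.values())
    return {clone: n / total for clone, n in genotype_counts.items()}


# Colony counts reported in the text for case SU008 (546 colonies in total).
su008 = {
    frozenset(): 489,
    frozenset({"SKP2"}): 54,
    frozenset({"SKP2", "ELP2", "PDZD3"}): 3,
}

print(acquisition_steps(su008))  # [['SKP2'], ['ELP2', 'PDZD3']]
for clone, fraction in clone_fractions(su008).items():
    print(sorted(clone), round(fraction, 3))
```

With these counts, about 10% of colonies carry SKP2 alone and fewer than 1% carry all three mutations, consistent with SKP2 as the earliest event among those genotyped.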

Fig. 4

Single-cell analysis identifies sequential acquisition of mutations in preleukemic HSCs. (A) The genotype of 546 myeloid colonies derived from clone-sorted single residual HSCs from case SU008 was determined by multiplexed custom TaqMan SNP assays for mutations identified by exome analysis to be present in each population. Selected assays are shown with the full data presented in fig. S8. Each colony is represented by a single dot in graphs for each mutation tested; colonies are colored according to genotype (see figure key). (B) Three-dimensional (3D) plots illustrate the genotype of each colony. In both panels, the yellow dot indicates the genotype of patient T cells, and the red dot indicates the genotype of patient leukemia cells. (C) Model for the proposed clonal evolution of AML in patient SU008.

In SU008 and SU030, preleukemic mutations were identified in genes not previously known to be mutated in AML. In cases SU048 and SU070, preleukemic mutations were identified in a number of genes including TET2, a known recurrently mutated gene (31, 32), and SMC1A, which has recently been reported to be recurrently mutated in human AML (24). In SU048, we genotyped 81 myeloid colonies derived from single residual HSCs for eight mutations identified in expressed genes (Fig. 5, A and B, and fig. S10). Of these 81 colonies, 17 contained only the nonsense mutation in TET2, 62 contained six mutations including both mutant alleles of TET2 and SMC1A, and only 2 were found to have no mutations. Thus, in this AML case, the TET2 nonsense mutation occurred first, followed by a dominant preleukemic clone with an additional five mutations including the second TET2 and the SMC1A mutations (Fig. 5C). These results mirror those observed in mouse models, where TET2 deficiency results in HSC expansion and a competitive clonal advantage (33, 34). Further clonal evolution through the acquisition of NPM1c and FLT3-ITD mutations occurred to generate the eventual AML (Fig. 5C).

Fig. 5

Single-cell analysis identifies sequential driver mutation acquisition in preleukemic HSCs. (A and D) The genotype of 81 myeloid colonies derived from clone-sorted single residual HSCs from AML case SU048 (A) or 189 colonies derived from single HSCs from AML case SU070 (D) was determined by multiplexed custom TaqMan SNP assays for mutations identified by exome analysis to be present in each population. Selected assays are shown with the full data presented in figs. S10 (SU048) and S11 (SU070). Each colony is represented by a single dot in graphs for each mutation tested; colonies are colored according to genotype (see figure key). (B and E) 3D plots illustrating the genotype of each colony from SU048 (B) and SU070 (E). In both panels, the yellow dot indicates the genotype of patient T cells, and the red dot indicates the genotype of patient leukemia cells. (C and F) Models for the proposed clonal evolution of SU048 (C) and SU070 (F).

In case SU070, 189 myeloid colonies derived from single residual HSCs were genotyped for 13 mutations. None of these colonies was free of mutations: 2 colonies contained only three mutations including the nonsense mutation in TET2, 35 colonies additionally contained two mutations including TET2 T1884A, and 152 colonies contained the previous mutations as well as five others including CTCF (Fig. 5, D and E, and fig. S11). In this AML case, the TET2 nonsense mutation occurred first, followed by mutation of the second TET2 allele, followed by a dominant preleukemic clone with biallelic loss of TET2 and mutation of CTCF (Fig. 5F). After these mutations, the FLT3-ITD mutation occurred and is present in frank leukemia cells (Fig. 5F). As in case SU048, most residual HSCs contained biallelic mutation of TET2. In all cases, FLT3-ITD is a late event, because this mutation is absent from the preleukemic compartment, which corroborates the clinical observation that FLT3-ITD can be a secondary mutation in leukemogenesis (35).

Using flow cytometric detection of residual HSCs and results of the single-cell analysis, we can derive a subclonal portrait for AML cases SU008, SU048, and SU070 (fig. S12). In case SU008, residual HSCs comprise 0.1% of total cells. Of these cells, 90% harbored no mutations, 10% contained the SKP2 mutation alone, and <1% were mutant for SKP2, ELP2, and PDZD3. In case SU048, residual HSCs comprise 0.02% of total cells, but unlike SU008, most of these cells were preleukemic and about 75% had multiple mutations including biallelic mutations in TET2. In case SU070, residual HSCs comprise 0.04% of total cells, of which all cells tested were preleukemic and 99% of which contained biallelic mutations in TET2.
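The arithmetic behind such a subclonal portrait is a product of two fractions: the share of all cells that are residual HSCs, and the share of residual HSCs in each subclone. The sketch below is illustrative only. The function name is hypothetical, and it takes the SU008 colony counts from the single-cell analysis as a proxy for subclone frequencies, which assumes that colony formation does not depend on genotype.

```python
def subclone_share_of_total(hsc_fraction_of_total, colony_counts):
    """Estimate each subclone's share of all cells in the sample.

    hsc_fraction_of_total: residual HSCs as a fraction of total cells.
    colony_counts: number of genotyped colonies per subclone, used here as a
    proxy for subclone frequency within the residual HSC pool.
    """
    total = sum(colony_counts.values())
    return {name: hsc_fraction_of_total * n / total for name, n in colony_counts.items()}


# Case SU008: residual HSCs comprise 0.1% of total cells.
su008_shares = subclone_share_of_total(
    0.001,
    {"no mutation": 489, "SKP2 only": 54, "SKP2 + ELP2 + PDZD3": 3},
)

for name, share in su008_shares.items():
    print(f"{name}: {share:.5%} of total cells")
```

On these numbers, HSCs carrying SKP2 alone would make up roughly 0.01% of all cells in the SU008 sample, which illustrates how rare the preleukemic populations are relative to the bulk leukemia.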


We propose a model in which serial mutations and/or epigenetic events must accumulate in self-renewing HSCs unless a mutation confers self-renewal ability on a downstream cell, because mutations occurring in non–self-renewing cells will be lost (9). Here, we provide evidence supporting this model in six cases of de novo AML through the use of advanced sequencing methods. Within the Lin−CD34+CD38− population of hematopoietic cells from de novo AML patients, we prospectively separated CD99− and/or TIM3− residual HSCs from leukemia cells, and we then identified preleukemic HSCs within this residual HSC population. We quantitatively assessed the fraction of total protein-altering mutations present in preleukemic HSCs compared to their leukemic descendants and further identified HSCs as a major target of somatic mutations antecedent to AML.

Specifically, we identified preleukemic mutations in residual HSCs in five of six cases of normal karyotype AML harboring an FLT3-ITD mutation. In these five cases, 32 of 51 total mutations found in leukemia cells were also present in the residual HSCs. Moreover, 7 of 13 mutations in genes recurrently mutated in AML were also present in the residual HSC population, with the notable absence of FLT3-ITD in all five cases. Consequently, through deep sequencing and single-cell genetic characterization of residual preleukemic HSCs, we established an order for the accumulation of mutations in AML. Although it may turn out that FLT3-ITD mutations can occur closer to initiation in some instances, our data show that FLT3-ITD is a secondary mutation in all five cases in this study. These data do not prove that the evolutionary path of AML includes preleukemic HSCs in all cases of de novo AML; instead, we show that this model does occur in some patients. Because tissue-specific stem cells can also be identified and prospectively isolated in other tissues, for example, breast, colon, and brain, a similar approach may be used to search for premalignant progression in the tissue stem cell pool in other cancers. Such efforts would complement comparisons of primary and relapsed or metastatic tumors that have elucidated later events in the clonal progression of cancer (27, 36).

Through in vivo engraftment and in vitro single-cell assays, we identified three cases with sequential preleukemic HSC subpopulations. According to the classic clonal evolution model of cancer (37), later and more genetically altered clones are increasingly dominant and more numerous than early, less altered clones. Our findings show that some AML cases follow this model, whereas others do not. In cases SU048 and SU070, preleukemic clones bearing biallelic mutations in TET2 were more numerous than clones carrying only one mutation in TET2, and this TET2 heterozygous population was in turn more numerous than HSCs lacking all preleukemic mutations. However, the opposite pattern was observed in SU008. In this case, SKP2 mutant cells expanded to occupy ~10% of the residual HSC pool, whereas descendant cells with the additional ELP2 and PDZD3 mutations represented less than 1% of the residual HSC population. Several models may explain the relative size of these preleukemic subclones in cases that do not fit the classic clonal evolution model of cancer. First, some mutations may be passenger mutations. Second, some driver mutations may have functions independent of clonal advantage, such as impairment of differentiation. Third, because these mutations occurred at different points in time, the earlier preleukemic cells may have had more time to expand within the HSC pool than subsequent subclones. Consistent with the possibility that only some mutations confer a clonal advantage, large jumps in the number of mutations were observed, without identification of intermediate subclones. In cases SU048 and SU070, preleukemic subclones with 1 and 6 mutations and 3, 5, and 10 mutations, respectively, were found. Presumably, only the last mutation in each group confers a clonal advantage. Ultimately, the relationship between subclone size and clonal progression may be complex.

The methods used here allowed us to identify residual HSCs harboring some, but not all, of the mutations found in the subsequent AML. We termed these cells preleukemic HSCs in that they represent progeny from a time point preceding full development of AML. Additional evidence that these cells are indeed preleukemic would come from experiments demonstrating that they have an increased ability to form frank leukemia compared to HSCs lacking such mutations. The phylogenetic relationships identified here are most likely an underestimate of the true subclonal complexity of de novo AML. Although our investigation was limited to exome analysis, epigenetic, noncoding, and complex genomic events such as structural variations may contribute to pathogenesis and may help explain the relative dearth of mutations identified in case SU030. For example, we previously showed that increased expression of CD47 contributes to AML stem cell pathogenesis and clinical prognosis but find no evidence of direct CD47 mutations (18). Moreover, increased expression of CD47 can distinguish AML LSCs from residual HSCs, indicating that it is a late-occurring event (18). Our study was not designed to assess clonal divergence among leukemia-initiating cells (LICs), as recently reported for ALL (38, 39). Thus, our evidence for a linear clonal progression of preleukemic cells does not contradict evidence for complex branching phylogenetic relationships between subclones of ALL LICs (38, 39). Furthermore, we cannot detect preleukemic cells with mutations divergent from the same patient’s dominant presenting leukemic clone, which are a common cause of relapse in pediatric ALL (40). To identify divergent mutations within preleukemic subclones, techniques such as exome sequencing of single cells would be required (41). Here, we have identified the cellular and genomic path from HSCs to the dominant presenting leukemic clone. 
It is reasonable to postulate that after the establishment of a frankly malignant leukemic stem cell clone, continued evolution occurs, resulting in the emergence of diverse subclones of increasing malignancy and refractoriness to therapy.

Although clonal antecedents of leukemia have been difficult to study, they may prove clinically important. Indeed, some cases of relapsed pediatric ALL arise from a clone ancestral to the presenting leukemia (40). The same may be true in AML, in which relapsed disease could develop from a preleukemic HSC clone that acquires additional new mutations resulting in a genetically divergent leukemic relapse. This possibility suggests that preleukemic HSCs constitute a cellular reservoir that may need to be targeted therapeutically for more durable remissions.

Materials and Methods

Human samples

Human AML samples were obtained from patients at the Stanford Medical Center with informed consent, according to Institutional Review Board (IRB)–approved protocols (Stanford IRB nos. 76935 and 6453). In all cases, specimens were obtained before therapy. Mononuclear cells from each sample were isolated by Ficoll separation and cryopreserved in liquid nitrogen. All analyses conducted here used freshly thawed cells.

Animal care

All mouse experiments were conducted according to an Institutional Animal Care and Use Committee–approved protocol and in adherence to the National Institutes of Health Guide for the Care and Use of Laboratory Animals.

Flow cytometry analysis and cell sorting

A panel of antibodies was used for analysis and sorting of residual HSCs within the Lin−CD34+CD38− compartment of AML samples as previously described (15, 18). CD99 antibody clone TÜ12 (BD Pharmingen) and TIM3 antibody clone 344823 (R&D Systems) were used to discriminate marker-positive leukemia cells from marker-negative HSCs. In these cases, CD90+ cells were in the CD99− and TIM3− subsets, allowing exclusion of CD90 antibodies in these studies. The lineage (Lin) panel consisted of CD3, CD19, and CD20. CD3 antibody clone SK7 was used to sort CD3+ T cells.

Exome sequencing

gDNA was purified from two sorted cell populations (reference from CD3+ T cells, leukemia from Lin−CD34+CD38−CD99+ or Lin−CD34+CD38−CD99+TIM3+ cells), and paired-end sequencing library preparation was carried out as recommended by Illumina. Targeted exome enrichment was then performed with the SeqCap EZ Exome SR kit as per the manufacturer’s instructions (Roche). For additional details, see the Supplementary Materials and Methods.

Targeted resequencing of leukemia-associated mutations

Amplicons spanning each mutation were amplified by PCR from an input of between 500 and 2000 cell equivalents of gDNA from several different populations. Paired-end sequencing library preparation with bar coding was performed as recommended by Illumina. Reads that passed all filters were assayed for the presence of the given SNP and tallied under the corresponding bar code into one of three categories: germline, “mutant” (the somatic variant discovered in leukemia cells), or “nonbiological” (the remaining two possible nucleotides). Indels were assayed for “germline” or mutant alleles. The mutant allele frequency was determined as follows: mutant read number/(germline read number + mutant read number). For additional details, see the Supplementary Materials and Methods.
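The read tally and allele frequency calculation described above can be written out as a short sketch. It is illustrative only and not the pipeline used in the study: the function names and the example read counts are hypothetical, and it handles the SNP case, in which each read's base at the variant position is sorted into one of the three categories.

```python
from collections import Counter


def tally_reads(bases, germline_base, mutant_base):
    """Sort the base observed at the variant position in each read into
    'germline', 'mutant', or 'nonbiological' (the other two nucleotides)."""
    counts = Counter({"germline": 0, "mutant": 0, "nonbiological": 0})
    for base in bases:
        if base == germline_base:
            counts["germline"] += 1
        elif base == mutant_base:
            counts["mutant"] += 1
        else:
            counts["nonbiological"] += 1
    return counts


def mutant_allele_frequency(counts):
    """mutant / (germline + mutant); nonbiological reads are excluded.
    Returns None when there are no informative reads."""
    informative = counts["germline"] + counts["mutant"]
    if informative == 0:
        return None
    return counts["mutant"] / informative


# Hypothetical example: 100 reads covering a G>A variant.
reads = ["G"] * 90 + ["A"] * 9 + ["T"]
counts = tally_reads(reads, germline_base="G", mutant_base="A")
print(dict(counts))
print(mutant_allele_frequency(counts))
```

Excluding the nonbiological category from the denominator keeps sequencing errors to the other two nucleotides from diluting the estimate, although errors that happen to produce the mutant base still set a floor on sensitivity.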

NSG xenotransplantation assay

FACS-purified cells were transplanted into newborn NSG mice (Jackson Laboratory) conditioned with 1 Gy of irradiation as described (30). After 12 weeks, mice were euthanized and bone marrow was analyzed for human engraftment (hCD45+) that was further characterized for lineage on the basis of expression of myeloid (CD33+) and lymphoid (CD19+) cell surface markers. hCD45+, hCD45+CD33+, or hCD45+CD19+ cells were sorted for genetic analyses.

Single HSC-derived colony genotyping assay

Single cells were deposited into 96-well plates containing 100 μl of complete methylcellulose (MethoCult GF+ H4435; STEMCELL Technologies). For SU008, SU048, and SU070, single cells were clone-sorted by FACS. For SU030, cells were diluted into medium at a dilution of <1 colony-forming unit (CFU)/ml. Colony formation was assayed after 14 days in culture by microscopy and scored on the basis of morphology. Methylcellulose colony types were scored as follows: CFU-GEMM, colony-forming unit–granulocyte, erythrocyte, monocyte, megakaryocyte; CFU-E, colony-forming unit–erythrocyte; CFU-GM, colony-forming unit–granulocyte-macrophage. gDNA was isolated from each colony with TaqMan Sample-to-SNP kit (Applied Biosystems). Details of each Custom TaqMan SNP Genotyping Assay (Applied Biosystems) are available upon request. Multiplexed TaqMan SNP Genotyping Assays were conducted on each colony according to the manufacturer’s specifications.

Supplementary Materials

Materials and Methods

Fig. S1. Flow cytometry analysis of FACS-purified cell populations.

Fig. S2. Functional characterization of residual HSCs.

Fig. S3. FLT3 genotyping of sorted and engrafted cells.

Fig. S4. Sequencing statistics and schematic of the data analysis pipeline used for variant identification.

Fig. S5. Schematic of SMC1A indicating identified mutations.

Fig. S6. Control sequencing to determine sensitivity of targeted resequencing.

Fig. S7. TET2 mutations in SU048 and SU070 occur on separate alleles.

Fig. S8. Single-cell genotyping of residual HSCs from SU008.

Fig. S9. Single-cell analysis identifies sequential mutation acquisition in preleukemic HSCs from SU030.

Fig. S10. Single-cell genotyping of residual HSCs from SU048.

Fig. S11. Single-cell genotyping of residual HSCs from SU070.

Fig. S12. Methylcellulose colony formation from residual HSCs and subclonal portraits.

Table S1. Summary of somatic mutations identified by targeted exome sequencing and validated by Sanger sequencing.

Table S2. Summary of deep sequencing of somatic mutations in FACS-purified cell populations.

References and Notes

  1. Acknowledgments: We thank F. Zhao and S. Tseng for laboratory management; N. Neff, B. Passarelli, and G. Mantalas for sequencing assistance and expertise; and E. Piccione for critical review of the manuscript. We thank the Hematology Division Tissue Bank and the patients for donating their samples. Funding: M.J. is supported by the Lucille P. Markey Biomedical Research Fellowship and the NSF Graduate Research Fellowship. T.M.S. is supported by the Howard Hughes Medical Institute. M.R.C.-Z. is supported by the Smith Fellowship and the NSF Graduate Research Fellowship. R.M. holds a Career Award for Medical Scientists from the Burroughs Wellcome Fund and is a New York Stem Cell Foundation Robertson Investigator. This research was supported by grants from the Stinehardt-Reed Foundation (to R.M.); Howard Hughes Medical Institute (to S.R.Q.); NIH, Ellison Medical Foundation Senior Scholar Award, and Ludwig Foundation (to I.L.W.); and NIH grant U01HL099999 (to I.L.W., S.R.Q., and R.M.). Author contributions: M.J., T.M.S., M.R.C.-Z., I.L.W., S.R.Q., and R.M. designed the research and wrote the paper; M.J., T.M.S., and M.R.C.-Z. performed exome library preparation; T.M.S. performed transcriptome library preparation; M.J. and M.R.C.-Z. performed single-cell analyses; M.J., P.V., and M.R.C.-Z. performed targeted resequencing of observed mutations; T.M.S. and M.R.C.-Z. analyzed sequencing data. Competing interests: The authors declare that they have no competing interests. Data and materials availability: The data for this study are deposited in dbGaP. Requests for material should be addressed to R.M. (rmajeti{at}, S.R.Q. (quake{at}, or I.L.W. (irv{at}