Research Article: Immunology and Cancer

Measurement and Clinical Monitoring of Human Lymphocyte Clonality by Massively Parallel V-D-J Pyrosequencing


Science Translational Medicine  23 Dec 2009:
Vol. 1, Issue 12, pp. 12ra23
DOI: 10.1126/scitranslmed.3000540


The complex repertoire of immune receptors generated by B and T cells enables recognition of diverse threats to the host organism. Here, we show that massively parallel DNA sequencing of rearranged immune receptor loci can provide direct detection and tracking of immune diversity and expanded clonal lymphocyte populations in physiological and pathological contexts. DNA was isolated from blood and tissue samples, a series of redundant primers was used to amplify diverse DNA rearrangements, and the resulting mixtures of bar-coded amplicons were sequenced with long-read ultradeep sequencing. Individual DNA molecules were then characterized on the basis of DNA segments that had been joined to make a functional (or nonfunctional) immune effector. Current experimental designs can accommodate up to 150 samples in a single sequence run, with the depth of sequencing sufficient to identify stable and dynamic aspects of the immune repertoire in both normal and diseased circumstances. These data provide a high-resolution picture of immune spectra in normal individuals and in patients with hematological malignancies, illuminating, in the latter case, both the initial behavior of clonal tumor populations and the later suppression or reemergence of such populations after treatment.


Antigen receptors with diverse binding activities are the hallmark of B and T cells of the adaptive immune system in jawed vertebrates and are generated by genomic rearrangement of variable (V), diversity (D), and joining (J) gene segments separated by highly variable junction regions (1). Initial calculations indicate that the combinatorial and junctional possibilities contributing to the human immune receptor repertoire greatly exceed the total number of peripheral T or B cells in an individual (2). One study in which small subsets of rearranged T cell receptor (TCR) subunit genes were extensively sequenced with a few segment-specific primers yielded extrapolations for the full TCR repertoire corresponding to 2.5 × 10⁷ distinct TCRα-TCRβ pairs in the peripheral blood of an individual (3). Extensive repertoire analyses for the human B cell compartment have been more limited, although small-scale studies and focused analysis of immunoglobulin (Ig) class subsets, such as IgE, have been performed (4, 5). Advanced sequencing methods have recently been used to analyze B cell receptor diversity in the relatively simple model immune system in zebrafish (6). Against a background of continually generated novel DNA sequences, expanded clones of B cells with useful antigen specificities persist over time to enable rapid responses to antigens previously detected by the immune system. Systematic means for detection of such expanded clones in human beings would open much of our immunity to specific analysis and tracking, including measurement of clonal population sizes, anatomic distributions, and changes in response to immunological events (7).

In contrast to healthy immune systems, malignancies of B or T cell origin typically express a single dominant clonal Ig or TCR receptor. A variety of assays have been used to detect the presence of B cell clonality for diagnosis of lymphomas and leukemias, including analysis of Ig light chain gene restriction and Southern blotting or sizing of polymerase chain reaction (PCR) products from rearranged Ig or TCR loci (8, 9). Although adequate for many applications, these strategies make limited use of the high information content inherent in rearranged immune receptor gene sequences and can give indeterminate results. A recent study using deep sequencing of clonal IgH (Ig heavy chain) receptor genes in chronic lymphocytic leukemia revealed unexpected intraclonal heterogeneity in a subset of cases, showing that previous approaches have not captured the fundamental features of leukemic cell populations (10). Detection of more subtle clonal populations (for example, to follow the response of lymphomas or leukemias to treatment) now relies on time- and labor-intensive multiparameter flow cytometry or custom-designed patient- and clone-specific real-time PCR assays (11–13). Early diagnostic screening approaches may benefit from generalized and more efficient clonal detection. Indeed, a recent population-based epidemiological study showed that small amplified B cell populations can be seen in almost all individuals who go on to develop chronic lymphocytic leukemia, further underscoring the importance of assessing lymphocyte clonality in human specimens (14).

Detection and analysis of clonality is also of fundamental interest in characterizing and tracking both normal and pathogenic immune reactions. For protective and healthy humoral immune responses, high-resolution analysis of immune receptor clonality and evolution offers the potential for definitive detection and monitoring of effective immune responses to vaccination and specific infections (15), whereas for some autoimmune disorders this type of analysis could facilitate diagnosis, long-term therapeutic monitoring strategies, and, eventually, specific interventions (16).

Using a bar-coding strategy to allow pooling of multiple libraries of rearranged IgH V-D-J gene loci from many human blood samples, we have performed high-throughput pyrosequencing to characterize the B cell populations in a series of human clinical specimens (17). Deep sequencing of immune receptor gene populations offers specific and detailed molecular characterization as well as high sensitivity for detecting sequences of interest and should help to transform our understanding of the human immune system while aiding in diagnosis and tracking of lymphoid malignancies.


Bar-coded high-throughput pyrosequencing of rearranged IgH loci

We amplified rearranged IgH loci in human blood samples with BIOMED-2 nucleic acid primers adapted for high-throughput DNA pyrosequencing. A unique 6-, 7-, or 10-nucleotide sequence “bar code” in the primers used for a particular sample allowed pooling and bulk sequencing of many libraries together and subsequent sorting of sequences from each sample (Fig. 1 and table S1). Patient specimens in our initial two replicate experiments included peripheral blood of three healthy individuals, with experimental replicates of one individual’s blood sample at each of two different time points 14 months apart; tissue specimens from patients with lymphomas; and peripheral blood from patients with chronic lymphocytic leukemia. We also studied samples generated by serial 10-fold dilutions of a chronic lymphocytic leukemia peripheral blood specimen into a healthy control peripheral blood sample to assess the sensitivity of the sequencing approach for detecting small numbers of clonal B cells among a background B cell population (Table 1 and table S2). From all specimens pooled for experiment 1, we obtained 299,846 different IgH rearrangement sequences, whereas experiment 2 yielded 207,043 sequences. All sequence reads used for further analysis were full-length IgH amplicons extending from the V gene segment FR2 framework region primer to the J primer region.
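The sorting step described above can be illustrated with a minimal sketch. It assumes, for simplicity, that the sample bar code sits at the very start of each read; the bar code sequences, sample labels, and the function name `demultiplex` are hypothetical and are not taken from table S1.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Assign each read to a sample by matching a leading bar code.

    reads: iterable of read strings (bar code assumed at the 5' end).
    barcode_to_sample: dict mapping bar code string -> sample label.
    Returns (reads grouped by sample with bar code trimmed, unassigned reads).
    """
    # Try longer bar codes first so that a 6-nt code which happens to be a
    # prefix of a 10-nt code cannot capture reads meant for the longer one.
    barcodes = sorted(barcode_to_sample, key=len, reverse=True)
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        for bc in barcodes:
            if read.startswith(bc):
                by_sample[barcode_to_sample[bc]].append(read[len(bc):])
                break
        else:
            # No bar code matched; keep the read aside rather than guess.
            unassigned.append(read)
    return dict(by_sample), unassigned


# Hypothetical bar codes of 6, 7, and 10 nucleotides.
example_barcodes = {"ACGTAC": "S1", "TGCATGA": "S2", "ACGTACGTTT": "S3"}
example_reads = ["ACGTACGGGG", "TGCATGACCCC", "ACGTACGTTTAAAA", "NNNNNN"]
sorted_reads, leftover = demultiplex(example_reads, example_barcodes)
```

A real pipeline would also need to tolerate sequencing errors within the bar code and to handle reads sequenced from the opposite strand; exact prefix matching is used here only to keep the idea visible.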

Fig. 1

Bar-coded PCR amplicons for multiplexed IgH sequencing. PCR primers used for preparing bar-coded amplicons for high-throughput sequencing were designed with the FR2 IgH V gene segment family primers and the common IgH J segment primer from the BIOMED-2 consortium (19). Additional sequences required for emulsion PCR and pyrosequencing were added (indicated in green) at the 5′ end of the IgH-specific primers. In addition, a 6-, 7-, or 10-nucleotide sequence bar code was designed into the modified IgH J primer to identify the sample from which the PCR amplicons were derived. In the specimens analyzed with the 454 Titanium sequencer, an additional 10-nucleotide sample bar code was incorporated into the multiplexed IgH V gene segment primers used for amplification (table S1). Lines with arrowheads indicate PCR primers. Green segments, primer sequences needed for 454 sequencing protocol; red segments, V gene segment sequence; gray segments, nontemplated N base sequences; yellow segments, D gene segment sequence; blue segments, J gene segment sequence; green ellipse, sample-specific bar code enabling pooling of IgH libraries for multiplexed sequencing. Samples 1 and 2 could represent DNA template from any two clinical specimens or independent DNA template aliquots from the same specimen.

Table 1

Patient specimens for IgH sequencing. The clonality assay results are those obtained with standard PCR amplification and capillary electrophoresis of product amplicons. Blood, peripheral blood mononuclear cells; Lymph node, formalin-fixed, paraffin-embedded lymph node tissue; Liver, formalin-fixed, paraffin-embedded liver tissue; CLL/SLL, chronic lymphocytic leukemia/small lymphocytic lymphoma; FL, follicular lymphoma; PTLD, posttransplant lymphoproliferative disease; DLBCL, diffuse large B cell lymphoma.


An overview of the IgH amplicon sequences in the data sets from experiments 1 and 2 is shown in Fig. 2, with each point in the two-dimensional grid for each sample indicating the V gene segment and the J gene segment used by a particular IgH V-D-J rearrangement. The size and color warmth of the circle at each point indicates what proportion of all sequences in the sample had the indicated V and J gene segment usage. Healthy peripheral blood lymphocyte populations showed a diverse use of different V and J gene segments, whereas samples that contained clonal IgH populations corresponding to lymphomas or chronic lymphocytic leukemia specimens were readily identified. Plots of the data showing the V, D, and J segment usage are shown in fig. S1.
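The proportion plotted at each grid point can be computed as in the following sketch, which assumes that every read has already been assigned a V and a J gene segment. The function name `vj_usage` and the example assignments are illustrative only.

```python
from collections import Counter


def vj_usage(assignments):
    """Return the fraction of reads using each (V, J) segment combination.

    assignments: iterable of (v_segment, j_segment) tuples, one per read.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    # An empty input yields an empty dict; no division is attempted.
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical sample dominated by one V-J combination.
example_assignments = [
    ("IGHV1-8", "IGHJ4"),
    ("IGHV1-8", "IGHJ4"),
    ("IGHV1-8", "IGHJ4"),
    ("IGHV3-15", "IGHJ6"),
]
usage = vj_usage(example_assignments)
```

In a diverse healthy sample these fractions are spread over many grid points, whereas a clonal specimen concentrates most of the mass in a single point.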

Fig. 2

IgH V and J gene segment usage in healthy peripheral blood, oligoclonal or indeterminate specimens, and lymphoid malignancy specimens. Bar-coded IgH rearrangement libraries were PCR amplified from genomic DNA of human specimens, pooled, and characterized by high-throughput pyrosequencing. Experiments 1 and 2 were independent experimental replicates beginning with different aliquots of the template DNA from each specimen. Each wide row represents the IgH sequences identified in a single sample. Samples (S1 to S19) are labeled at the far left. The x axis (across the top of the panels) indicates the V gene segment used in the receptor, and the y axis (the column at the left of the panels) within each wide row represents the J gene segments used. The size and color of the circle at a given point indicates what proportion of all sequences in the sample used that particular combination of V and J gene segments. Sequences in which V, D, or J segments or junctions could not be unambiguously assigned were filtered before generation of these plots. rep, replicate sequence pool PCR amplified from an independent aliquot of template DNA; CLL, chronic lymphocytic leukemia; FL, follicular lymphoma; SLL, small lymphocytic lymphoma; PTLD, posttransplant lymphoproliferative disorder; dil, dilution.

Evaluation of clonal malignancies

Human cancers are clonal proliferations of cells that have sustained mutational damage, leading to dysregulated proliferation, survival, and response to the extracellular environment (18). Molecular clonality testing of IgH receptor and TCR γ loci, accomplished with PCR and capillary electrophoresis, is a helpful adjunct to morphological and immunophenotypic evaluation of suspected B or T cell malignancies (19). Blood or bone marrow samples from some patients give indeterminate or oligoclonal patterns of reactivity for a variety of reasons: Few lymphocytes may be present, there may be genuine oligoclonal lymphocyte populations, or clonal lymphocytes may have separately detected rearrangements from two chromosomes. We compared the results from DNA sequencing of the products of independent PCR replicates for such samples. One such difficult case is represented by the bone marrow and liver specimens from patient 5 in Table 1. The patient had undergone liver transplantation and subsequently developed a large B cell lymphoma in the liver as a manifestation of posttransplant lymphoproliferative disorder, a condition in which immunosuppression leads to B or T cell lymphomas that are typically associated with Epstein-Barr virus infection (Fig. 2). The patient’s bone marrow showed small lymphoid aggregates that were shown to contain B cells on morphological and immunohistochemical stain evaluation. Capillary electrophoresis sizing of V-D-J rearrangements in the bone marrow sample gave support for a clonal population, but it was unclear whether this population represented involvement of the patient’s bone marrow by the lymphoma seen in the liver. The sequencing data resolved this uncertainty, showing no relation between the liver lymphoma clone associated with IGHV1-8*01-IGHD2-8*01-IGHJ4*02 and the bone marrow B cells. Instead, a separate clonal B cell population that used gene segments IGHV3-15*04-IGHD3-9*01-IGHJ6*02 was present in the bone marrow.
Patients with posttransplant lymphoproliferative disorder can develop multiple independent malignant clones, making the extra information provided by sequencing analysis of replicate PCR products particularly helpful. The other V-D-J rearrangements detected in the patient’s bone marrow differed between the two replicate experiments, indicating the presence of small numbers of nonclonal B cells in the specimen. Another diagnostically challenging case, the chronic lymphocytic leukemia of patient 4, showed an oligoclonal pattern by standard PCR and capillary electrophoresis analysis. A consistent pattern was seen with deep sequencing of the sample. Finally, the two distinct V-D-J rearrangements in a lymph node from patient 3 indicated that there were two separate clonal B cell populations in the specimen, a conclusion supported by morphological and immunophenotypic evidence of two different B cell lymphomas (follicular lymphoma and small lymphocytic lymphoma) in the tissue.

Minimal residual disease testing by sequencing

To evaluate the sensitivity of deep sequencing for detection of a clonal lymphoid population in a background of polyclonal cells, we performed serial 10-fold dilutions of a known clonal chronic lymphocytic leukemia blood sample into normal peripheral blood. The percentage of clonal sequences detected at each dilution is shown in Fig. 3 for experiment 2, demonstrating detection down to a 1:10,000 dilution. This represents detection of 0.5 cells per microliter of blood when between 7500 and 14,000 sequences are measured per sample of DNA template derived from ~10 μl of blood.

Fig. 3

Titration of a chronic lymphocytic leukemia clonal sample into healthy peripheral blood. Pooled bar-coded IgH library sequencing was carried out on a series of 10-fold dilutions of a chronic lymphocytic leukemia blood sample (sample 13) into a healthy control blood sample (sample 14) to evaluate the sensitivity and linearity of high-throughput sequencing for detection of a known clonal sequence. The percentage of sequences matching the chronic lymphocytic leukemia clone in each diluted specimen is plotted on a log scale, with zero indicating that no sequences were detected. The counts of clonal sequences in each sample were as follows: CLL sample, 7805 clonal of 8612 total; healthy blood control, 0 clonal of 7518 total; 1:10 dilution, 2095 clonal of 13,717 total; 1:100 dilution, 156 clonal of 8674 total; 1:1000 dilution, 23 clonal of 9471 total; 1:10,000 dilution, 3 clonal of 8895 total; 1:100,000 dilution, 0 clonal of 6940 total. The negative control is the healthy donor blood sample used for diluting the clonal CLL sample. A second experiment measuring fewer sequences from independent PCR amplifications from the same samples detected the following number of clonal sequences in each sample: CLL sample, 422 clonal of 566 total; healthy blood control, 0 clonal of 270 total; 1:10 dilution, 189 clonal of 665 total; 1:100 dilution, 11 clonal of 230 total; 1:1000 dilution, 0 clonal of 344 total; 1:10,000 dilution, 0 clonal of 329 total; 1:100,000 dilution, 0 clonal of 208 total.
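The percentages plotted in Fig. 3 follow directly from the counts listed in the caption. The short sketch below recomputes them for the first experiment; the variable and function names are illustrative.

```python
def percent_clonal(clonal, total):
    """Percentage of sequences that match the known clone."""
    return 100.0 * clonal / total


# (label, clonal sequences, total sequences) from the Fig. 3 caption,
# first experiment.
dilution_series = [
    ("CLL sample", 7805, 8612),
    ("healthy blood control", 0, 7518),
    ("1:10", 2095, 13717),
    ("1:100", 156, 8674),
    ("1:1000", 23, 9471),
    ("1:10,000", 3, 8895),
    ("1:100,000", 0, 6940),
]

percentages = {
    label: percent_clonal(clonal, total)
    for label, clonal, total in dilution_series
}
```

The undiluted specimen comes out at roughly 91% clonal sequences, and the 1:10 through 1:10,000 dilutions at roughly 15%, 1.8%, 0.24%, and 0.03%, so each 10-fold dilution lowers the clonal fraction by approximately an order of magnitude.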

We next evaluated clinical specimens from patients with chronic lymphocytic leukemia who had undergone total lymphoid irradiation and antithymocyte globulin therapy followed by human leukocyte antigen–identical allogeneic peripheral blood progenitor cell transplantation (20, 21) and compared the results of deep sequencing analysis to results from patient- and clone-specific real-time PCR assays (Table 2). In these experiments, the patients with chronic lymphocytic leukemia were different from the patients tested in our initial experiments described in Table 1, and the minimal residual disease (MRD) sequencing was performed in a separate instrument run. Real-time PCR assay results were reported as confidently positive if at least 100 copies per microgram of template DNA were detected. Table 2 demonstrates that all specimens showed agreement between the high-throughput sequencing data and real-time PCR assay, although for the lowest confidently positive real-time PCR result for chronic lymphocytic leukemia patient A the clone was detected in only one of the two high-throughput sequencing sample replicates.

Table 2

Comparison of high-throughput sequencing with real-time PCR MRD monitoring assays. For each patient specimen, IgH rearrangements were amplified from 200 ng of genomic DNA of the indicated specimen types with bar-coded primers adapted for 454 pyrosequencing. The IgH rearrangement libraries were pooled and sequenced. The number of clonal sequences (matching the initial diagnostic specimen clone) and the total number of sequences obtained are listed. Data from pyrosequencing were compared to the results of custom quantitative real-time PCR assays designed to amplify the patient’s malignant clonal sequence. The real-time PCR results were considered positive if >100 copies per microgram of template DNA were detected.


Peripheral blood B cell repertoire in healthy subjects

To identify potentially expanded B cell clones within healthy peripheral blood, we looked for independent occurrences of “coincident” IgH sequences (identical V, D, and J segments and identical V-D and D-J junction sequences) in independent pools from the same individual. Such coincidences could have resulted from clonally related cells; indeed, clonal relations are likely for a majority of these coincidences, given both the diversity of the potential repertoire of IgH rearrangements and the absence of rearrangements found in this individual from comparable sequence samples from different individuals. We note that any population with a limited IgH rearrangement repertoire would be expected to show large numbers of such coincidences. Instead, we observed only small numbers of coincident sequences in our data. From six independent amplification pools derived from the blood of a single individual at one time point, we observed only 19 potential coincidences from a total of 10,921 distinct IgH rearrangements sequenced. Seven independent amplification pools from a second time point (14 months later) gave comparable results (25 potential coincidences from a total of 7450 distinct rearrangements sequenced) (Table 3).
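The definition of a coincident sequence given above can be expressed as a set intersection between independent pools. The following is a minimal sketch, assuming each read has been reduced to a tuple of its V, D, and J segments and its two junction sequences; the function name and the toy data are hypothetical.

```python
from itertools import combinations


def coincident_sequences(pools):
    """Find rearrangements that occur in more than one independent pool.

    pools: list of iterables of (V, D, J, vd_junction, dj_junction) tuples.
    Returns a dict mapping each coincident rearrangement to the set of
    pool indices in which it was observed.
    """
    # Repeated reads within one pool may be PCR copies of one template,
    # so each pool is collapsed to its distinct rearrangements first.
    distinct = [set(pool) for pool in pools]
    shared = {}
    for (i, a), (j, b) in combinations(enumerate(distinct), 2):
        for rearrangement in a & b:
            shared.setdefault(rearrangement, set()).update((i, j))
    return shared


# Toy example with three replicate pools.
clone_x = ("IGHV3-23", "IGHD3-3", "IGHJ6", "ACG", "TT")
clone_y = ("IGHV1-69", "IGHD2-2", "IGHJ4", "G", "CCA")
clone_z = ("IGHV4-34", "IGHD3-22", "IGHJ3", "", "AG")
example_pools = [[clone_x, clone_x, clone_y], [clone_z], [clone_x, clone_z]]
coincidences = coincident_sequences(example_pools)
```

Collapsing each pool to distinct rearrangements before comparing is the design choice that matters here: only matches between independently amplified pools count as evidence of clonally related cells.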

Table 3

Coincident sequences in a healthy donor’s peripheral blood at two time points. IgH rearrangements from peripheral blood mononuclear cells of a healthy blood donor were PCR amplified in multiple independent replicate PCR reactions and sequenced. The table shows the number of identical sequences detected in more than one replicate (termed coincident sequences). Blood samples from two time points separated by 14 months were analyzed. Sequences from different replicates were considered to be coincident sequences if they shared the same V, D, and J segment usage as well as the same V-D and D-J junctional nucleotide sequences. T1, initial time point; T2, second time point 14 months later; r1 through r7, replicates 1 through 7.


It is noteworthy that we see only slightly fewer coincidences when comparing aliquots between the two time points (0.76 coincidences per sample comparison versus 1.22 for comparisons within the same time point). Although the difference is statistically significant (P < 0.05, Fisher’s exact one-tailed test), the modest ratio between intratemporal and intertemporal coincidence levels indicates a considerable degree of persistence in the clonal populations in this individual. The numbers of coincident sequences observed when comparing sequence data from any two aliquots provide strong evidence for substantial diversity in the IgH repertoire. Minimal estimates obtained with approaches similar to the “birthday problem” in probability theory (22) yield a lower bound of ~2 million different IgH rearrangements in these samples. The analysis leading to this lower bound estimate does not yield an upper bound on repertoire; in particular, it is not possible from these data to rule out a category of IgH rearrangements that are very diverse but present in single- or low-copy number in ~2 × 10⁹ B cells in peripheral blood. Thus, the true complexity of the blood IgH repertoire could certainly be much greater than 2 × 10⁶.
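A rough version of the birthday-problem reasoning can be sketched as follows. This is not the calculation from the Supplementary Material, which is not reproduced here. It assumes a repertoire of N equally frequent rearrangements, so that any two sequences drawn from different pools match with probability 1/N and the expected number of coincidences is the sum of pairwise pool-size products divided by N. Unequal clone frequencies only increase the number of coincidences, so solving for N under the uniform assumption gives a minimal estimate. The per-pool sizes below are hypothetical: the text reports only the total of 10,921 distinct rearrangements across six pools, which is split evenly here for illustration.

```python
from itertools import combinations


def min_repertoire_size(pool_sizes, coincidences):
    """Point estimate of minimum repertoire size from pairwise coincidences.

    pool_sizes: number of distinct sequences in each independent pool.
    coincidences: number of sequences shared between pools.
    """
    if coincidences <= 0:
        raise ValueError("no coincidences observed; no finite estimate")
    # Number of cross-pool sequence pairs that could have matched.
    cross_pairs = sum(a * b for a, b in combinations(pool_sizes, 2))
    return cross_pairs / coincidences


# Assumed even split of ~10,921 sequences over six pools (hypothetical).
assumed_pool_sizes = [1820] * 6
estimate = min_repertoire_size(assumed_pool_sizes, 19)
```

With these assumed pool sizes and the 19 coincidences reported for the first time point, the estimate is a few million rearrangements, the same order of magnitude as the ~2 million lower bound quoted in the text. A formal lower bound would additionally account for sampling variability in the coincidence count.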

In addition to the total complexity of the IgH pool, it is of interest to evaluate the degree to which clonal cell populations above a certain size are present in normal peripheral blood. No sequence was identified in more than 2 of the 13 sequence sets from independent amplicon pools (Table 3). Using a similar analysis to that described above, we can derive an upper bound for the most abundant IgH rearrangements. For the healthy individual examined in these experiments, this analysis yields a maximum contribution to the sequence pool of 1 of 1000 for any individual clone (P < 0.01) in this individual (23).
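One simple way to reason about such an upper bound is to ask how likely a clone of a given size would be to appear in no more than two of the 13 sequence sets. The sketch below uses a binomial model and is only an illustration: it is not the analysis from the Supplementary Material, it treats every pool as the same size, and it does not correct for the many candidate clones that are implicitly tested. The pool size of 1400 reads is an assumption chosen to be close to the average implied by the totals in the text.

```python
from math import comb


def prob_seen_in_at_most(k, n_pools, reads_per_pool, clone_fraction):
    """Probability that a clone is detected in at most k of n_pools pools.

    Each read is assumed to come from the clone independently with
    probability clone_fraction.
    """
    # Chance that a single pool contains at least one read from the clone.
    p_detect = 1.0 - (1.0 - clone_fraction) ** reads_per_pool
    return sum(
        comb(n_pools, i) * p_detect**i * (1.0 - p_detect) ** (n_pools - i)
        for i in range(k + 1)
    )


# Hypothetical equal pool size; 13 pools as in Table 3.
p_hidden = prob_seen_in_at_most(
    k=2, n_pools=13, reads_per_pool=1400, clone_fraction=0.001
)
```

Under these assumptions, a clone contributing 1 in 1000 sequences would very rarely be confined to two or fewer pools, which is in line with the direction of the bound stated above.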

Within these experimental estimates of the lower bound of the IgH repertoire size, and the upper bound of the largest clone size, a variety of combinations of clonally expanded populations of different sizes could give rise to our observed data. Estimation of the upper limit of the IgH repertoire would require much more extensive sequencing to evaluate the extent of single-copy or very small clonal expansions of B cells and would require characterization of a significant fraction of the blood volume of a healthy donor, which presents ethical concerns. It should be noted that this analysis of the blood does not exclude the possibility that other tissues may contain B cells that are clonally related to circulating cells and does not address the exchange of B cells between the blood and other hematolymphoid compartments of the body. The sequences found in multiple replicates performed with blood from the healthy donor characterized in Table 3 are presented in table S4.

Diversity of clonal B cell expansions in healthy subjects of various ages

We extended our analysis of healthy human patients to an additional 23 subjects ranging in age from 19 to 79 years by sequencing sixfold replicate samples of peripheral blood IgHs from each individual. We detected considerable interindividual variation in the number of expanded lymphocyte clones and expanded clone sizes (Table 4). Using an analysis similar to that performed for the healthy donor in Table 3, we calculated the minimum IgH repertoire size and the largest clone size for these additional subjects. Our data confirm that at least 15 of the 23 additional normal human samples had IgH pools of >1,000,000 different rearrangements. Although the additional eight individuals may have comparable diversity, the lower bound estimates were somewhat lower, relative to the other 15 subjects, because of the greater numbers of weakly amplified clones detected and the lower total yield of sequences from these samples. For a majority of the healthy samples, no sequence appeared in more than two of six sequenced DNA aliquots; for these individuals, this places an upper limit of 0.1% to 0.3% of the measured B cell repertoire that could be dedicated to any single clone, similar to the results from the individual in Table 3.

Table 4

Coincident IgH sequences in peripheral blood of healthy donors of various ages. Peripheral blood samples from 23 healthy donors of ages ranging from 19 to 79 years were analyzed by deep sequencing IgH rearrangements in six replicates from each sample. The number of distinct sequences detected in more than one replicate (termed coincident sequences) from each individual is tabulated below. Sequences from different replicates were considered to be coincident sequences if they shared the same V, D, and J segment usage as well as the same V-D and D-J junctional nucleotide sequences. Calculation of the minimum IgH repertoire diversity in each patient, as indicated by the number of coincident sequences detected, is described in the Supplementary Material.


Two of the apparently healthy blood donors in our sample set had expanded B cell clones that were large enough to be detected in all six sequencing replicates. The size of these larger clones can be estimated by the expanded clonal sequence’s proportion of total sequences obtained from these patients: For the 54-year-old patient, this value was 0.15%, whereas for the 68-year-old patient the value was 1.5% of the total sequences.

These data demonstrate that detection of clonal populations that make up >0.1% of the total B cell population is readily possible with the small blood samples used for this work (<0.1 ml of blood was sufficient for the multiple replicates from these specimens). Further, these results suggest that searches for persistent premalignant or pathological clonal populations at the 0.1% level might be facilitated in certain cases by the limited set of amplified candidates in the normal repertoire.

Deep sequencing data sets of this kind should enable explicit detection of preferentially rearranged or selected combinations of V, D, or J segments in IgHs in specific populations. Using the healthy control specimens in our current data sets, we have seen evidence of preferential pairwise segment associations for at least three combinations (D2-2 with J6, D3-22 with J3, and D3-3 with J6) across the group of individuals. Overrepresentation of these D-J combinations (that is, a frequency of the D-J combination that is greater than the products of the D and J frequencies) was observed in 122 of 138, 113 of 138, and 119 of 138 sequenced aliquots, respectively. With a false discovery rate of <10⁻⁷ (no examples of overrepresentation in this number of aliquots were found in 10⁷ randomly shuffled data sets), these were the most consistent nonrandom associations seen with the data set (certainly, other associations might emerge from a larger data set). We interpret these results as reflecting nonrandom character in rearrangement or selection in this specific population of individuals (Stanford’s blood donor pool in a fixed time frame). One could certainly expect (and it would be of great interest to follow) different specific nonrandom characters in other populations with distinct histories of community immune response and genetic compositions.
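The overrepresentation criterion described above (a D-J combination more frequent than the product of the separate D and J frequencies) can be sketched for a single aliquot as follows. The shuffling scheme shown, which permutes J assignments among reads while preserving the D and J marginal frequencies, is an assumption; the text does not specify how the randomized data sets were generated. Function names and example data are illustrative.

```python
import random


def dj_frequencies(pairs, d, j):
    """Return (observed D-J frequency, product of D and J frequencies).

    pairs: list of (d_segment, j_segment) tuples, one per read.
    """
    n = len(pairs)
    f_dj = sum(1 for p in pairs if p == (d, j)) / n
    f_d = sum(1 for p in pairs if p[0] == d) / n
    f_j = sum(1 for p in pairs if p[1] == j) / n
    return f_dj, f_d * f_j


def is_overrepresented(pairs, d, j):
    """True if the D-J combination exceeds the independence expectation."""
    observed, expected = dj_frequencies(pairs, d, j)
    return observed > expected


def shuffle_j(pairs, rng):
    """Return a copy of pairs with J segments permuted among reads.

    D and J marginal frequencies are preserved; any D-J association
    is broken (assumed null model, not taken from the source).
    """
    d_segments = [p[0] for p in pairs]
    j_segments = [p[1] for p in pairs]
    rng.shuffle(j_segments)
    return list(zip(d_segments, j_segments))


# Toy aliquot in which D2-2 always pairs with J6.
example_pairs = [("D2-2", "J6")] * 3 + [("D3-3", "J4")] * 3
observed, expected = dj_frequencies(example_pairs, "D2-2", "J6")
null_pairs = shuffle_j(example_pairs, random.Random(0))
```

Repeating the test across all aliquots, and counting in how many aliquots a combination is overrepresented in real versus shuffled data, gives the kind of comparison summarized in the text.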


Modern DNA sequencing methods open a new window of investigation into the complex gene rearrangements necessary for human lymphocyte function. Our results using multiplexed bar-coded IgH sequencing of multiple replicate samples of blood from 24 healthy subjects represent the most extensive characterization to date of human B cell populations. For a majority of the healthy individuals, our results were sufficient to place a lower limit of 1,000,000 on the number of distinct IgH rearrangements in circulating lymphocytes and an upper bound of 0.1% to 0.3% of total B cells on the representation of any single clone within the repertoire. A small number of individual amplified clones with greater representation were observed in healthy individuals in our sample set, with the largest clonal populations (seen in patients aged 54 and 68 years) accounting for 0.15% to 1.5% of total sequences of the observed sequence space from circulating B cells. These larger expanded clones may be the result of physiological responses to environmental antigens or pathogens; alternatively, these could represent the precursors to lymphoid malignancies, such as chronic lymphocytic leukemia, which have a strong association with advanced patient age. Recent and older literature describing monoclonal B cell lymphocytosis (MBL) using multiparameter flow cytometry assays to detect B cells with aberrant surface protein expression has indicated that between 5% and 12% of adults have these atypical B cell populations, and essentially all patients who develop chronic lymphocytic leukemia can be shown to have had preceding MBL (14, 24, 25). An important caveat is that most patients who show MBL do not go on to develop chronic lymphocytic leukemia (24, 26). 
High-throughput immune receptor sequencing provides an unprecedented degree of sensitivity and specificity in tracking monoclonal B cell expansions and enables detection of clonal B cell populations that do not show aberrant cell surface marker expression; it remains to be seen whether this augmented form of tracking will be of use in dissecting the additional clinical and molecular variables that lead some clonal expansions to progress to frank leukemias.

Deep sequencing of IgH rearrangements simplifies the assessment of overt populations of suspected malignant B cells in clinical samples and shows preliminary success in MRD testing after treatment of leukemia patients. A substantial advantage of the MRD detection approach used here is that all patient samples can be analyzed with a single uniform assay rather than having to tailor individual real-time PCR assays to each patient’s clonal malignant sequence and to validate these assays individually as unique clinical tests, an expensive and laborious process likely to limit the accessibility of MRD testing. Having a sequence-based assay that can detect variants from the original malignant clonal sequences present at diagnosis may be an advantage in screening for disease relapse. Recent microarray-based data from studies of acute lymphoblastic leukemias suggest that genomic copy number changes may occur relatively frequently at immune receptor loci between initial diagnostic specimens and relapse specimens (27). For the most sensitive detection of residual disease and clonal variants in a variety of B cell neoplasms, particularly those such as follicular lymphoma that have ongoing hypermutation of rearranged IgH gene loci, it will likely be advisable to use several different primer sets (for example, making use of all three framework regions of the IgH V genes) to avoid false-negative results that arise from mutations at primer-binding sites.

In a broader perspective, the deep sequencing approach to lymphocyte population analysis may provide insights into autoimmune and infectious diseases, medical manipulations of the immune system such as vaccination, and harmful outcomes of current therapies such as graft-versus-host disease after stem cell transplantation. We expect that immune receptor sequencing in medical scenarios that involve lymphoid malignancies or immune-mediated diseases will be broadly useful for gathering diagnostic, prognostic, and disease-monitoring information.

Supplementary Material

Materials and Methods

Fig. S1. V-D-J plots of healthy peripheral blood and lymphoid malignancies.

Fig. S2. Sequence complexity of healthy donor blood specimens.

Table S1. Sequencing primers.

Table S2. Patient specimens used in experiment 2.

Table S3. Number of sequences determined per specimen.

Table S4. Sequences found in more than one replicate from healthy donor blood samples.


References and Notes

  1. The birthday problem refers to the calculation of probability that at least two individuals will share a single birthday in a group of n people. The large number of possible pairwise combinations from a group of n [the number is n*(n − 1)/2] makes this probability surprisingly high even with a value of n that is much fewer than the number of days in a year. This type of calculation is readily expanded to coincidences between groups of individuals, and to “value” spaces other than “days of the year.” Such calculations have been used extensively to evaluate minimum diversity in populations [for example, Z. E. Schnabel, The estimation of the total fish population of a lake. Am. Math. Monthly 45, 349–352 (1938)].
  2. Materials, experimental procedures, and computational methods are described in detail in the Supplementary Material.
  3. Acknowledgments: We thank K. Weinberg, D. Arber, P. Parameswaran, G. Chu, P. Blackshear, G. Marti, A. Lucas, M. Davis, B. Ohgami, R. Levy, S. Levy, S. Feng, C. Niemann, D. Lewis, A. Alizadeh, S. Galli, E. Mignot, M. Fontaine, B. Robinson, M. Han, Y. Natkunam, A. Collins, and members of our laboratories for helpful discussions and C. Krishna for excellent technical support. Funding: Lucile Packard Child Health Research Program for support of research materials; Walter V. and Idun Berry Fellowship program (S.D.B.); National Cancer Institute grant P01CA049605 (J.L.Z.); National Institute of General Medical Sciences grant T32GM07365-34 (E.L.M.); Stanford Vice Provost for Undergraduate Education Research Program (L.N.Z.); and Stanford Immunity, Transplantation, and Infectious Disease Institute (K.D.N. and K.C.N.). Author contributions: S.D.B., A.Z.F., E.L.M., J.D.M., D.B.M., B.S., J.L.Z., and C.D.J. designed the research. S.D.B., E.L.M., J.D.M., L.N.Z., B.S., C.D.J., K.D.N., B.B.S., and B.H. performed the research. A.Z.F., S.D.B., E.L.M., J.M.M., L.N.Z., M.E., B.B.S., B.H., J.L.Z., K.C.N., and D.B.M. contributed analytical tools and reagents. S.D.B., A.Z.F., E.L.M., J.M.M., L.N.Z., B.B.S., B.S., D.B.M., J.L.Z., and C.D.J. analyzed the data. S.D.B., E.L.M., and A.Z.F. wrote the manuscript. All authors read and approved the manuscript. Conflicts of interest: None: S.D.B., E.L.M., J.D.M., J.M.M., L.N.Z., B.S., C.D.J., K.C.N., K.D.N., D.B.M., J.L.Z., A.Z.F. Employees of 454 Life Sciences, A Roche Company: B.B.S., B.H., and M.E. Accession number: The IgH sequences discussed herein can be found as accession number SRP001460.1 in the National Center for Biotechnology Information.