Research Article: Cancer

ctDNA monitoring using patient-specific sequencing and integration of variant reads


Science Translational Medicine  17 Jun 2020:
Vol. 12, Issue 548, eaaz8084
DOI: 10.1126/scitranslmed.aaz8084

INVARiable progress detecting tumor DNA

The analysis of tumor DNA in a patient’s blood offers a noninvasive way to detect cancer and monitor responses to therapy. The samples are much easier to obtain than a conventional biopsy, and they may be more representative of the variety of mutations found in a given tumor. Unfortunately, the sensitivity of circulating DNA analysis is limited by the amount of tumor DNA in the blood and by the methods of detection. A pipeline for integration of variant reads (INVAR) designed by Wan et al. offers a way to use a patient’s individual tumor sequencing data to monitor for signs of relapse with greater sensitivity.

Abstract

Circulating tumor-derived DNA (ctDNA) can be used to monitor cancer dynamics noninvasively. Detection of ctDNA can be challenging in patients with low-volume or residual disease, where plasma contains very few tumor-derived DNA fragments. We show that sensitivity for ctDNA detection in plasma can be improved by analyzing hundreds to thousands of mutations that are first identified by tumor genotyping. We describe the INtegration of VAriant Reads (INVAR) pipeline, which combines custom error-suppression methods and signal-enrichment approaches based on biological features of ctDNA. With this approach, the detection limit in each sample can be estimated independently based on the number of informative reads sequenced across multiple patient-specific loci. We applied INVAR to custom hybrid-capture sequencing data from 176 plasma samples from 105 patients with melanoma, lung, renal, glioma, and breast cancer across both early and advanced disease. By integrating signal across a median of >10^5 informative reads, ctDNA was routinely quantified to 1 mutant molecule per 100,000, and in some cases with high tumor mutation burden and/or plasma input material, to parts per million. This resulted in median area under the curve (AUC) values of 0.98 in advanced cancers and 0.80 in early-stage and challenging settings for ctDNA detection. We generalized this method to whole-exome and whole-genome sequencing, showing that INVAR may be applied without requiring personalized sequencing panels so long as a tumor mutation list is available. As tumor sequencing becomes increasingly performed, such methods for personalized cancer monitoring may enhance the sensitivity of cancer liquid biopsies.

INTRODUCTION

Circulating tumor DNA (ctDNA) can be robustly detected in plasma when multiple copies of mutant DNA are present; however, when the amounts of ctDNA are low, analysis of individual mutant loci might produce a negative result because of sampling noise even when using an assay with perfect analytical sensitivity (1). ctDNA may be missed in samples that have low fractional concentrations of ctDNA (relatively few mutant molecules in a high background) or low absolute numbers of mutant molecules because of limited sample input (Fig. 1A). This effect of sampling noise reduces the sensitivity of ctDNA monitoring for patients with early-stage cancers, particularly after surgery (1, 2), and is most pertinent when measuring individual mutations. For example, by assaying for a single mutation per patient in the plasma of patients with early-stage breast or colorectal cancer postoperatively, ctDNA was detected in about 50% of patients who later relapsed (3, 4). When applied to patients with stage II to III melanoma, carrying BRAF or NRAS variants, ctDNA was detected up to 12 weeks after surgery in 16.8% of patients who relapsed within 5 years (5).

Fig. 1 Patient-specific analysis overcomes sampling error in conventional and limited input scenarios.

(A) When high amounts of ctDNA are present, gene panels and hotspot analysis are sufficient to detect ctDNA (top). However, if ctDNA concentrations are low, these assays are at high risk of false-negative results because of sampling noise. Using a large list of patient-specific mutations allows sampling of mutant reads at multiple loci, enabling detection of ctDNA when there are few mutant reads because of either ultralow ctDNA concentrations (middle) or limited starting material or sequencing coverage (bottom). (B) A given sample contains a limited number of haploid copies of the genome, g. For plasma samples, the small amount of material limits the sensitivity that is attainable to one mutant per g total copies. By analyzing in parallel a large number of marker loci (loci that are mutated in the patient’s tumor), n, detection of tumor DNA can be substantially enhanced to detect one or few mutant molecules (indicated in orange) per n * g copies. (C) INtegration of VAriant Reads (INVAR) pipeline. To overcome sampling error, signal was aggregated across hundreds to thousands of mutations. Here, we classify samples (rather than individual mutations) into those that contain ctDNA, or in which ctDNA was not detected. “Informative reads” (IR; shown in blue) are reads generated from a patient’s sample that overlap loci in the same patient’s mutation list. Some of these reads may carry the mutation variants in the loci of interest (shown in orange). Reads from plasma samples of other patients or individuals at the same (nonmatched) loci are used as control data to calculate the rates of background errors (shown in purple) that can occur because of sequencing errors, PCR artifacts, or biological background signal. INVAR incorporates additional information on DNA fragment lengths and tumor allelic fraction of mutations to enhance the accuracy of detection.

Tumor-guided sequencing panels use previous tumor genotype information and custom panel design, and offer the possibility to greatly increase the sensitivity of ctDNA assays for cancer monitoring by targeting a larger number of variants (Fig. 1B) (6–11). Such assays conventionally target 10 to 20 mutations in plasma (7, 12, 13), although some have analyzed up to 115 patient-specific mutations in parallel, quantifying ctDNA to 1 mutant molecule per 33,333 copies in patients with breast cancer after neoadjuvant therapy (9). Current patient-specific approaches enable identification of relapse between 3 and 10 months earlier than clinical relapse across cancer types including colorectal (10, 14), breast (12), and bladder cancer (11). As whole-genome tumor sequencing becomes increasingly performed in clinical settings (15, 16), cost and time barriers to implementation of patient-specific panels are reduced.

ctDNA detection methods often rely on identification of individual mutations even when data cover multiple loci (7, 9, 17, 18), which may discard mutant signal that does not pass a threshold for calling. The potential sensitivity benefit of targeting hundreds to thousands of tumor markers per patient has been previously suggested (8, 19), although such an approach has only been anecdotally applied to cancer monitoring in plasma (20). In this study, to improve sensitivity, we aggregated sequencing reads across 10^2 to 10^4 mutated loci for each patient. We describe the INtegration of VAriant Reads (INVAR) pipeline (Fig. 1C), which leverages custom error-suppression and signal-enrichment methods to enable sensitive monitoring and identification of residual disease. This approach uses previous information from tumor genotyping to guide analysis (Fig. 2A).

Fig. 2 Study outline and rationale for integration of variant reads.

(A) To generate deep sequencing data across large patient-specific mutation lists, the lists generated by tumor genotyping were used to design hybrid-capture panels that were applied to DNA extracted from plasma samples. In additional analysis, the tumor genotyping data were used to analyze sequencing data from WES and sWGS by INVAR. (B) Illustration of the range of possible working points for ctDNA analysis using INVAR, plotting the haploid genomes analyzed versus the number of mutations targeted. Diagonal lines indicate multiple ways to generate the same number of informative reads [equivalent to haploid genomes analyzed (hGA) × targeted loci]. Current methods often focus on analysis of ~10 ng of DNA (resulting in sequencing of 300 to 3000 haploid copies of the genome) across 1 to 30 mutations per patient (indicated by the light blue box), which corresponds to ~10,000 informative reads, resulting in frequently encountered detection limits of 0.01 to 0.1% (7, 17). In this study, we focused on analysis of larger numbers (100s to 1000s) of mutations.

We conceptualize the factors influencing ctDNA detection limits as a two-dimensional space (Fig. 2B), highlighting the importance of maximizing the number of relevant DNA fragments analyzed by increasing either plasma volumes and ctDNA copies analyzed or the number of (patient-specific) variants sampled: The number of informative reads generated is proportional to the product of these two factors. On the basis of these principles, we apply INVAR to analyze patient-specific sequencing data using custom hybrid-capture panels. We further demonstrate the ability to apply INVAR to plasma whole-exome sequencing (WES) and shallow whole-genome sequencing (sWGS).
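The relationship between the two axes can be illustrated with a short calculation. The sketch below is not part of the INVAR pipeline; the function names are hypothetical, and the detection limit is taken simply as the reciprocal of the number of informative reads, which is the theoretical limit used in this article's figures.

```python
def informative_reads(haploid_genomes: float, n_loci: int) -> float:
    """Informative reads ~ haploid genomes analyzed (hGA) x targeted loci."""
    return haploid_genomes * n_loci


def theoretical_lod(n_informative_reads: float) -> float:
    """Theoretical limit: one mutant read among all informative reads."""
    return 1.0 / n_informative_reads


# Two working points on the same diagonal (illustrative values only):
# deep sampling of few loci versus shallow sampling of many loci.
few_loci = informative_reads(haploid_genomes=3000, n_loci=10)
many_loci = informative_reads(haploid_genomes=30, n_loci=1000)

print(few_loci, many_loci, theoretical_lod(many_loci))
```

Both working points yield the same number of informative reads, which is why a larger mutation list can compensate for limited plasma input.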

RESULTS

Patient-specific sequencing panel design

First, tumor tissue genotyping was performed to identify multiple patient-specific mutations per patient: WES data were generated from tumor and buffy coat samples from 47 patients with stage II to IV melanoma (Materials and Methods), identifying a median of 625 mutations per patient [interquartile range (IQR), 411 to 1076; fig. S1 and table S1 in data file S1]. These mutation lists were used to generate custom capture sequencing panels, which were used to sequence longitudinal plasma samples. These included 52 time points from 9 patients with stage IV melanoma (92 total replicates), and 38 samples (after merging sequencing data, see Supplementary Materials and Methods) from 38 patients with stage II-III melanoma (2301× mean raw depth across 130 total libraries). In addition, WES (238× mean raw depth, n = 21) and sWGS (0.6× mean raw depth, n = 33) were performed on subsets of plasma samples from the same patients and used as input for INVAR analysis (tables S2 and S3 in data file S1).

Using a patient-specific sequencing approach, a large number of private mutation loci were targeted. Each locus has its own error rate. Accurate benchmarking of the background noise rates of individual loci to below 10^−6 would require analyzing cell-free DNA (cfDNA) molecules from 1 liter of plasma to sample one mutant read (this assumes a cfDNA concentration of 10 ng/ml from plasma, yielding 3 million analyzable molecules). To circumvent this, we sought to develop a background error model for patient-specific sequencing data that could estimate the background error rate of a locus accurately using limited control samples. In this study, 99.8% of the mutations identified by tumor tissue sequencing were private to each individual. Each custom hybrid-capture sequencing panel design in this study covered loci from a median of 5.5 patients, generating data from matched as well as nonmatched mutation lists for each sample. INVAR uses sequencing data from one patient to control for others using both custom and untargeted approaches such as WES or WGS (fig. S2A). There was no significant difference (P > 0.05) in background error rate whether using data from healthy individuals or from other patients as control data (“patient-control” samples, which may control for other patients at private loci) (fig. S2B).
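The plasma-volume estimate above can be reproduced with simple arithmetic. The sketch below assumes a mass of about 3.3 pg per haploid human genome; that constant and the function names are assumptions introduced for illustration, not values taken from the pipeline.

```python
PG_PER_HAPLOID_GENOME = 3.3  # assumed approximate mass of one haploid genome


def haploid_copies(dna_ng: float) -> float:
    """Convert a DNA mass in nanograms to haploid genome copies."""
    return dna_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def molecules_per_locus(plasma_ml: float, cfdna_ng_per_ml: float = 10.0) -> float:
    """Analyzable molecules covering one locus in a given plasma volume."""
    return haploid_copies(plasma_ml * cfdna_ng_per_ml)


per_liter = molecules_per_locus(plasma_ml=1000)
print(f"{per_liter:.2e} molecules per locus in 1 liter of plasma")
```

At 10 ng/ml, 1 liter of plasma gives roughly 3 million molecules per locus, so a single locus cannot practically be benchmarked below an error rate of 10^−6 from plasma alone.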

Error suppression in patient-specific sequencing data

As part of the INVAR pipeline (flowchart outline in fig. S3), we developed methods to minimize artefacts in patient-specific sequencing data. We evaluated the contribution of different filtering steps using patient samples, patient-control samples, samples from healthy individuals, and dilution series (Supplementary Materials and Methods). Read collapsing was performed using unique molecular identifiers, which reduced error rates across all mutation classes (fig. S4A), similar to previous studies (21). To increase the resolution of background error rates, we grouped mutations by both mutation class and trinucleotide context, demonstrating over two orders of magnitude difference in background error rate between the least and most noisy trinucleotide contexts (Fig. 3A). Increasing the minimum number of duplicates per read family reduced the error rates further, at the expense of a greater fraction of the sequencing data being discarded (fig. S4B). To balance data loss against background error rate, a minimum family size threshold of 2 was used. With more redundant sequencing, this can be increased to further reduce background rates.
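Grouping by mutation class and trinucleotide context amounts to pooling counts before computing a rate. The following is a minimal sketch under assumed inputs (tuples of context, alternate base, mutant reads, and total reads from control data); it does not reproduce the pipeline's actual data structures.

```python
from collections import defaultdict


def background_error_rates(observations):
    """Pool control-sample counts by (trinucleotide context, alternate base).

    observations: iterable of (context, alt_base, mutant_reads, total_reads).
    Returns a dict mapping (context, alt_base) to the pooled error rate.
    """
    mutant = defaultdict(int)
    total = defaultdict(int)
    for context, alt_base, mutant_reads, total_reads in observations:
        key = (context, alt_base)
        mutant[key] += mutant_reads
        total[key] += total_reads
    return {key: mutant[key] / total[key] for key in total if total[key] > 0}


# Toy control data (illustrative numbers only).
rates = background_error_rates([
    ("ACG", "T", 2, 1_000_000),
    ("ACG", "T", 1, 500_000),
    ("TCA", "G", 0, 2_000_000),
])
print(rates)
```

Pooling across loci that share a context gives each group far more reads than any single locus, which is what makes very low error rates measurable with limited control samples.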

Fig. 3 Characterization of background error rates.

(A) Background error rates after error suppression were calculated for each trinucleotide context by aggregating all nonreference bases across all considered bases (“near-target,” Supplementary Materials and Methods). (B) Reduction of error rates after different error-suppression settings (Wilcoxon test, *P < 0.05, **P < 0.001, and ***P < 0.0001). Each of the filters is described in Supplementary Materials and Methods. (C) Outlier suppression approach: Loci observed with outlying signal relative to the remaining patient-specific loci might be due to noise at that locus, contamination, or a misgenotyped SNP locus (in red). (D) Summary of effects of outlier suppression on each cohort (Wilcoxon test, *P < 0.05). (E) Mutations with higher tumor allele fraction were more frequently observed in plasma (Wilcoxon test, ***P < 0.0001), which was not observed in control samples. NS, not significant; ND, mutation not detected in plasma. (F) Log2 enrichment ratios for mutant fragments from two different cohorts of patients. Size ranges enriched for ctDNA are assigned greater weight by the INVAR pipeline.

INVAR requires any mutation signal to be represented in both the forward (F) and reverse (R) reads of the read pair. This serves to both reduce sequencing error and to produce a small size-selection effect for short fragments because only short fragments would be read completely in both F and R with paired-end 150–base pair (bp) sequencing. This step retained 92.4% of mutant reads and 84.0% of wild-type reads in a training dataset (fig. S4C).

When targeting a large number of patient-specific loci, it becomes increasingly likely that technically noisy sites or single-nucleotide polymorphism (SNP) loci are included in the list. Newman et al. (8) have previously used position-specific polishing to address this issue. In this study, we blacklisted loci that showed either error-suppressed mutant signal in >10% of the patient-control samples or a mean background error rate of >1% mutant allele fraction. This approach excluded 0.5% of the patient-specific loci across all patients (fig. S4D). Requiring mutant signal in both reads and applying a locus noise filter reduced noise modestly when applied individually; however, when combined, they showed a collective benefit, reducing background error rates to below 10^−6 in some mutation classes (Fig. 3B). The individual effects of these filters on individual trinucleotide contexts are shown in fig. S4E.
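The two blacklist criteria can be written as a small predicate. This is a sketch with a hypothetical function name and input format (one error-suppressed mutant allele fraction per patient-control sample at a single locus); only the two thresholds come from the text.

```python
def is_blacklisted(control_afs, max_fraction_with_signal=0.10, max_mean_af=0.01):
    """Flag a locus as noisy based on patient-control samples.

    control_afs: error-suppressed mutant allele fractions at this locus,
    one value per patient-control sample (assumed input format).
    """
    if not control_afs:
        return False  # no control data, nothing to judge
    fraction_with_signal = sum(af > 0 for af in control_afs) / len(control_afs)
    mean_af = sum(control_afs) / len(control_afs)
    return fraction_with_signal > max_fraction_with_signal or mean_af > max_mean_af


print(is_blacklisted([0.0] * 17 + [0.001] * 3))  # signal in 15% of controls
print(is_blacklisted([0.0] * 19 + [0.5]))        # likely SNP: mean AF of 2.5%
print(is_blacklisted([0.0] * 19 + [0.001]))      # isolated low-level signal
```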

The distribution of observed allele fractions can be assessed when targeting a large number of patient-specific sites. In the residual disease setting, we expect to observe a high degree of sampling error. Therefore, signal should appear stochastically as individual mutant molecules distributed across patient-specific loci, with many of the loci having zero mutant reads. To optimize INVAR for detection of the lowest possible amounts of ctDNA, we identified outliers to this distribution (correcting for multiple testing) and excluded signal at individual loci that was not consistent with the remaining loci (“patient-specific outlier suppression”; Fig. 3C). This reduced mutant signal in control samples about threefold while retaining 96.1% of mutant signal in patient samples (summarized in Fig. 3D; raw data shown in fig. S5A). This filter had the greatest effect at low ctDNA fractions: In the dilution series dataset, 100% of mutations were retained at an integrated mutant allele fraction (IMAF) above 3 × 10^−4 and a median of 81% at IMAF in the parts-per-million (ppm) range (fig. S5B). IMAF values were determined by taking a background-subtracted, depth-weighted mean allele fraction across the patient-specific loci in each sample (Supplementary Materials and Methods). Outlier suppression was more likely to remove signal from mutations at highly amplified regions in the genome, such as the BRAF locus in melanoma (fig. S5, C and D). However, in the context of large panels, any individual mutation makes a minor contribution, and despite its amplification, BRAF mutations were not observed in the 10,000× dilution (ppm range) or below because of sampling noise.
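The exact IMAF formula is given in the Supplementary Materials and Methods and is not reproduced here. As one plausible reading of "background-subtracted, depth-weighted mean allele fraction," the sketch below pools reads by mutation context, subtracts an assumed per-context background rate, floors the result at zero, and averages with depth as the weight. The grouping, the flooring, and all names are assumptions.

```python
def imaf(loci, background):
    """Sketch of an integrated mutant allele fraction.

    loci: list of (context_key, mutant_reads, depth) for one sample.
    background: dict mapping context_key to an assumed background error rate.
    """
    pooled = {}
    for key, mutant_reads, depth in loci:
        m, d = pooled.get(key, (0, 0))
        pooled[key] = (m + mutant_reads, d + depth)

    total_depth = sum(d for _, d in pooled.values())
    if total_depth == 0:
        return 0.0

    weighted_sum = 0.0
    for key, (m, d) in pooled.items():
        if d == 0:
            continue
        # Background-subtracted allele fraction for this context, floored at 0.
        af = max(m / d - background.get(key, 0.0), 0.0)
        weighted_sum += af * d  # weight by depth
    return weighted_sum / total_depth


example = imaf(
    loci=[("ACG>T", 3, 100_000), ("ACG>T", 0, 100_000), ("TCA>G", 1, 200_000)],
    background={"ACG>T": 5e-6, "TCA>G": 1e-6},
)
print(f"{example:.2e}")
```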

Combining the above steps resulted in an average 131-fold decrease in background error relative to raw sequencing data (Fig. 3B) and reduced the error rates of some trinucleotide contexts to below 10^−6. This signal-to-noise window created the potential for detection of ctDNA to parts per million in some samples. Individual sample-level average error rates after background error filters are shown in fig. S6.

Patient-specific signal enrichment

INVAR generates a P value at each locus for the presence of ctDNA, which may be weighted with various factors before being combined. Here, we enrich for ctDNA signal by assigning greater weight to loci with higher tumor allele fractions and to sequencing reads that are most similar to the size distribution of ctDNA.

Tumor variants with a higher tumor variant allelic fraction (VAF) are more likely to be observed in the plasma (22); therefore, greater weight was allocated to mutant signals in plasma from loci with high tumor mutant allele fraction. Using a dilution series, we confirmed the relationship between the tumor allele fraction of a locus and the rate of ctDNA detection for that locus in plasma (fig. S7A). We confirmed in clinical samples that patient-specific mutation loci observed in plasma had a significantly higher tumor allele fraction compared to those not observed in plasma (stage II to III melanoma, P < 2 × 10^−16; stage IV melanoma, P < 2 × 10^−16, Wilcoxon test; Fig. 3E). Similarly, by sequencing a second tumor sample for each patient of the late-stage melanoma cohort, we showed that shared mutations were significantly more frequently detected in plasma than private mutations (P < 2 × 10^−16, Wilcoxon test; fig. S7B).

Analysis of DNA fragment sizes in plasma cfDNA libraries from patients with melanoma showed a nucleosomal pattern of cfDNA fragmentation, with mutant fragments shorter than wild-type fragments at the mononucleosome and dinucleosome peaks (fig. S7C). We also observed that patients with stage IV melanoma had a significantly higher median mutant fragment size compared to the patients with stage II to III melanoma (163 bp versus 154 bp; P = 2 × 10^−16, Wilcoxon test; fig. S7D) because of a relatively high amount of mutant dinucleosomal DNA in this cohort. This analysis was performed with downsampling to the same number of mutant reads in each dataset, indicating that, in advanced disease, longer dinucleosomal ctDNA fragments can exist and can show greater enrichment for mutant DNA than mononucleosomal DNA. Although polymerase chain reaction (PCR)–based studies have noted greater cfDNA fragment length in cancer patients (23–25), previous studies have focused on shorter ctDNA compared to nontumor cfDNA (26–30). To address these inconsistencies, INVAR weights mutant reads based on the empirical distribution of mutant fragments in all other samples in the cohort being studied, so any size range that may be enriched in cancer may be given greater weight. The potential advantage of assigning greater weight to specific loci rather than applying a hard size-selection cutoff is that when ctDNA fractions are low, a “hard” cutoff can cause loss of rare mutant alleles (31). To perform size weighting, we assessed the frequency of mutations for any given fragment size (Fig. 3F and fig. S7E) and then weighted each mutant read observed with the probability that it came from the cancer distribution as opposed to the wild-type size distribution (Supplementary Materials and Methods).
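The idea of soft size weighting can be illustrated as follows. This is a sketch only: the bin width, the pseudocount, and the weight formula (the share of probability mass belonging to the mutant distribution in each bin) are assumptions, and the actual weighting is defined in the Supplementary Materials and Methods.

```python
import math
from collections import Counter


def size_weights(mutant_sizes, wildtype_sizes, bin_width=10, pseudocount=1.0):
    """Per-bin log2 enrichment and weight for fragment sizes in base pairs.

    Sizes should come from other samples in the cohort, so that a sample
    does not inform its own weights.
    """
    mutant = Counter(size // bin_width for size in mutant_sizes)
    wildtype = Counter(size // bin_width for size in wildtype_sizes)
    bins = sorted(set(mutant) | set(wildtype))
    mutant_total = sum(mutant.values()) + pseudocount * len(bins)
    wildtype_total = sum(wildtype.values()) + pseudocount * len(bins)

    weights = {}
    for b in bins:
        p_mut = (mutant[b] + pseudocount) / mutant_total
        p_wt = (wildtype[b] + pseudocount) / wildtype_total
        weights[b * bin_width] = {
            "log2_enrichment": math.log2(p_mut / p_wt),
            "weight": p_mut / (p_mut + p_wt),
        }
    return weights


# Toy cohort: mutant fragments skew short, wild-type fragments skew long.
w = size_weights([145] * 30 + [165] * 10, [145] * 10 + [165] * 30)
print(w)
```

Unlike a hard cutoff, every mutant read keeps a nonzero weight, so rare mutant fragments outside the enriched size range are down-weighted rather than discarded.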

After signal weighting, INVAR aggregates signal across all patient-specific mutation loci (Supplementary Materials and Methods). To determine whether ctDNA is detected in a sample, data from samples of other patients, where these mutations were not present in the tumor, were used as negative controls to set the detection threshold (fig. S2A). The aggregated INVAR likelihood ratio was used to determine detection, rather than a preset minimum number of mutant molecules. A threshold for determining detection was defined so as to maximize sensitivity and specificity (Supplementary Materials and Methods), requiring a minimum specificity of 90%. In the setting we have used, two mutant reads supporting the same locus, one forward and one reverse, were required to pass the background noise filters. In some samples, two such reads that may come from the same molecule were sufficient to obtain positive ctDNA detection at high specificity.
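Threshold selection with a specificity floor can be sketched as below. The code assumes a sample is called detected when its likelihood ratio is at or above the threshold, and it uses the sum of sensitivity and specificity as the quantity to maximize; both choices, and the function name, are assumptions rather than the pipeline's documented procedure.

```python
def choose_threshold(case_scores, control_scores, min_specificity=0.90):
    """Pick the likelihood-ratio threshold that maximizes sensitivity plus
    specificity among thresholds with specificity >= min_specificity.

    Returns (threshold, sensitivity, specificity), or None if no candidate
    threshold reaches the specificity floor.
    """
    best = None
    for threshold in sorted(set(case_scores) | set(control_scores)):
        specificity = sum(s < threshold for s in control_scores) / len(control_scores)
        sensitivity = sum(s >= threshold for s in case_scores) / len(case_scores)
        if specificity < min_specificity:
            continue
        if best is None or sensitivity + specificity > best[1] + best[2]:
            best = (threshold, sensitivity, specificity)
    return best


# Toy likelihood ratios: 10 nonmatched controls and 4 matched patient samples.
controls = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.0]
cases = [0.95, 2.0, 3.0, 10.0]
result = choose_threshold(cases, controls)
print(result)
```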

Analytical sensitivity and specificity of INVAR

To benchmark the sensitivity of INVAR, we performed custom capture sequencing of a dilution series of plasma from one patient with melanoma (stage IV disease), for whom we identified 5073 mutations through WES. Plasma DNA from this patient was serially diluted into healthy control volunteers’ plasma DNA to an expected IMAF of 3.6 × 10^−7. Without use of unique molecular barcodes, INVAR detected ctDNA down to an expected allele fraction of 3.6 × 10^−5, which was quantified to an average IMAF of 4.7 × 10^−5 in both replicates (fig. S8A). After the use of molecular barcodes and custom error-suppression methods, the diluted ctDNA was detected to an expected IMAF of 3.6 × 10^−6 (3.6 ppm) in both replicates, with IMAF values of 4.3 and 5.2 ppm (Fig. 4, A and B, and fig. S8B). Tumor-genotyped mutations (5305 of 8660, 61.2%) were observed in plasma in the dilution series at any point. The correlation between IMAF and the expected VAF was 0.98 (Pearson’s r, P < 2.2 × 10^−16). At an expected allele fraction of 3.6 × 10^−7, ctDNA was detected in two of three replicates. To assess the impact of the number of targeted mutations on sensitivity, we downsampled sequencing data in silico to include subsets of patient-specific mutation lists. This confirmed that targeting more mutations resulted in more informative reads and correspondingly higher ctDNA detection rates (Fig. 4C).

Fig. 4 Sensitivity and specificity determination for INVAR.

(A) Spike-in dilution experiment to assess the sensitivity of INVAR. The median number of mutation loci with error-suppressed positive signal was 3637 of 8660 for the undiluted sample replicates and decreased for subsequent 10× dilutions to 2586 loci, 209 loci, 73.5 loci, 27.5 loci, and finally 16 loci for 100,000× diluted sample replicates. The heat map shows the 5305 loci that were observed in plasma (at any dilution); a magnified version is provided in fig. S8B. (B) After error suppression, ctDNA was detected in all replicates for all dilutions to an estimated concentration of 3.6 ppm. Using signal enhancement based on fragment sizes, ctDNA was detected in two of three replicates at an estimated ctDNA allele fraction of 3.6 × 10^−7 (Supplementary Materials and Methods). Using error-suppressed data of 11 replicates from the same healthy individuals without spiked-in DNA from the cancer patient, no mutant reads were observed in an aggregated 6.3 × 10^6 informative reads across the patient-specific mutation list. (C) Sensitivity in the spike-in dilution series was assessed after the number of loci analyzed was downsampled in silico to between 1 and 5000 mutations (Supplementary Materials and Methods). (D) ROC curve for the stage IV melanoma cohort. This was generated on the basis of 92 replicates from 52 time points from 9 patients (black, with 212 negative controls obtained from nonmatched loci) and when comparing the patient samples to sequencing data from 7 healthy individuals (red) that generated 45 ctDNA-negative control datasets (samples tested across the mutation list for each of the patients; see fig. S2).

The false-positive rate of INVAR was measured twice, once in patient-control samples and separately in healthy control samples. First, detection accuracy was evaluated through analysis of samples from other patients (patient-control samples) at nonmatched mutation loci (Fig. 4D, black line). This resulted in a specificity of 95% (table S4 in data file S1). To confirm the specificity in independent control samples, we ran custom capture sequencing (with the same oligo pools) on samples from healthy individuals and analyzed those by INVAR using each of the patient-specific mutation lists. This resulted in a specificity of 97% (Fig. 4D, red line).

Distribution of ctDNA mutant allele fractions in plasma samples from patients with melanoma

We applied INVAR to custom capture panel sequencing data from 47 patients with stage II to IV melanoma. For 9 patients with stage IV disease, 52 samples were taken at baseline and during treatment with chemotherapy; for 38 patients at high risk of recurrence after complete resection of stage II to III cutaneous melanoma (32), samples were taken between 3 and 6 months after surgery with curative intent (Materials and Methods). This approach generated up to 2.9 × 10^6 informative reads per sample (median 1.7 × 10^5 informative reads), thus analyzing orders of magnitude more cfDNA fragments compared to methods that analyze individual or few loci (Fig. 5A). In this study, we demonstrated a dynamic range of five orders of magnitude and detection of trace amounts of ctDNA in plasma samples (Fig. 5, B and C); this detection was obtained from a median input material of 1638 copies of the genome (5.46 ng of DNA; table S2 in data file S1). In a total of 13 of the 130 plasma sample replicates analyzed with custom capture sequencing, ctDNA was detected with signal in fewer than 1% of the patient-specific loci (Fig. 5D). The lowest fraction of a cancer genome detected was 1/683, equivalent to 5 fg of tumor DNA, with an IMAF of 2.52 × 10^−6 and an INVAR likelihood ratio at the 99th percentile of bootstrapped values from healthy control samples (fig. S9A). This was detected on the basis of an individual mutant molecule sequenced in both forward and reverse directions in a total of 7.7 × 10^5 informative reads. Given the limited input amounts, in 48% of the cases (indicated with filled circles in Fig. 5C), the low ctDNA amounts that we detected would be below the 95% limit of detection (LOD) for a “perfect” single-locus assay. The input mass versus IMAF of each sample is shown in fig. S9B, highlighting the sensitivity benefit of a sequencing approach using integration of variants across the genome. Thus, targeting multiple mutations can allow detection of low absolute amounts of tumor-derived DNA.
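Two of the figures in this paragraph can be checked with short calculations. The sketch below models a "perfect" single-locus assay with Poisson sampling, so that at least one mutant molecule is sampled with 95% probability; the Poisson model, the assumed mass of about 3.3 pg per haploid genome, and the function names are illustrative assumptions.

```python
import math

PG_PER_HAPLOID_GENOME = 3.3  # assumed approximate mass of one haploid genome


def single_locus_lod95(haploid_genomes: float) -> float:
    """Lowest mutant fraction at which a single locus yields at least one
    mutant molecule with 95% probability, under Poisson sampling:
    1 - exp(-g * f) >= 0.95, so f >= -ln(0.05) / g."""
    return -math.log(0.05) / haploid_genomes


def tumor_dna_fg(fraction_of_genome: float) -> float:
    """Mass in femtograms of a given fraction of one haploid genome."""
    return fraction_of_genome * PG_PER_HAPLOID_GENOME * 1000.0


lod = single_locus_lod95(1638)   # median input in this study
mass = tumor_dna_fg(1 / 683)     # lowest fraction of a cancer genome detected
print(f"single-locus 95% LOD: {lod:.2e}; tumor DNA: {mass:.1f} fg")
```

With the median input of 1638 genome copies, a single-locus assay reaches only about 2 mutant molecules per 1000 under this model, far above the ppm-range fractions reported here, and 1/683 of a genome comes to roughly 5 fg.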

Fig. 5 ctDNA quantification by INVAR in early and advanced melanoma.

(A) Number of haploid genomes analyzed (average depth of unique reads) and number of mutations targeted in plasma samples from two cohorts of patients with melanoma. Each replicate is shown separately. Dashed diagonal lines indicate the number of informative reads that were generated. The shaded red and blue boxes correspond to those shown in Fig. 2B. (B) Two-dimensional representation of ctDNA fractions detected (IMAF), plotted against the number of informative reads for each sample replicate. The dashed line indicates the theoretical limit of detection (LOD) set by the reciprocal of the number of informative reads. In some samples, >10^6 informative reads were obtained, and ctDNA was detected down to fractional concentrations of few ppm (orange shaded region). For panels C and D, fig. S14B and fig. S14C, we used a threshold of 20,000 informative reads (left-most dotted line) such that samples with undetected ctDNA that had fewer than 20,000 informative reads were deemed unclassified because of limited sensitivity and were excluded from the analysis (yellow shaded region below 20,000 informative reads). The number of patient-specific mutations targeted in each sample is indicated by the size of the circle. (C) ctDNA fractions (mean IMAFs across replicates) detected in cell-free DNA from plasma samples of patients with melanoma in this study, shown in ascending order for each of the two cohorts. Filled circles indicate samples where the number of haploid genomes analyzed would fall below the 95% LOD for a perfect single-locus assay given the estimated number of cancer genome copies [see (D)]. In patients with early-stage melanoma, 12 of 24 patients with undetected ctDNA after surgery relapsed during 5-year follow-up, compared to 8 of 11 patients with ctDNA detected after surgery (fig. S14B). (D) Number of copies of the cancer genome detected (averaged across replicates) for each of the samples in the same order as above in (C), calculated as the number of mutant fragments divided by the number of loci queried (table S2 in data file S1).

Distributions of ctDNA mutant allele fractions across cancer types and stages

To study the distributions of ctDNA amounts more broadly, we applied INVAR to samples from different clinical studies covering a range of tumor types, including non–small cell lung cancer (NSCLC; n = 19 patients), renal cancer (n = 24 patients) (33), glioblastoma (n = 8 patients) (34), and breast cancer (n = 7 patients). The median detected IMAF per cohort varied from 5.2 ppm in early-stage breast cancer to 15,000 ppm (0.015) in advanced melanoma (Fig. 6A). ctDNA was detected in ≥50% of patients with glioma (34), stage I to III NSCLC, and stage I to II breast cancer.

Fig. 6 ctDNA detection and fractional concentrations in different clinical studies across cancer types with early and advanced disease.

(A) ctDNA fractions (IMAFs) are shown for samples from different clinical studies (33, 34) covering a variety of cancer types and varying disease stages and treatment status. The specificity threshold was determined for each cohort using ROC analysis (B) with a minimum specificity of 90%. Box plots, and data in black dots, show samples detected at the indicated specificity. Five samples with ctDNA signal below this threshold, but with likelihood ratios equivalent to specificity >85%, are shown in gray. (B) ROC curves and area under the curve (AUC) values for the likelihood ratios obtained for each of the cohorts. The numbers of patients used for each ROC analysis are shown in table S4 in data file S1.

For each sample where ctDNA was not detected, we estimated the 95% upper bound for ctDNA fraction based on the total number of informative reads tested in that sample. Forty-three percent of detected samples (49 of 115) and 76% of nondetected samples (44 of 58) had ctDNA fractions or 95% upper bounds below one mutant per 10,000 molecules, or IMAF < 1 × 10^−4 (fig. S10), further highlighting the requirement of sequencing a large number of informative reads for sensitive quantification of ctDNA.
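An upper bound of this kind can be approximated with a standard binomial argument. The sketch below assumes that zero mutant reads were observed among N independent informative reads; the method used in the study may differ, and the function name is hypothetical.

```python
import math


def upper_bound_fraction(n_informative_reads: int, confidence: float = 0.95) -> float:
    """Largest mutant fraction f still compatible, at the given confidence,
    with observing zero mutant reads: (1 - f) ** N = 1 - confidence."""
    alpha = 1.0 - confidence
    # expm1 keeps precision when the bound is very small.
    return -math.expm1(math.log(alpha) / n_informative_reads)


for n in (20_000, 30_000, 1_000_000):
    print(n, f"{upper_bound_fraction(n):.2e}")
```

Under this assumption the bound is close to 3/N, so roughly 30,000 informative reads are needed before a nondetected sample can be bounded below 1 × 10^−4.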

For each cohort, the likelihood ratio threshold for detection was determined by receiver operating characteristic (ROC) analysis (Supplementary Materials and Methods) with a minimum specificity of 90% (Fig. 6B). A median specificity of 95.0% was obtained (table S4 in data file S1). Specificity varied between cohorts, likely because of differences in noise profile between cancer types during the panel design phase. Some samples showed signal but a likelihood ratio below the threshold for 90% specificity. These samples (shown in gray in Fig. 6A) had likelihood ratios that would be detected at a specificity of 85%, suggesting that detection can be improved with further optimization. We obtained median values for area under the curve (AUC; Fig. 6B) of 0.98 in advanced cancers (stage IV melanoma and breast cancers; AUC range, 0.96 to 1) and 0.80 in early-stage disease and other settings where ctDNA detection has previously been challenging (including stage I to III NSCLC, stage I to II breast cancers, renal and brain tumors, and stage II to III melanoma after surgery; AUC range, 0.64 to 0.92) (5, 33, 34).

Personalized monitoring of ctDNA in melanoma

INVAR analysis was used to monitor ctDNA dynamics in response to treatment in a cohort of patients, most of whom received anti-BRAF targeted therapy as first-line treatment (Fig. 7A). ctDNA IMAF values showed a correlation of 0.8 with tumor size assessed by computed tomography (CT) imaging (Pearson’s r, P = 6.7 × 10⁻¹⁰; fig. S11A and table S5 in data file S1), comparable to other studies (7, 13). ctDNA IMAF had a correlation of 0.53 (Pearson’s r, P = 2.8 × 10⁻⁴; fig. S11B) with serum lactate dehydrogenase (LDH), a routinely used clinical marker for monitoring of melanoma. A large proportion of time points in our dataset had LDH concentrations within the normal range (0 to 250 IU/liter), potentially masking dynamic changes in disease state, whereas ctDNA concentrations had a wider dynamic range, which may explain the relatively low correlation observed (Fig. 7A). Similar observations have been made in comparison to clinical markers of other cancer types (35, 36).

Fig. 7 Longitudinal ctDNA monitoring.

(A) ctDNA and lactate dehydrogenase (LDH) are shown over time for each of the patients with stage IV melanoma. Treatments are indicated by shaded boxes. The upper limit of normal of LDH (250 IU/liter) and the LOD of ctDNA are each indicated with a dashed horizontal line. IMAF, integrated mutant allele fraction. (B) ctDNA concentrations over time, grouped by mutation cluster. Mutations were clustered on the basis of longitudinal mutation dynamics (Supplementary Materials and Methods). Treatments are indicated by shaded boxes. The transparency of each line indicates the median tumor allele fraction of the respective mutation clusters in plasma, indicating mutation clusters that are more likely to be clonal in origin.

In one patient (#59) treated with a series of targeted therapies and immunotherapy, ctDNA was detected to an IMAF of 2.5 ppm, corresponding to a time point where three of five tumor lesions had become nondetected by CT, and the remaining two had volumes of 0.59 and 0.43 cm³ (fig. S12A and table S5 in data file S1), indicating that ctDNA may be detected at the threshold of CT detection (37). After progression on vemurafenib, patient #59 progressed on multiple other targeted therapies (pazopanib, dabrafenib, and trametinib) and immunotherapy (ipilimumab), corresponding to a constant rise in ctDNA over 2 years of monitoring (Fig. 7A and fig. S12A). By clustering mutation trajectories over time (Supplementary Materials and Methods), we identified clusters of mutations that emerged after progression on vemurafenib and pazopanib (Fig. 7B). The mutations in the cluster that emerged latest in plasma had the lowest median allele fraction in the tumor specimen collected from this patient before treatment, at 26%, in contrast to 33 and 37% for the clusters that were observed in plasma earlier. IMAFs and tumor volumes for the other patients with stage IV melanoma are shown in fig. S12 (B to E).

In some patients, we identified differential responses of mutation clusters to targeted therapy (Fig. 7B, all mutations shown in fig. S13), suggestive of a heterogeneous response to targeted therapy in different tumor subclones. In one case (patient #60), multiple mutation clusters were not detected at the start of treatment but emerged after 4 months’ treatment with vemurafenib, whereas a separate cluster that was present at the beginning declined in IMAF over a year on treatment. In two cases (patients #60 and #64), treatment-responsive mutation clusters had a lower average tumor allele fraction compared to the nonresponsive mutation clusters. These data highlight the increased granularity of insight into clonal evolution that may be obtained by sequencing a larger number of mutations over time (20). However, by virtue of being patient specific, this approach is not designed to identify de novo mutations in plasma that were not present in the original tumor, although mutation calling might also, in principle, be applied to the data generated.

To test INVAR in the residual disease setting, we applied it to samples from 38 patients at high risk of recurrence after complete resection of stage II to III melanoma, recruited in the UK AVAST-M trial (32); samples were collected 3 to 6 months after surgery with curative intent. The clinical details of this cohort are given in fig. S14A. We interrogated a median of 3.6 × 10⁵ informative reads (IQR, 0.64 × 10⁵ to 4.03 × 10⁵) and detected ctDNA to a minimum IMAF of 2.85 ppm, as indicated in Fig. 5C. The specificity of ctDNA detection in this analysis was >0.98 (table S4 in data file S1). Here, to assess performance at high sensitivity, samples that were not detected with fewer than 20,000 informative reads were termed “unclassified” because insufficient information was available to classify them as ctDNA negative. Similar to the relative haplotype dosage method by Lo et al. (38), additional information is warranted to classify these samples; thus, three samples were excluded from the analysis on this basis.
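The three-way call described above can be written as a small decision rule. This is a sketch under stated assumptions: the 20,000-read cutoff comes from the text, whereas the names `Call` and `classify`, and the convention that a likelihood ratio at or above the cohort threshold counts as detected, are illustrative choices rather than the published implementation.

```python
from enum import Enum


class Call(Enum):
    DETECTED = "detected"
    NOT_DETECTED = "not detected"
    UNCLASSIFIED = "unclassified"


MIN_INFORMATIVE_READS = 20_000  # cutoff stated in the text


def classify(likelihood_ratio: float, lr_threshold: float, informative_reads: int) -> Call:
    """Assign a ctDNA call to one plasma sample.

    Assumption: detection means likelihood_ratio >= lr_threshold.
    A nondetected sample with too few informative reads is left unclassified,
    because a negative call would not be supported by enough data.
    """
    if likelihood_ratio >= lr_threshold:
        return Call.DETECTED
    if informative_reads < MIN_INFORMATIVE_READS:
        return Call.UNCLASSIFIED
    return Call.NOT_DETECTED


# Hypothetical samples: (likelihood ratio, cohort threshold, informative reads)
for sample in [(12.0, 5.0, 8_000), (1.0, 5.0, 8_000), (1.0, 5.0, 360_000)]:
    print(sample, classify(*sample).value)
```

Note that a detected sample keeps its call regardless of read count; the read-count requirement only guards against treating an underpowered sample as negative.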

Of the evaluable patients, ctDNA was detected in 40% (8 of 20) of those who later recurred. ctDNA detection was not associated with a significant difference in disease-free interval (6.3 months versus median not reached with 5 years’ follow-up; hazard ratio, 2.08; 95% confidence interval, 0.85 to 5.13; P = 0.08; fig. S14B) or overall survival (2.6 years versus median not reached, P = 0.11; fig. S14C). There were no significant associations between ctDNA detection and Breslow thickness of the primary tumor, primary tumor ulceration, or disease stage (P > 0.05, Fisher’s exact test; fig. S14D). The median LDH value showed no significant difference (403 IU/liter versus 327 IU/liter; P > 0.05, Wilcoxon test). A previous analysis using digital PCR, performed at 12 weeks after surgery in 161 patients with resected BRAF- or NRAS-mutant melanoma from the same clinical trial, detected ctDNA in 16.8% of patients who later relapsed (5). Given the small size and limited power of this analysis, validation studies are required to fully benchmark this approach in larger cohorts and in other cancer types after surgery.

Personalized ctDNA monitoring using WES and sWGS

Patient-specific capture panels allow highly sensitive detection of ctDNA, but they require previous design of customized sequencing panels. Therefore, we assessed whether INVAR could be applied to standardized workflows such as WES or WGS. This would generally result in a smaller number of informative reads covering mutated loci because of wider coverage and lower depth of sequencing, in exchange for omitting the panel design step and requiring only the patient-specific mutation list from tumor tissue sequencing (for example, WES). The tumor tissue sequencing may thus be performed in parallel with plasma sequencing to reduce the turnaround time (Fig. 8A).

Fig. 8 Detection of ctDNA from WES/WGS data using INVAR.

(A) Schematic overview of a generalized INVAR approach. Tumor (and buffy coat) and plasma samples can be sequenced in parallel using whole-exome or genome sequencing, and INVAR can be applied to the plasma WES/WGS data using mutation lists inferred from the tumor (and buffy coat) sequencing. (B) INVAR was applied to WES data from 21 plasma samples with an average sequencing depth of 238× (before read collapsing) and to WGS data from 33 plasma samples with an average sequencing depth of 0.6× (before read collapsing). ctDNA fractions (IMAF values) are plotted versus the number of unique informative reads for every sample. The dotted vertical line indicates the 20,000 informative reads (IR) threshold, and the dashed diagonal line indicates 1/IR. (C) IMAFs observed for the 21 samples analyzed with WES ordered from low to high. ND, not detected. (D) ctDNA fractions obtained from plasma WES (yellow) and sWGS (black) were compared to the ctDNA fraction obtained from custom capture sequencing of the same samples, showing correlations of 0.97 and 0.93, respectively (Pearson’s r, P = 1.5 × 10⁻¹³ and P = 9 × 10⁻¹⁰). (E) Number of haploid genomes analyzed (indicating depth of unique coverage after read collapsing) and number of known tumor mutations covered by plasma WES and sWGS. sWGS data could cover many more mutation loci, but in this case, mutations were determined from WES analysis of tumors; therefore, the lists of mutations are of similar size. (F) Longitudinal monitoring of ctDNA fractions in plasma of six patients with stage IV melanoma using sWGS data with an average depth of 0.6×, analyzed using INVAR with patient-specific mutation lists (for patients with >500 mutations identified by WES tumor profiling). Filled circles indicate detection at a specificity of >0.99 by ROC analysis of the INVAR likelihood (fig. S16). For samples with no ctDNA detection, the 95% confidence intervals of the maximum IMAF are shown, based on the number of informative reads for each sample (empty circles and bars). (G) Predicted sensitivities for sWGS plasma analysis of patients with different cancer types, using an average of 0.1× or 10× coverage (equivalent to 0.1 and 10 haploid genomes analyzed) and the known mutation rates per Mbp of the genome for different cancer types (41). The LOD for ctDNA based on copy number alterations is shown at 3% (39), and an approximate LOD of INVAR without error suppression [or with family size 1 (fs1)] is indicated based on error rates in fig. S15C.

To test the generalizability of INVAR, we selected samples with IMAF values quantified as being between 4.5 × 10⁻⁵ and 0.16 using custom capture sequencing and used commercially available exome capture kits to sequence plasma DNA to a median raw depth of 238×. Despite the modest depth of sequencing, we obtained between 1565 and 473,300 informative reads using WES (Fig. 8B). Background error rates per sample are shown in fig. S15A. We detected ctDNA in all tested samples (n = 21) down to IMAFs as low as 4.34 × 10⁻⁵ (Fig. 8C) with a specificity of >95% (fig. S15B), demonstrating that ctDNA can be sensitively detected by INVAR from WES data using patient-specific mutation lists. These IMAF values showed a correlation of 0.97 with custom capture data from the same samples (Pearson’s r, P = 1.5 × 10⁻¹³; Fig. 8D). Therefore, INVAR is highly sensitive not only when applied to custom capture panels that redundantly sequence up to 10² to 10³ haploid genomes but also when applied to WES data with a de-duplicated coverage between 10× and 100× (Fig. 8E).

We hypothesized that ctDNA could be detected and quantified with INVAR from even smaller amounts of input data. Therefore, we performed WGS on libraries from cfDNA of longitudinal plasma samples from a subset of six patients with stage IV melanoma to a mean depth of 0.6× (indicated in black in Fig. 8B). For each of those patients, we identified >500 patient-specific mutations using WES from each patient’s tumor and buffy coat DNA. We generated between 226 and 7696 informative reads per sample (median, 861; IQR, 471 to 1559; Fig. 8B) after read collapsing with a “minimum family size” requirement of 1 (duplicate removal). Despite not leveraging unique molecular barcodes, the median error rate per sample was 5 × 10⁻⁵ (fig. S15C). Using INVAR on sWGS data, IMAF values as low as 1.1 × 10⁻³ were quantified (Fig. 8F) with specificity of >97% (fig. S15D). Compared to custom capture data from the same samples, we observed a correlation of 0.93 (Pearson’s r, P = 9 × 10⁻¹⁰; Fig. 8D). In samples where ctDNA was not detected, it was possible to estimate the maximum likely IMAF of that sample from the known number of informative reads for each sample (Fig. 8F). Using sequencing data with less than 1× coverage, INVAR could boost the sensitivity using patient-specific mutation lists by up to an order of magnitude compared to copy number analyses (28, 39).
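The 1/IR line drawn in Fig. 8B gives a quick sense of what these read counts allow. The sketch below computes that floor for the informative-read counts reported above, together with a naive estimate of how many informative reads a given mutation list and unique depth would yield. Both helper names are hypothetical, and the uniform-coverage assumption in `expected_informative_reads` is a simplification that ignores read collapsing losses and uneven coverage.

```python
def detection_floor(informative_reads: int) -> float:
    """Allele fraction corresponding to a single mutant read: 1 / IR."""
    if informative_reads <= 0:
        raise ValueError("informative_reads must be positive")
    return 1.0 / informative_reads


def expected_informative_reads(n_mutations: int, unique_depth: float) -> float:
    """Naive estimate: every patient-specific locus covered at the same unique depth.

    Assumption for illustration only; real coverage is uneven and drops
    after duplicate removal.
    """
    return n_mutations * unique_depth


# Minimum, median, and maximum informative reads reported for the sWGS samples.
for ir in (226, 861, 7696):
    print(f"IR = {ir:>5} -> 1/IR = {detection_floor(ir):.2e}")

# Hypothetical mutation-list size at the 0.6x depth used in the text.
print(expected_informative_reads(1_000, 0.6))
```

With the median of 861 informative reads, 1/IR is about 1.2 × 10⁻³, which is of the same order as the lowest IMAF quantified from sWGS data.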

Future sensitivity estimates of personalized sequencing

Last, we sought to estimate the sensitivity of INVAR with each data type used in this study. sWGS requires minimal sequencing in plasma and thus may be performed with minimal DNA input, for example, from droplets of blood (40). In contrast, a single mutation assay or fixed panel of limited size could not achieve this degree of sensitivity from limited material. As sequencing costs decline and as tumor WGS becomes increasingly performed clinically (16), future increases in sequencing depth in WGS in plasma could boost sensitivity further without requiring the design of patient-specific capture panels (Fig. 8G), particularly in tumor types with high mutation rates [as identified in (41)]. However, sWGS data would lack redundancy of sequencing, which would limit the extent of error suppression.

For custom hybrid-capture sequencing, the distribution of informative reads for all the samples in this study is shown in fig. S16, indicating those where limited sensitivity was reached (<20,000 informative reads) and, conversely, those with >10⁶ reads and sensitivity to the ppm range. Samples with limited sensitivity could be reanalyzed with larger amounts of DNA input/more sequencing or by using tumor WGS to expand the patient-specific mutation lists. Our considerations suggest that the sensitivity of ctDNA quantification using sequence alterations alone would likely be limited to ~0.1 ppm because of limitations placed by tumor mutation burden and plasma cfDNA amounts (Fig. 2B), even before considering background error rates.

DISCUSSION

In this study, we developed a method for sensitive patient-specific monitoring of ctDNA that leverages the properties of patient-specific sequencing data. In this initial application of INVAR, sensitivity was routinely achieved to one mutant molecule per 100,000, leveraging a median number of 1.7 × 10⁵ informative reads generated per sample. Under optimal conditions, when the number of tumor-guided mutations and input material are sufficiently high, it is possible to achieve detection to individual ppm, representing an order of magnitude greater sensitivity as compared to alternative methods (8, 9, 18). In advanced disease, INVAR enabled detection of ctDNA in 100% of patients with stage IV breast cancer or melanoma and 75% of patients with glioblastoma. In earlier-stage disease, ctDNA was detected in plasma in 63% of patients with lung cancer before treatment, in 29% of patients with early-stage melanoma after surgical resection (40% of those who later relapsed), and in longitudinal samples from three of four patients with early-stage breast cancer.

We applied INVAR to exome sequencing and WGS data, demonstrating that personalized, tumor-guided analysis can be beneficial when applied to nonpersonalized sequencing data. Although these latter methods generated fewer informative reads, INVAR detected ctDNA to 5 parts per 100,000 using WES and to 0.1% mutant allele fraction using low depth (0.6×) WGS, over an order of magnitude more sensitive compared to previous methods based on copy number analysis of sWGS (39, 42). For a given sequencing output, data generated by WES or WGS have fewer informative reads, and the sensitivity is therefore less than that obtained by personalized sequencing panels. However, the turnaround time would be faster because of the omission of a panel design stage, and the cost saving of not designing a sequencing panel may, in the future, offset the cost of additional sequencing.

When applied to samples from patients with early-stage melanoma, our results with INVAR recapitulated those from Lee et al. (5), which analyzed samples from this same cohort of patients with melanoma at a high risk of relapse: Both studies found that of patients with residual ctDNA in this setting, at 5 years, the disease-free proportion was 20 to 25%, and the proportion surviving was ~40%. This reproducible finding, using different methods with high specificity, suggests that these findings are less likely to be a technical error; rather, this may indicate possible residual cells or signal present after surgery (43), which is not linked to disease relapse in the tested time frame. Similar observations have recently been reported for patients at high risk of relapse in other cancers: In patients with stage III colorectal cancer, nearly half of patients with residual ctDNA after surgery did not relapse within 3 years (44, 45). This is in contrast to results for patients with earlier-stage colorectal cancer, where analysis using the same method showed near 100% rate of relapse within 1 to 2 years when residual signal was detected in ctDNA (4, 14, 46). This observation, now repeated in several cohorts and with different methods, merits further investigation.

This study has notable limitations. First, INVAR and similar personalized approaches operate by targeted sequencing and/or evaluation of signals across a patient-specific list of mutations that is externally provided, and thus are not suitable for early detection or diagnosis of new cancers. In this case, tumor samples were used to identify mutations, but this can, in principle, be achieved by analysis of plasma at time points where there is high ctDNA content (47). Next, INVAR has so far been applied on a limited number of cases, which may have contributed to limited power for detection of residual disease in early-stage melanoma. The performance of this approach was assessed using ROC analysis across individual cohorts, although when applied at scale, it would be desirable to set a fixed specificity threshold. Evaluation of this method in larger datasets would enable optimization of both laboratory and computational methods. Furthermore, the IMAF is calculated as an average across all informative reads that span the list of mutations, and so intratumor heterogeneity could bias the list of mutations, resulting in an artificially low IMAF. For example, in our dilution series, 3355 of the 8660 mutations analyzed were not detected in plasma. If these loci were removed from analysis, the effective number of informative reads would be reduced by 39% and the estimated IMAF would correspondingly increase. For longitudinal quantification within a patient, this would not alter the relative dynamics; however, it suggests a possible uncertainty range in estimating absolute ctDNA fractions using this method.
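The size of this effect can be worked through with the numbers given above. The sketch assumes, purely for illustration, that informative reads are spread evenly across loci and that removing undetected loci leaves the mutant read count unchanged; INVAR's signal weighting is not modeled, and `rescaled_imaf` is a hypothetical helper.

```python
def rescaled_imaf(imaf: float, total_loci: int, undetected_loci: int) -> float:
    """IMAF after dropping loci with no plasma signal.

    Assumptions (illustrative): informative reads are evenly distributed over
    loci, and all mutant reads come from the retained loci, so only the
    denominator shrinks.
    """
    if not 0 <= undetected_loci < total_loci:
        raise ValueError("undetected_loci must be in [0, total_loci)")
    retained_fraction = (total_loci - undetected_loci) / total_loci
    return imaf / retained_fraction


TOTAL_LOCI = 8660       # mutations analyzed in the dilution series (from the text)
UNDETECTED_LOCI = 3355  # mutations not detected in plasma (from the text)

removed_fraction = UNDETECTED_LOCI / TOTAL_LOCI
fold_change = rescaled_imaf(1.0, TOTAL_LOCI, UNDETECTED_LOCI)
print(f"informative reads removed: {removed_fraction:.1%}")
print(f"IMAF fold change: {fold_change:.2f}")
```

Under these assumptions, dropping 39% of informative reads raises the estimated IMAF by roughly 1.6-fold, which gives a sense of the uncertainty in absolute ctDNA fractions while leaving within-patient trends unchanged.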

Personalized sequencing of a large number of mutations may be carried out within clinically relevant time frames: Recent personalized amplicon sequencing approaches may generate tumor exome sequencing within 2 weeks, with a further 2 weeks for panel design and sequencing (48). Furthermore, recent advances in oligo synthesis enable rapid manufacture of custom bait sets in comparable time frames to custom primer pairs used in amplicon sequencing (49), so custom capture panels could match turnaround times of custom amplicon-based approaches. Tumor sequencing and bait design incur a one-time cost for every patient. The cost of custom hybrid-capture sequencing could occupy a price point between amplicon sequencing of smaller panels and sequencing of large panels such as exomes. Overall costs may be mitigated by trends for increasing utilization of tumor sequencing in oncology, which could remove those extra costs, and strategies such as used here where baits are pooled and generated once for a set of patients so that samples from one patient can generate control data for other patients.

In summary, patient-specific mutation lists provide an opportunity for highly sensitive monitoring from a range of sequencing data types using methods for signal aggregation, weighting, and error suppression. As tumor tissue sequencing becomes increasingly routine in personalized oncology, patient-specific mutation lists may be increasingly leveraged for sensitive, individualized monitoring from a variety of sequencing data types.

MATERIALS AND METHODS

Study design

One hundred seventy-six plasma samples from 105 patients with multiple cancer types were collected along with plasma from 45 healthy controls. For each patient, at least one tumor biopsy and matched germline sample was required for tumor genotyping. For patients with cutaneous melanoma, samples were collected from patients with American Joint Committee on Cancer (AJCC) stage II to IV disease enrolled on the MelResist (REC 11/NE/0312, also abbreviated to MELR) and AVAST-M studies (REC 07/Q1606/15, ISRCTN81261306) (table S6 in data file S1) (32, 50). MelResist is a translational study of response and resistance mechanisms to systemic therapies of melanoma, including BRAF targeted therapy and immunotherapy, in patients with stage IV melanoma. AVAST-M is a randomized controlled trial that assessed the efficacy of bevacizumab in patients with stage IIB to III melanoma at high risk of recurrence after complete resection; only patients from the observation arm were selected for this analysis, and samples were collected 3 to 6 months after surgery. The Cambridge Cancer Trials Unit - Cancer Theme coordinated both studies, and demographics and clinical outcomes were collected prospectively. The BLING study (biopsies of liquids in new gliomas, REC 15/EE/0094) analyzes patients with brain tumors, recruited at Addenbrooke’s Hospital, Cambridge, UK (34). Patients with a range of renal malignancies were recruited to the discovery and analysis of novel biomarkers in urological diseases study (DIAMOND, REC 03/018) (33). The Personalised Breast Cancer Programme (REC 16/EE/0100) recruited patients with early- and advanced-stage breast cancer. Patients with AJCC stage I to IIIB NSCLC were recruited to the LUng cancer - CIrculating Tumor DNA study (LUCID, REC 14/WM/1072). Consent to enter each study was obtained by a research/specialist nurse or clinician who was fully trained regarding the research. A sample size calculation was not performed for this proof-of-principle study. Analysis of the AVAST-M and LUCID cohorts was performed blinded to patient outcome and baseline characteristics. Laboratory and computational methods are described in detail in Supplementary Materials and Methods.

SUPPLEMENTARY MATERIALS

stm.sciencemag.org/cgi/content/full/12/548/eaaz8084/DC1

Materials and Methods

Fig. S1. Tumor mutation list characterization for INVAR.

Fig. S2. Use of patients and healthy individuals as controls in pooled sequencing panels.

Fig. S3. INVAR flowchart.

Fig. S4. Characterization of background error rates with bespoke error-suppression methods.

Fig. S5. Patient-specific outlier suppression.

Fig. S6. Background error rate per sample in personalized capture data.

Fig. S7. Signal weighting based on tumor allele fraction and fragment size, with consideration of intratumor heterogeneity.

Fig. S8. ctDNA dilution series with and without read collapsing.

Fig. S9. Comparison of input mass and IMAF observed and ROC curves for the stage IV melanoma cohort.

Fig. S10. Distribution of ctDNA fractions in plasma samples.

Fig. S11. Clinical correlates in advanced melanoma.

Fig. S12. Longitudinal tumor imaging and ctDNA data in advanced melanoma.

Fig. S13. Longitudinal individual plasma mutation data in advanced melanoma.

Fig. S14. Baseline characteristics and clinical correlates of ctDNA detection in early-stage melanoma.

Fig. S15. Background error rates and ROC curves in exome and sWGS data.

Fig. S16. Informative reads generated and hypothetical sensitivity with increased numbers of informative reads.

Data file S1 contains the following tables:

Table S1. Patient-specific mutation lists for melanoma cohorts.

Table S2. Sample library preparation input, QC, and INVAR likelihood ratios—patient plasma samples.

Table S3. Sample library preparation input, QC, and INVAR likelihood ratios—control plasma samples from healthy individuals.

Table S4. Likelihood ratio thresholds and specificity values for each cohort.

Table S5. Tumor volumes for the stage IV melanoma cohort.

Table S6. Patient demographics for the melanoma cohorts.

References (51–60)

REFERENCES AND NOTES

Acknowledgments: We would like to thank the patients and their families. We also thank C. Thorbinson, A. Azevedo, N. Maroo, and G. R. Bignell from the MelResist and AVAST-M study groups and the Cambridge Cancer Trials Unit - Cancer Theme, Addenbrooke’s Hospital. We are grateful to V. Gnanapragasam, chief investigator of the DIAMOND study (REC 03/018), for access to clinical samples from patients with renal cancer. Funding: We would like to acknowledge the support of The University of Cambridge, Cancer Research UK (grant numbers A20240 and A29580 to N.R., C7535/A6408 and C2195/A8466 to P.G.C., and C2195/A8466 to M.R.M.), the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement n.337905 (to N.R.), and the Mark Foundation Institute for Integrated Cancer Medicine at the University of Cambridge (to G.D.S., J.E.A., and C.C.). R.C.R. and D.M.R. are part-funded by the Cambridge Biomedical Research Centre and the Cambridge Cancer Centre. The MelResist study was funded by the Lewis Charitable Fund and Addenbrooke’s Charitable Trust. Infrastructure for the DIAMOND study was provided by the Cancer Research UK Cambridge Cancer Centre (Major Centre Award A25117) and NIHR Biomedical Research Centre. The Human Research Tissue Bank is supported by the NIHR Cambridge Biomedical Research Centre. Author contributions: J.C.M.W., K.H., and N.R. wrote the manuscript. J.C.M.W., K.H., S.M., P.Y.C., C.A., F. Mouliere, R.M., A.S., W.N.C., and C.G.S. generated data. J.C.M.W., K.H., E.F., J.M., F. Mouliere, D.C., and N.R. developed the INVAR pipeline. J.C.M.W., K.H., J.M., E.F., A.M., F. Marass, J.M., and D.C. analyzed data and performed statistical analysis. A.B.G. and F.A.G. performed imaging analysis. A.R.-V., E.B., G.Y., I.H., W.N.C., D.G., G.D.S., R.M., K.M.B., J.E.A., C.C., D.M.R., and R.C.R. coordinated studies and participated in design. C.P., D.G., A.D., U.M., P.G.C., and N.R. led the MelResist study. P.G.C. and M.R.M. led the AVAST-M study. C.G.S., C.M., P.G.C., and N.R. supervised the project. All authors reviewed and approved the manuscript. Competing interests: J.C.M.W., K.H., E.F., F. Mouliere, C.G.S., C.M., and N.R. are inventors of the patent “Improvements in variant detection” (WO2019170773A1), filed by Cancer Research UK. N.R. and D.G. are cofounders, shareholders, and officers or consultants of Inivata Ltd., a cancer genomics company that commercializes ctDNA analysis. Inivata had no role in the conceptualization, study design, data collection and analysis, decision to publish, or preparation of the manuscript. G.D.S. has received educational grants from Pfizer, AstraZeneca, and Intuitive Surgical; consultancy fees from Merck, Pfizer, EUSA Pharma, and CMR Surgical; travel expenses from Pfizer; and speaker fees from Pfizer. All other authors declare that they have no competing interests. Data and materials availability: Sequencing data are archived at the European Genome-phenome archive (EGA; www.ebi.ac.uk/ega/). Access can be obtained from the senior authors (via email Rosenfeld.LabAdmin{at}cruk.cam.ac.uk) under a data access agreement with the University of Cambridge, at the following EGA accession numbers: EGAS00001002959, EGAS00001003530, EGAS00001004355, EGAS00001004446, and EGAS00001004447. INVAR code is available at http://bitbucket.org/nrlab/invar.
