Research Article: Cancer

Enhanced detection of circulating tumor DNA by fragment size analysis


Science Translational Medicine  07 Nov 2018:
Vol. 10, Issue 466, eaat4921
DOI: 10.1126/scitranslmed.aat4921
  • Fig. 1 Survey of plasma DNA fragmentation with genome-wide sequencing on a pan-cancer scale.

    (A) The size profile of cfDNA can be determined by paired-end sequencing of plasma samples and reflects its organization around the nucleosome. cfDNA is released into the blood circulation by various means, each of which leaves a signature on the DNA fragment sizes. We inferred the size profile of cfDNA by sWGS (n = 344 plasma samples from 65 healthy controls and 200 patients with cancer) and the size profile of mutant ctDNA by personalized capture sequencing (n = 19 plasma samples). (B) Fragment size distributions of 344 plasma samples from 200 patients with cancer. Samples are split into two groups based on the previous literature (6), with orange representing samples from patients with cancer types previously observed to have low amounts of ctDNA (renal, bladder, pancreatic, and glioma) and blue representing samples from patients with cancer types previously observed to have higher amounts of ctDNA (breast, melanoma, ovarian, lung, colorectal, cholangiocarcinoma, and others; see table S1). (C) Proportion of cfDNA fragments below 150 bp in those samples, grouped into cancer types as defined in (B). The Kruskal-Wallis (KW) test for difference in size distributions indicated a significant difference between the group of samples from cancer types releasing high amounts of ctDNA and the group of samples from cancer types releasing low amounts, as well as the group of samples from healthy individuals. (D) Proportion of cfDNA fragments below 150 bp by cancer type (all samples). Cancer types represented by fewer than four individuals are grouped in the “other” category. Red lines indicate the median proportion for each cancer type. ChC, cholangiocarcinoma. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.
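The "proportion of cfDNA fragments below 150 bp" plotted in (C) and (D) can be illustrated with a minimal Python sketch. The function name, the strict `<` comparison, and the example fragment lengths are assumptions made for illustration; they are not taken from the study's pipeline.

```python
from typing import Iterable


def short_fragment_proportion(lengths: Iterable[int], cutoff: int = 150) -> float:
    """Return the proportion of fragments strictly shorter than `cutoff` bp."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no fragment lengths supplied")
    short = sum(1 for n in lengths if n < cutoff)
    return short / len(lengths)


# Hypothetical fragment lengths (bp) for one plasma sample.
example_lengths = [120, 134, 145, 149, 150, 166, 167, 167, 172, 330]
example_proportion = short_fragment_proportion(example_lengths)
print(example_proportion)
```

In practice the lengths would come from the insert sizes of properly paired reads; how those are extracted depends on the aligner and is not shown here.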

  • Fig. 2 Determining the size profile of mutant ctDNA with animal models and personalized capture sequencing.

    (A) A mouse model with xenografted human tumor cells enabled the discrimination of DNA fragments released by cancer cells (reads aligning to the human genome) from the DNA released by healthy cells (reads aligning to the mouse genome), with the use of sWGS. (B) Fragment size distribution from the plasma extracted from a mouse xenografted with a human ovarian tumor, showing ctDNA originating from tumor cells (red) and cfDNA from noncancerous cells (blue). Two vertical dashed lines indicate 145 and 167 bp. The fraction of reads shorter than 150 bp is indicated. (C) Design of personalized hybrid-capture sequencing panels developed to specifically determine the size profiles of mutant DNA and nonmutant DNA in plasma from 19 patients with late-stage cancers. Capture panels included somatic mutations identified in tumor tissue by WES. A mean of 165 mutations per patient was then analyzed from matched plasma samples. Reads were aligned and separated into fragments carrying either the reference or the mutant sequence. Fragment sizes for paired-end reads were calculated. (D) Size profiles of mutant DNA and nonmutant DNA in plasma from 19 patients with late-stage cancers were determined by tumor-guided capture sequencing. The fraction of reads shorter than 150 bp is indicated.

  • Fig. 3 Enhancing the tumor fraction from plasma sequencing with size selection.

    (A) Plasma samples collected from patients with ovarian cancer were analyzed in parallel without size selection or using either in silico or in vitro size selection. (B) Accuracy of the in vitro and in silico size selection determined on a cohort of 20 healthy controls. The size distribution before size selection is shown in green, after in silico size selection (with sharp cutoff at 90 and 150 bp) in blue and after in vitro size selection in orange. Vertical lines indicate 90 and 150 bp. (C) SCNA analysis with sWGS from plasma DNA of a patient with ovarian cancer collected before initiation of treatment, when ctDNA MAF was 0.271 for a TP53 mutation as determined by tagged-amplicon deep sequencing (TAm-Seq). Inferred amplifications are shown in blue and deletions in orange. Copy number neutral regions are shown in gray. (D) SCNA analysis of a plasma sample from the same patient as in (C), collected 3 weeks after treatment start. The MAF for the TP53 mutation at this time point was 0.068, and sWGS revealed only limited evidence of copy number alterations (before size selection). (E) Analysis of the same plasma sample as in (D) after in vitro size selection of fragments between 90 and 150 bp in length. The MAF for the TP53 mutation increased to 0.402 after in vitro size selection, and SCNAs were apparent by sWGS. More SCNAs were detected in comparison to (C) and (D) (for example, in chr2, chr9, and chr10). SCNAs were also detected in this sample after in silico size selection (fig. S7).
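In silico size selection with sharp cutoffs at 90 and 150 bp, as described in (B), amounts to filtering sequenced fragments by length. The sketch below assumes a hypothetical record layout of (read name, fragment length) and assumes both bounds are inclusive, which the caption does not specify.

```python
from typing import List, Tuple

# Hypothetical record layout: (read name, fragment length in bp).
Fragment = Tuple[str, int]


def in_silico_size_select(
    fragments: List[Fragment], low: int = 90, high: int = 150
) -> List[Fragment]:
    """Keep fragments whose length lies in [low, high] bp (inclusive bounds assumed)."""
    return [frag for frag in fragments if low <= frag[1] <= high]


example_fragments = [
    ("r1", 85), ("r2", 90), ("r3", 143),
    ("r4", 150), ("r5", 167), ("r6", 332),
]
selected = in_silico_size_select(example_fragments)
print([name for name, _ in selected])
```

Downstream analyses (SCNA calling, MAF estimation) would then be run on the retained fragments only.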

  • Fig. 4 Quantifying the ctDNA enrichment by sWGS with in silico size selection and t-MAD.

    (A) Workflow to quantify tumor fraction from SCNA as a genome-wide score named t-MAD. (B) Correlation between the MAF of single-nucleotide variants (SNVs) determined by dPCR or hybrid-capture sequencing and t-MAD score determined by sWGS. Data included 97 samples from patients with multiple cancer types with matched MAF measurements and t-MAD scores. Pearson correlation (coefficient r) between MAF and t-MAD scores was calculated for all cases with MAF > 0.025 and t-MAD > 0.015. Linear regression indicated a fit with a slope of 0.44 (purple solid line). (C) Comparison of t-MAD scores determined from sWGS between healthy samples and samples collected from patients with cancer types that exhibit low amounts of ctDNA and from patients with cancer types that exhibit high amounts of ctDNA (as in Fig. 1). All samples for which t-MAD could be calculated have been included. (D) ROC analysis comparing the classification of these plasma samples from high ctDNA cancer samples (n = 189) and plasma samples from healthy controls (n = 65) using t-MAD had an AUC of 0.69 without size selection (black solid curve). After applying in silico size selection to the samples from patients with cancer, we observed an AUC of 0.90 (black dashed curve). (E) Determination of t-MAD from longitudinal plasma samples of a patient with colorectal cancer. t-MAD was analyzed before and after in silico size selection of the DNA fragments between 90 and 150 bp and then compared to the RECIST status for this patient. PR, partial response; SD, stable disease; PD, progressive disease. (F) Application of in silico size selection to six patients with long-term follow-up. t-MAD score was determined before and after in silico size selection of the short DNA fragments. Dark blue circles indicate samples in which ctDNA was detected both with and without in silico size selection. Light blue circles indicate samples where ctDNA was detected only after in silico size selection. Open circles indicate samples where ctDNA was not detected by either analysis. Times when RECIST status was assessed are indicated by a red bar for progression or an orange bar for regression or stable disease. PC, prostate cancer; CRC, colorectal cancer; ChC, cholangiocarcinoma; BC, breast cancer. The numbers correspond to the patients.
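The AUC values reported in (D) summarize how well a single score separates cancer samples from healthy controls. As a sketch of the general technique (not the study's code), the AUC can be computed as the probability that a randomly chosen cancer sample scores higher than a randomly chosen control, with ties counted as one half. The scores below are invented for illustration.

```python
from typing import Sequence


def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical t-MAD-like scores, not values from the study.
cancer = [0.010, 0.020, 0.040, 0.080]
healthy = [0.005, 0.008, 0.012, 0.014]
example_auc = roc_auc(cancer, healthy)
print(example_auc)
```

This pairwise form is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples; a rank-based formulation gives the same value more efficiently.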

  • Fig. 5 Quantifying the ctDNA enrichment by sWGS with in vitro size selection.

    (A) The effect of in vitro size selection on the t-MAD score. For each of 48 plasma samples collected from 35 patients, the t-MAD score was determined from the sWGS after in vitro size selection (y axis) and without size selection (x axis). In vitro size selection increased the t-MAD score for nearly all samples, with a median increase of 2.1-fold (range from 1.1- to 6.4-fold). t-MAD scores determined from sWGS for 46 samples from healthy individuals were all <0.015 both before and after in vitro size selection. (B) ROC analysis comparing the classification of plasma samples from patients with cancer (n = 48) and plasma samples from healthy controls (n = 46) using t-MAD had an AUC of 0.64 without size selection (green curve). After applying in silico size selection to the samples from the patients and controls, we observed an AUC of 0.78 (blue curve), and after in vitro size selection, an AUC of 0.97 (orange curve). (C) Comparison of t-MAD scores determined from sWGS between matched ovarian cancer samples with and without in vitro size selection. The t test for the difference in means indicates a significant increase in tumor fraction (measured by t-MAD) with in vitro size selection (****P < 0.0001). (D) Detection of SCNAs across 15 genes frequently mutated in recurrent ovarian cancer, measured in plasma samples collected during treatment for 35 patients. Patients were ranked from left to right by increasing tumor fraction as quantified by t-MAD (before in vitro size selection). SCNAs were labeled as detected for a gene if the mean log2 ratio in that region was greater than 0.05. Empty squares represent copy number neutral regions, bottom left triangles in light blue indicate that SCNAs were detected without size selection, and top right triangles in dark blue represent SCNAs detected after in vitro size selection.
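The detection rule in (D), where an SCNA is labeled as detected for a gene if the mean log2 ratio in that region is greater than 0.05, can be sketched as follows. The gene labels and values are hypothetical, and the rule is implemented literally as worded in the caption; whether negative ratios (deletions) are handled through an absolute value is not stated here, so this sketch does not assume it.

```python
from statistics import mean
from typing import Dict, List


def detect_scna(
    gene_log2_ratios: Dict[str, List[float]], threshold: float = 0.05
) -> Dict[str, bool]:
    """Flag a gene when the mean log2 ratio over its region exceeds `threshold`."""
    return {
        gene: mean(values) > threshold
        for gene, values in gene_log2_ratios.items()
    }


# Hypothetical per-bin log2 ratios for three placeholder genes.
example_ratios = {
    "GENE_A": [0.10, 0.08, 0.12],     # clear gain
    "GENE_B": [0.01, -0.02, 0.03],    # close to copy number neutral
    "GENE_C": [-0.20, -0.15, -0.25],  # loss: not flagged under the literal rule
}
calls = detect_scna(example_ratios)
print(calls)
```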

  • Fig. 6 Improving the detection of somatic alterations by WES in multiple cancer types with size selection.

    (A) Analysis of the MAF of mutations detected by WES in six patients with HGSOC without size selection and with either in vitro or in silico size selection. ****P < 0.0001. (B) Comparison of size-selected WES data with nonselected WES data to assess the number of mutations detected in plasma samples from six patients with HGSOC. For each patient, the first bar in light blue shows the number of mutations called without size selection, the second bar shows the number of mutations called after adding those identified with in silico size selection, and the third bar in dark blue shows the number of mutations called after also adding those identified with in vitro size selection. (C) Patients (n = 16) were retrospectively selected from a cohort with different cancer types (colorectal, cholangiocarcinoma, pancreatic, and prostate) enrolled in early-phase clinical trials. Matched tumor tissue DNA was available for each plasma sample, and two patients also had a biopsy collected at relapse. WES was performed on tumor tissue DNA and plasma DNA samples, and in silico size selection was applied to the data. A total of 97% (2061 of 2133) of the shared mutations detected by WES showed higher MAF after in silico size selection. (D) Mutations detected only after in silico selection of WES data from 16 patients [as in (C)] compared to mutations called by WES of the matched tumor tissue. Three of 16 patients had no additional mutations identified after in silico size selection. Of the 82 mutations detected in plasma after in silico size selection, 23 (28%) had low signal in tumor WES data and were not identified in those samples without size selection.
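The summary in (C), the share of shared mutations whose MAF is higher after in silico size selection, can be sketched as a simple comparison of paired MAF values. The mutation labels and MAFs below are hypothetical; only the counts 2061 of 2133 and 23 of 82 come from the caption, and they are used to check the rounded percentages.

```python
from typing import Dict, Tuple


def share_with_higher_maf(mafs: Dict[str, Tuple[float, float]]) -> float:
    """Fraction of mutations whose MAF after size selection exceeds the MAF before."""
    if not mafs:
        raise ValueError("no mutations supplied")
    higher = sum(1 for before, after in mafs.values() if after > before)
    return higher / len(mafs)


# Hypothetical (MAF before, MAF after) pairs for four placeholder mutations.
example_mafs = {
    "mut1": (0.02, 0.05),
    "mut2": (0.10, 0.18),
    "mut3": (0.04, 0.03),
    "mut4": (0.01, 0.02),
}
example_share = share_with_higher_maf(example_mafs)

# Counts quoted in the caption, expressed as rounded percentages.
reported_share_pct = round(100 * 2061 / 2133)
low_signal_pct = round(100 * 23 / 82)
print(example_share, reported_share_pct, low_signal_pct)
```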

  • Fig. 7 Enhancing the potential for ctDNA detection by combining SCNAs and fragment size features.

    (A) Schematic illustrating the selection of different size ranges and features in the distribution of fragment sizes. For each sample, fragmentation features included the proportion (P) of fragments in specific size ranges, the ratio between certain ranges, and a quantification of the amplitude of the 10-bp oscillations in the 90- to 145-bp size range calculated from the periodic “peaks” and “valleys.” (B) PCA comparing cancer and healthy samples using data from t-MAD scores and the fragmentation features. Red arrows indicate features that were selected as informative by the predictive analysis. (C) Workflow for the predictive analysis combining SCNAs and fragment size features. sWGS data from 182 plasma samples from patients with cancer types with high amounts of ctDNA (colorectal, cholangiocarcinoma, lung, ovarian, and breast) were split into a training set (60% of samples) and a validation set (validation data 1, together with the healthy individual validation set). A further dataset of sWGS from 57 samples of cancer types exhibiting low amounts of ctDNA (glioma, renal, and pancreatic) was used as validation data 2, together with the healthy individual validation set. Plasma DNA sWGS data from healthy controls were split into a training set (60% of samples) and a validation set (used in both validation data 1 and validation data 2). (D) ROC curves for validation data 1 (samples from patients with cancer with high ctDNA amounts, 68; healthy, 26) for three predictive models built on the pan-cancer training cohort (cancer, 114; healthy, 39). The beige curve represents the ROC curve for classification with t-MAD only, the long-dashed green line represents the LR model combining the top five features based on recursive feature elimination [t-MAD score, 10-bp amplitude, P(160 to 180), P(180 to 220), and P(250 to 320)], and the red dashed line shows the result for a RF classifier trained on the combination of the same five features, independently chosen for the best RF predictive model. FF, fragment size features. (E) ROC curves for validation data 2 (samples from patients with cancer with low ctDNA amounts, 57; healthy, 26) for the same three classifiers as in (D). The beige curve represents the model using t-MAD only, the long-dashed green curve represents the LR model combining the top five features [t-MAD score, 10-bp amplitude, P(160 to 180), P(180 to 220), and P(250 to 320)], and the red dashed curve shows the result for a RF classifier trained on the combination of the same five predictive features. (F) Plot representing the probability of classification as cancer with the RF model for all samples in both validation datasets. Samples are separated by cancer type and sorted within each by the RF probability of classification as cancer. The horizontal dashed line indicates 50% probability (achieving specificity of 24 of 26, 92.3%), and the long-dashed line indicates 33% probability (achieving specificity of 22 of 26, 84.6%).
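Two quantities in this caption lend themselves to a short sketch: the size-range proportions P(a to b) from (A) and the specificity at a probability threshold from (F). The code below is illustrative only. It assumes inclusive range bounds, assumes a sample is classified as cancer when its probability is at or above the threshold, and uses invented fragment lengths and healthy-sample probabilities arranged to reproduce the counts quoted in (F). The 10-bp amplitude feature is not sketched because its exact calculation is not given here.

```python
from typing import List, Sequence


def range_proportion(lengths: List[int], low: int, high: int) -> float:
    """P(low to high): share of fragments with low <= length <= high (inclusive assumed)."""
    if not lengths:
        raise ValueError("no fragment lengths supplied")
    return sum(1 for n in lengths if low <= n <= high) / len(lengths)


def specificity(healthy_probabilities: Sequence[float], threshold: float) -> float:
    """Share of healthy samples whose cancer probability is below `threshold`."""
    if not healthy_probabilities:
        raise ValueError("no healthy samples supplied")
    negatives = sum(1 for p in healthy_probabilities if p < threshold)
    return negatives / len(healthy_probabilities)


# Hypothetical fragment lengths (bp) for one sample.
lengths = [140, 165, 170, 178, 185, 200, 260, 300, 330, 340]
p_160_180 = range_proportion(lengths, 160, 180)
p_180_220 = range_proportion(lengths, 180, 220)
p_250_320 = range_proportion(lengths, 250, 320)

# 26 hypothetical healthy probabilities: 22 below 0.33, 2 between 0.33 and 0.50, 2 above 0.50.
healthy_probs = [0.10] * 22 + [0.40] * 2 + [0.60] * 2
spec_50 = specificity(healthy_probs, 0.50)  # 24 of 26
spec_33 = specificity(healthy_probs, 0.33)  # 22 of 26
print(p_160_180, p_180_220, p_250_320, round(100 * spec_50, 1), round(100 * spec_33, 1))
```

Lowering the threshold from 50% to 33% trades specificity for sensitivity, which is why the caption reports both operating points.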

Supplementary Materials

  • www.sciencetranslationalmedicine.org/cgi/content/full/10/466/eaat4921/DC1

    Materials and Methods

    Fig. S1. Flowchart summarizing the experiments performed in this study and the sample numbers used at each step.

    Fig. S2. Size distribution of cfDNA determined by sWGS for different cancer types.

    Fig. S3. Insert size distribution of mutant cfDNA determined with hybrid-capture sequencing for 19 patients.

    Fig. S4. DNA fragment size distribution for plasma samples from patients with ovarian cancer.

    Fig. S5. Quality control assessed for in vitro size selection.

    Fig. S6. Quality control assessed for in vitro and in silico size selection on healthy control samples.

    Fig. S7. SCNA analysis of the segmental log2 ratio determined after sWGS (<0.4× coverage) for the patient OV04-83.

    Fig. S8. SCNA analysis of the segmental log2 ratio determined after sWGS (<0.4× coverage) for plasma samples from patients with ovarian cancer (from the OV04 study).

    Fig. S9. MAF and t-MAD score compared for different cancer types.

    Fig. S10. t-MAD score measured on a plasma DNA dilution series.

    Fig. S11. t-MAD scores and fragmentation features compared to tumor volume.

    Fig. S12. Changes to t-MAD after in vitro size selection.

    Fig. S13. SCNA analysis in cfDNA from plasma samples collected at baseline and after treatment for 13 patients with HGSOC.

    Fig. S14. MAF for SNVs called by WES with and without size selection.

    Fig. S15. TAm-Seq before and after in vitro size selection.

    Fig. S16. Mutations in clinically relevant genes detected by WES with and without in silico size selection.

    Fig. S17. Size distribution of nonmutant DNA and ctDNA concentration.

    Fig. S18. ROC curve for individual fragmentation features in high ctDNA cancers versus controls.

    Fig. S19. t-MAD score compared with seven fragmentation features.

    Fig. S20. Performance metrics for the two algorithms, LR and RF.

    Fig. S21. LR and RF models using the fragmentation features without t-MAD.

    Table S1. Summary table of the patients and samples included in this study.

    Table S2. Values for nine fragmentation features determined from sWGS data for the samples included in the study.

    Table S3. t-MAD score for the 48 plasma samples of the OV04 cohort before and after in vitro size selection.

    Table S4. Log2 of the signal ratio observed by sWGS of the plasma samples from the OV04 cohort.

    Table S5. Mutations called by WES of six patients selected from the OV04 cohort.

    Table S6. Mutations called by WES data of the plasma samples from 16 patients from the CoPPO cohort.

    References (48, 49)

  • The PDF file includes:

    • Materials and Methods
    • Figs. S1 to S21 (titles as listed above)
    • Legends for tables S1 to S6.
    • References (48, 49)

    Other Supplementary Material for this manuscript includes the following:

    • Table S1 (Microsoft Excel format). Summary table of the patients and samples included in this study.
    • Table S2 (Microsoft Excel format). Values for nine fragmentation features determined from sWGS data for the samples included in the study.
    • Table S3 (Microsoft Excel format). t-MAD score for the 48 plasma samples of the OV04 cohort before and after in vitro size selection.
    • Table S4 (Microsoft Excel format). Log2 of the signal ratio observed by sWGS of the plasma samples from the OV04 cohort.
    • Table S5 (Microsoft Excel format). Mutations called by WES of six patients selected from the OV04 cohort.
    • Table S6 (Microsoft Excel format). Mutations called by WES data of the plasma samples from 16 patients from the CoPPO cohort.
