Research Article: Kidney transplant

A urine score for noninvasive accurate diagnosis and prediction of kidney transplant rejection

Science Translational Medicine  18 Mar 2020:
Vol. 12, Issue 535, eaba2501
DOI: 10.1126/scitranslmed.aba2501

Transplant monitoring via urine

For kidney transplant recipients, prompt and accurate detection of transplant rejection is vital for timely intervention. Unfortunately, the gold standard for diagnosis of rejection is kidney biopsy, an invasive procedure. Moreover, the presence of pathological rejection does not always correlate with symptoms or laboratory values, so even clinically stable patients have to be subjected to periodic biopsies. To facilitate the monitoring of kidney transplant patients, Yang et al. adapted a urine biomarker assay similar to one previously tested for the detection of chronic kidney disease. The noninvasive urine assay showed strong performance in multiple cohorts of patients, suggesting its potential for clinical translation.

Abstract

Accurate and noninvasive monitoring of renal allograft posttransplant is essential for early detection of acute rejection (AR) and to affect the long-term survival of the transplant. We present the development and validation of a noninvasive, spot urine–based diagnostic assay based on measurements of six urinary DNA, protein, and metabolic biomarkers. The performance of this assay for detecting kidney injury in both native kidneys and renal allografts is presented on a cohort of 601 distinct urine samples. The urinary composite score enables diagnosis of AR, with a receiver-operator characteristic curve area under the curve of 0.99 and an accuracy of 96%. In addition, we demonstrate the clinical utility of this assay for predicting AR before a rise in the serum creatinine, enabling earlier detection of rejection than currently possible by standard of care tests. This noninvasive, sensitive, and quantitative approach is a robust and informative method for the rapid and routine monitoring of renal allografts.

INTRODUCTION

Although kidney transplantation is the treatment of choice for patients with end-stage renal disease, the life of a kidney transplant is limited by episodes of preventable subclinical and acute rejection (AR). Although proper adherence to immunosuppressive medications and per-patient determination of therapeutic concentrations can help keep the transplant stable, ongoing diagnostic monitoring is recognized as essential for tracking the health of the transplanted kidney.

However, current methods for the detection of kidney rejection are suboptimal for several reasons. Regular monitoring of serum creatinine remains the most common method for surveillance after renal transplantation, but it is an insensitive predictor and only increases upon a deficiency in kidney function rather than kidney injury (1). The gold standard of renal biopsy suffers from interobserver variability, complications inherent to the invasive nature of the technique, and relatively high cost (2, 3). Recently, molecular diagnostics based on blood or plasma have been developed on the basis of composite RNA scores (4, 5) or donor-derived cell-free DNA (dd-cfDNA) fractions as determined by massively multiplexed polymerase chain reaction (PCR) (6) or next-generation sequencing (7–9).

Although sequencing or single-nucleotide polymorphism–based methods of quantifying dd-cfDNA from plasma have been used to detect allograft rejection, they remain inconvenient for patients because of the requirement of blood draws, requiring special tubes and kits for collection. As such, urinary biomarkers remain a promising solution to these problems. Because urine is the direct ultrafiltrate of the kidneys, it provides an accurate window into the status of the allograft. In addition, urine is a readily accessible biofluid, and its collection is truly noninvasive. dd-cfDNA has been found in both the urine (10, 11) and serum of renal transplant recipients and has shown initial utility in discriminating AR from stable outcomes. Although previous research in this space has mostly relied upon assessment of the urine pellet mRNA by PCR-based urine assessment techniques (12, 13), the reproducibility of this assay is limited by failure to obtain an adequate quantity and quality of RNA in 20 to 30% of cases (12).

We hypothesized that a microwell approach measuring a combination of nucleic acid, metabolite, and protein biomarkers on raw, untimed urine, using the urine supernatant alone in an enzyme-linked immunosorbent assay (ELISA)–based format that avoids mRNA assessments (14–16), would enable a more robust measure for rapid assessment of rejection in a renal allograft. To this end, we first assessed the performance of target markers in a Kidney Injury Test assay for the staged detection of chronic kidney disease (CKD) in the native, nontransplanted kidney (15). In this study, we demonstrate that the urinary abundance of these same markers can be used to develop a Q score for the sensitive and specific detection of AR injury in the renal allograft. We trained and independently validated the Q score on urine samples from renal transplant recipients with and without biopsy-confirmed graft rejection. We also compare the kidney injury scores across the same markers in a subset of healthy controls and patients with CKD to estimate the baseline renal health in the stable renal transplant samples.

Here, we present the results of our custom assay on six selected DNA, protein, and metabolite biomarkers consisting of cfDNA, methylated cfDNA (m-cfDNA), clusterin, total protein, creatinine, and CXCL10 on 601 prospectively collected urine samples from three transplant centers. These samples were collected from both pediatric and adult renal allograft recipients immediately before a renal allograft biopsy. Comparison to the aggregate of paired biopsy results and clinical pathology diagnoses indicated that the developed urine score (the Q score) trained and validated in this study can be used to identify grafts undergoing AR, both antibody-mediated rejection (ABMR) and T cell–mediated rejection (TCMR), and can also noninvasively identify allografts that are stable and without any substantive histological injury. These findings suggest that this urine score has the potential to replace the renal biopsy as the gold standard of posttransplant monitoring and that such measurements can be used for the detection of rejection episodes and the proactive management of immunosuppressant titering.

RESULTS

Detailed patient subcohorts assess the utility of the urine Q score for kidney transplant rejection diagnosis

We have designed the study cohorts across 601 urine samples to assess the performance of the urine score, known as Q score, biomarkers, including CXCL10, clusterin, cfDNA, m-cfDNA, total protein, and creatinine, on the assessment of transplant rejection (data file S1). These biomarkers had previously been identified for use in the Kidney Injury Test, an assay assessing CKD (15). From these biomarkers, we developed and validated the quantitative Q score on multiple clinical sample groups. This study provided an assessment of the ability to noninvasively diagnose biopsy-confirmed acute renal transplant rejection in 332 distinct adult and pediatric (n = 80) kidney transplant recipients with 332 unique biopsy-matched urine samples (where the urine was collected immediately before the performance of a biopsy) and an additional 32 biopsy-matched urine samples obtained within 1 week to 8 months before a biopsy-confirmed rejection episode in 32 AR patients (Fig. 1). We were able to assess the performance of this urine-based diagnostic assay across a spectrum of recipient age, inclusive of pediatric and adult age groups (age range, 10 months to 65 years), recipients of both genders (50% males), various recipient/donor ethnic groups, and with variations in donor gender (49% male) and human leukocyte antigen (HLA)–match grades (Table 1). Eight percent of recipients had repeat transplants. After transplant, we evaluated the ability of this assay to diagnose rejection in protocol biopsies (55%), when the serum creatinine is stable, or indication biopsies, triggered for graft dysfunction (based on a >20% rise in the serum creatinine from baseline). The timing of the protocol biopsies in this study included biopsies at 1 (5%), 3 (20%), 6 (70%), and 12 (5%) months after transplant. The mean time for graft dysfunction biopsies was at 22 months after transplant. 
Given the size of this dataset, with 103 biopsy-confirmed ARs and the conduct of protocol biopsies at all three clinical sites, we were also able to assess the performance of the biopsy to diagnose AR at the time of clinical graft dysfunction [clinical AR (cAR); n = 62] as compared to AR diagnosed at the time of protocol biopsies, when there is stable graft function [sub-cAR (scAR); n = 41].

Fig. 1 Study design.

In this multicenter, prospective study, a total of 601 patient samples were assessed. Urine samples (364) belonged to renal transplant recipients, where every urine sample was paired with a renal transplant biopsy for phenotype classification into the following diagnoses: stable (STA; n = 170), acute rejection (AR) (n = 103), borderline AR (bAR; n = 50), and BK virus nephropathy (BKVN; n = 9). An additional 32 patients with AR had urine samples collected before the rejection episode; these samples were also paired with biopsies. In the context of a stable kidney transplant, we compared patients’ urine scores with scores from urine samples of healthy controls without any renal insufficiency (n = 54). In addition, we compared the urine scores from stable kidney transplant patients against the scores from patients with early CKD (stages 1 and 2; n = 183). The selected urinary biomarkers were measured on all urine samples collected, and statistical analysis and modeling were performed on a subset of patients, defined as STA or AR, and split out randomly as a training set (n = 111) for modeling the urine score. The fixed Q score model from the training set was then applied to two separate validation sets consisting of STA and AR samples, with set sizes of 103 and 59 for validation set 1 and validation set 2, respectively. The fixed urine score was next applied to samples with other transplant injuries (bAR/BKVN; n = 59) and for prediction of rejection in the cohort of patients (preAR samples; n = 32), with urine samples collected within 8 months of a biopsy-confirmed AR episode. The preAR samples were collected at a time when the paired biopsy was histologically stable, and there was no graft dysfunction.

Table 1 Demographics and characteristics.

Values are reported in the given units with SD in parentheses, and all comparisons between groups were nonsignificant. Recipients are both adult (>18 years) and pediatric (≤18 years) in each cohort. Clinical pathology data are based on the day of biopsy, which is timed with the urine collection, which occurred before performance of the biopsy. ABMR/TCMR split is of AR-confirmed patients, based on the Banff allograft pathology classification (17). All other demographic and clinical information is based on the day of the kidney transplant.

An assessment of all indication biopsies (n = 160) showed that only 38.8% of transplants with clinical graft dysfunction had a confirmed histological diagnosis of AR using the established Banff schema (17). For the remainder of the indication biopsies, 20% were histologically read as non-AR injuries [2.3% BK viral nephropathy (BKVN) and 17.5% borderline AR (bAR)] and 40.6% as stable or no injury (STA). Within the STA group of patients with clinical graft dysfunction, 12 (7.5%) patients went on to develop biopsy-confirmed AR at further follow-up (these samples were part of the preAR analysis group in this study). A similar assessment of all protocol biopsies (n = 201) showed that 20.4% had stable graft function but still had scAR, histological rejection currently missed by serum creatinine monitoring alone. For the remainder of the protocol biopsies, 19.4% had other injuries (2% BKVN and 17.4% bAR), and 60.2% had no injury (STA); in this latter group, 20 patients went on to develop AR on subsequent follow-up (these samples were also part of the preAR analysis group in this study).

Target urine biomarkers can discriminate stable patients from patients with AR

The selected biomarkers of urinary cfDNA, m-cfDNA, creatinine, CXCL10, clusterin, and total protein (15) were assessed on all collected urine samples. To this end, we developed a composite Q score ranging from 0 to 100, on all six biomarkers, on 111 samples in the training set using a Random Forest bootstrap model. The utility of the individual biomarkers in discriminating stable (STA) and AR outcomes was determined by multivariate nominal logistic regression (table S1) to find an optimal threshold (≥32) that maximized sensitivity and specificity for detection of rejection in the urine sample paired with an allograft biopsy in the training set of 111 samples. This locked threshold of ≥32 was then applied to two independent validation sets to assess the clinical performance of the threshold to discriminate between rejection and no rejection. Model analysis of the urine score demonstrated that each of the biomarkers provided independent value toward the prediction of kidney transplant rejection status with varying significance (table S2), highlighting that multiple urinary biomarkers had complementary value in discriminating acute renal transplant rejection compared to using a single marker (such as cfDNA) alone. The scores easily distinguished between STA [median, 11.2; 95% confidence interval (CI), 8.40 to 12.71] and AR patients (median, 58.8; 95% CI, 50.3 to 69.2) (P < 0.0001) in the training set (Fig. 2A) at a defined threshold of 32. The threshold was chosen because it provided the combination of sensitivity and specificity of 94.9 and 100%, respectively, and maximized specificity. A receiver-operator characteristic (ROC) curve of the urine score had an area under the curve (AUC) of 0.99 (0.99 to 1.00, P < 0.0001) in the training set (Fig. 2B).
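The threshold selection described above can be illustrated with a minimal sketch. It assumes the cutoff is chosen by scanning candidate values and maximizing sensitivity + specificity − 1 (Youden's J), which is one common way to balance the two; the text does not state the exact criterion used. The scores, labels, and function names are hypothetical toy values, not study data.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called AR (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores, labels):
    """Scan every observed score as a cutoff; keep the first one with maximal J."""
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        j = sens + spec - 1.0  # Youden's J statistic (assumed criterion)
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best


# Hypothetical toy scores on a 0-100 scale: six STA (0) and six AR (1) samples.
toy_scores = [5, 9, 11, 14, 20, 36, 30, 35, 45, 58, 66, 80]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

threshold, j, sens, spec = best_threshold(toy_scores, toy_labels)
print(f"threshold={threshold}, J={j:.3f}, sens={sens:.3f}, spec={spec:.3f}")
```

Once a cutoff is chosen on training data in this way, it is held fixed and applied unchanged to the validation sets, as the study does with its locked threshold of ≥32.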

Fig. 2 The selected urinary biomarkers could segregate nonrejection patients from those with AR.

(A) A urine score model was trained on 111 samples consisting of 72 STA and 39 AR samples to generate a scaled Q score ranging from 0 to 100. The distribution of the STA and AR groups is depicted in the figure. (B) The quiescence threshold was set at 32 with a corresponding sensitivity of 94.9% and specificity of 100%. The AUC of the ROC curve was 0.99 (P < 0.0001). (C) The urine score model was applied to a set of 103 independent samples consisting of 71 STA and 32 AR samples. The median and 95% CI for the STA and AR group were 13.14 (8.75 to 17.94) and 45.16 (40.77 to 57.87), respectively (P < 0.0001). (D) At the predetermined quiescence threshold, the sensitivity was 90.6% and the specificity was 91.6%. The AUC of the ROC curve was 0.98 (P < 0.0001). (E) The fixed urine score model was applied to a set of 59 independent samples consisting of 27 STA and 32 AR. The median and 95% CI for the STA and AR groups were 16.21 (8.16 to 26.22) and 67.25 (60.46 to 78.73), respectively (P < 0.0001). (F) At the predetermined quiescence threshold, the sensitivity was 100.0% and the specificity was 96.3%. The AUC of the ROC curve was 1.00 (P < 0.0001).

Applying the same locked model to independent samples in the first validation set, the urine score distinguished between STA (median, 13.1; 95% CI, 8.8 to 17.9) and AR patients (median, 45.2; 95% CI, 40.8 to 57.9) (P < 0.0001) (Fig. 2C). A ROC curve of the score in the first validation set found that the previously established threshold from the training set had a sensitivity of 90.6%, a specificity of 91.6%, and an AUC of 0.98 (0.96 to 1.00; P < 0.0001) (Fig. 2D). In validation set 2, we could assess the performance of the urine score to again distinguish AR from STA (Fig. 2E). Here, we observed that the scaled score resulted in 100% sensitivity and 96.3% specificity with an AUC of 1.00 (1.00 to 1.00, P < 0.0001) (Fig. 2F). Aggregating all AR and STA samples together, the AUC was 0.99 (0.98 to 0.99, P < 0.0001), with a sensitivity and specificity of 95.2 and 95.9%, respectively, at the established threshold with a classification accuracy of 96%.
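As a rough check on how the pooled accuracy relates to the pooled sensitivity and specificity, accuracy can be written as the average of the two rates weighted by group size. The sketch below uses only the rounded figures reported above (95.2% sensitivity, 95.9% specificity, 103 AR and 170 STA samples); the function name is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_positive, n_negative):
    """Overall accuracy as the group-size-weighted average of the two rates."""
    correct = sensitivity * n_positive + specificity * n_negative
    return correct / (n_positive + n_negative)


# Rounded pooled values from the text: 103 AR (positive) and 170 STA (negative).
acc = accuracy_from_rates(0.952, 0.959, 103, 170)
print(f"{acc:.3f}")  # about 0.956, in line with the reported 96%
```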

We next evaluated the fixed urine score to evaluate other injuries, 50 bAR and 9 BKVN samples, and these samples showed median scores of 38.47 (95% CI, 34.71 to 40.80) and 23.70 (95% CI, 10.48 to 53.38), respectively (Fig. 3A). A ROC curve of AR versus no rejection (NR; inclusive of STA, bAR, and BKVN) showed an AUC at 0.96 (95% CI, 0.94 to 0.98) (P < 0.0001) (Fig. 3B).
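For readers less familiar with the AUC, it can be read as the probability that a randomly chosen AR sample scores higher than a randomly chosen non-rejection sample, with ties counted as one half. The sketch below computes it by direct pairwise comparison on hypothetical toy scores; it is not the software used in the study, and the numbers are not study data.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy scores: AR samples versus non-rejection (NR) samples.
toy_ar = [30, 35, 45, 58]
toy_nr = [5, 12, 24, 38]

auc = roc_auc(toy_ar, toy_nr)
print(auc)
```

An AUC of 1.0 means every AR sample outranks every NR sample; 0.5 corresponds to chance-level ranking.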

Fig. 3 Application of the urine score to other types of injuries.

(A) The urine score model was applied to a set of samples consisting of all the previously identified STA and AR along with bAR and BKVN samples. The median and 95% CI for the bAR and BKVN group were 38.57 (34.71 to 40.80) and 23.70 (10.48 to 53.38), respectively (P < 0.0001 for STA versus AR and STA versus BKVN; P < 0.01 for AR versus bAR and AR versus BKVN). (B) The AUC of the ROC curve for AR versus all other outcomes was 0.96 (P < 0.0001).

We observed gradations in the urine score across the transplant samples: lowest in STA, moderate in bAR, and highest in AR. A similar gradation of the scores was seen when we profiled the same six markers in urine samples from 54 healthy individuals (median, 10.80; 95% CI, 7.90 to 12.20) and in urine samples from 183 patients with stage 1 (median, 24.27; 95% CI, 23.10 to 25.12) and stage 2 (median, 34.24; 95% CI, 32.17 to 37.12) CKD using the native kidney injury scores by Watson et al. (15) (Fig. 4).

Fig. 4 Patterns of the urine score in kidney injuries.

The urine scores were aggregated for different outcomes. Healthy control (HC), 10.80 (7.90 to 12.20); CKD stage 1 (CKD1), 24.27 (23.10 to 25.12); CKD stage 2 (CKD2), 34.24 (32.17 to 37.12); STA, 12.19 (9.75 to 14.39); bAR, 38.47 (34.71 to 40.80); AR, 58.76 (54.95 to 63.40). Data are given as median and 95% CI.

Relationship between the urine score and clinical parameters

Aggregating the training and validation sets together to assess the classification performance of AR and no-rejection clinical outcomes, the Q score (AUC of 0.99; 95% CI, 0.98 to 0.99; P < 0.0001) performed better than proteinuria [protein/creatinine (Pr/Cr)] (AUC of 0.76; 95% CI, 0.69 to 0.82; P < 0.0001) and estimated glomerular filtration rate (eGFR) (AUC of 0.86; 95% CI, 0.81 to 0.98; P < 0.0001) (Fig. 5A). Because 60% (62 of 103) of all AR had clinical graft dysfunction, we would expect the eGFR to perform moderately well, but it failed to pick up all cases of scAR, where no decrease in eGFR was expected. At the quiescence threshold of 32, the sensitivity and specificity of the urine score were 95.2 and 95.9%, respectively. On the basis of a 25% prevalence of rejection in at-risk populations, the positive predictive value (PPV) was projected to be 86.9% and the negative predictive value (NPV) was projected to be 98.4%.
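The projection of predictive values from sensitivity, specificity, and an assumed prevalence follows from Bayes' rule. The sketch below applies the textbook formulas to the rounded figures quoted above (95.2% sensitivity, 95.9% specificity, 25% prevalence). With these rounded inputs it gives an NPV of about 98.4% and a PPV of about 88.6%; the latter is close to, but not identical with, the projected 86.9%, and the text does not spell out the exact calculation used. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded inputs taken from the text; the 25% prevalence is the stated assumption.
ppv, npv = predictive_values(0.952, 0.959, 0.25)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

Because the predictive values depend on prevalence, the same assay would show a lower PPV and a higher NPV in a population where rejection is rarer.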

Fig. 5 Comparison of the urine score performance.

(A) The urine score performed better than eGFR or the urinary protein/creatinine ratio in discriminating AR versus STA. For the entire set of 170 STA and 103 AR, the Q score, eGFR, and protein/creatinine (Pr/Cr) ratio were plotted and the AUC was calculated. The AUCs were 0.99, 0.86, and 0.76, respectively. (B) Similar performance of the urine score for noninvasive diagnosis of biopsy-confirmed pediatric and adult AR. The urine score performed well for diagnosis of AR in children ≤18 years old as well as adult recipients >18 years old: pediatric AUC = 0.95 (no AR 30, AR 21); adult AUC = 0.99 (no AR 138, AR 82).

As seen in Fig. 5B, the urine score was able to discern AR, irrespective of recipient age. Notably, the AUCs for the urine score in discriminating STA from AR in both pediatric and adult patients were similar, with AUCs of 0.95 and 0.99, respectively.

Multivariable regression analysis did not show any correlation of the score with donor source (living versus deceased), recipient or donor gender, age, sensitization status, repeat transplantation, or ethnicity (table S3). De novo donor-specific antibody (DSA) was detected in 24 of the 38 patients diagnosed with ABMR.

Relationship between the Q score and AR parameters

On the basis of the biopsy histology grading by Banff (17), we ascertained the correlation of the urine score with various rejection phenotypes in a total of 103 biopsy-confirmed rejections that met histological criteria (17) for diagnosis as either TCMR (n = 65) or ABMR (n = 38). There was no difference (P = 0.776; Fig. 6A) in the scores between TCMR and ABMR. Among the samples with AR, we observed that the scores were generally higher at higher biopsy rejection grades [test for linear trend with one-way analysis of variance (ANOVA), P = 0.0004; Fig. 6B]. A subanalysis of patients with confirmed AR found that there was no significant difference between the scores for those identified via for-cause biopsies as compared to protocol biopsies (Fig. 6C), highlighting that substantive tissue injury can be masked by normal eGFR.

Fig. 6 The urine score can measure and distinguish between clinically relevant transplant parameters.

(A) Patients with AR were split into those with ABMR (n = 38) and TCMR (n = 65). There was no significant difference between these two groups (P = 0.776). ns, not significant. (B) Patients with AR were split into increasing histology score as per the Banff histology classification. One-way ANOVA with test for linear trend indicated a significant difference (P = 0.0004). (C) The urine scores were no different in patients with biopsy-confirmed AR, whether the AR was diagnosed on a for-cause (n = 47) or protocol (n = 18) (P = 0.395) biopsy. (D and E) The urine score correlates with the paired biopsy inflammation (i) score (P < 0.0001) (D) and tubulitis (t) score (P < 0.0001) (E). (F) When the Q score was applied to patients with samples collected before confirmed episodes of AR, 38% had scores above threshold for preAR at variable times up to 8 months before the rejection episode.

In addition, there was a correlation of the urine score with graft rejection injury as shown by the inflammation or Banff i score (R² = 0.4271; P < 0.0001) (Fig. 6D) and the tubulitis or Banff t score (R² = 0.236; P < 0.0001) (Fig. 6E) (17). The variations of the score in urine samples collected at the time of biopsy confirmed BKVN injury in the graft, with the urine score being higher in BKVN patients with higher Banff i scores (Fig. 6D).

Relationship between the urine score and its clinical utility for prediction of rejection development

The analysis of the urine scores for the 32 urine samples collected before the detection of rejection by biopsy demonstrated that urine scores could be variably high at different times, weeks to months before an AR episode was detected by the current standard of care tests, serum creatinine, and biopsy. Because all of these patients went on to develop biopsy-proven AR, we suggest that prediction of rejection with the urine Q score may have been possible in 38% of patients in this subsampling (Fig. 6F).

Immune quiescence can be defined by the Q score threshold

Most (94%) of the samples that had scores above the threshold for AR (≥32; n = 159) had a clinical diagnosis of active AR, early AR as evidenced by borderline changes on the biopsy, or went on to develop biopsy-confirmed AR between weeks and months from the time of sampling (Fig. 7A), and 94% of protocol biopsies with a score of ≥32 had histological AR. Conversely, if a patient was scheduled for a protocol biopsy and the urine score was below the threshold, the likelihood of the patient having a Banff-graded AR was only 5%. On the basis of this observation, a score of less than 32 could provide clinically actionable value by suggesting that the allograft is likely to be in a state of immune quiescence.

Fig. 7 The urine score can measure and distinguish between clinically relevant transplant parameters.

(A) In the total cohort of 364 transplant samples, there were 160 clinically indicated biopsies, of which 90 had a urine score above the rejection threshold and 70 below the threshold. Of the 204 protocol biopsies, 71 had a score above the rejection threshold and 133 below. The breakdown of different biopsy diagnoses paired with the sample is shown in the figure as percentages. (B) Accuracy tables for biopsy (for-cause versus protocol) and the Q score (score < 32 versus score ≥ 32) for AR and STA classification.

In the event of clinical graft dysfunction and a urine score above the rejection threshold, the likelihood that the patient had varying grades of rejection was 93%. In the event of clinical graft dysfunction and a score below the threshold, other causes of graft injury should be clinically explored because the likelihood of Banff-graded AR was less than 1%. Notably, the detection of bAR was greater in the surveillance biopsy group as compared to the for-cause biopsy group. This is not unexpected, because there is a lower likelihood of earlier stages of rejection causing clinical graft dysfunction. Furthermore, in 50% (103 of 204) of the samples, where the graft function was excellent and the patient was clinically stable, a protocol biopsy would not have been necessary if the information from the urine score had been available.

The comparative performance of the two application methods of biopsy (cause versus protocol) and the quiescence versus the rejection threshold of the urine score for AR diagnosis are shown in Fig. 7B, highlighting that the most common indication for performing a renal allograft biopsy at all transplant programs, a drift in the serum creatinine, only results in an accurate diagnosis of rejection ~50% of the time.

DISCUSSION

Renal injury is a spectrum of disease that progresses after the initiating insult, the triggers being varied, including genetic and structural problems (encountered mostly in children), renal inflammation, and more indolent causes of injury due to altered renal hemodynamics. Our earlier study (15) demonstrated that the selected six urine markers could detect different stages of renal injury and were more sensitive for the detection of early renal injury, over paired values of eGFR, serum creatinine, or new-onset proteinuria. In this study, we applied the urinary score for the analysis of stable, injured, and rejecting renal allografts and evaluated the performance of the score to detect biopsy-confirmed AR. We also compared these results with nontransplant subjects to compare injury in AR with CKD injury in the native kidney and to compare kidney injury in stable allografts with normal healthy controls. The Q score includes six DNA, protein, and metabolite markers, and unlike a simple threshold detection of 1% for AR, as used for dd-cfDNA measurements in blood (6, 8), the Q score is quantitative, scaled from 0 to 100, allowing for sensitive tracking of AR progression and recovery.

AR with graft dysfunction, or clinical rejection (cAR), is the group of rejections that transplant centers report as their annual rejection incidence. In transplant centers where protocol biopsies are not performed, the statistic of scAR is not available for assessment. Although short-term adverse allograft outcomes including cAR episodes have declined over past decades to ~10%, proportionate improvement in long-term allograft survival remains unrealized because much of rejection injury goes unrecognized as it is subclinical (18–20), and thus, there is a lack of effective treatment of rejection and inflammation (21–27). The current standard of care in transplant centers worldwide requires an invasive biopsy for AR diagnosis, a procedure burdened by clinical risks and costs. To obviate these issues, previous studies have tested noninvasive profiling of urinary RNA (12, 13) and blood transcriptomic signatures (4, 28, 29), but these assays have not been integrated into clinical care due to issues that relate to sample handling, processing, assay complexity, and/or robustness. We propose that using urine supernatant alone, replacing the need for phlebotomy or extensive urine sample preparation and handling (as needed for urine mRNA isolation), and avoiding the need for amplification and sequencing (as used for plasma cfDNA assays), provides the patient and clinician the option of monitoring the allograft with high accuracy and simplicity. Because the assay requires as little as 5 ml of urine, it can also be performed in cases of limited urine volumes, such as graft oliguria or an infant recipient.

Unlike other published studies, where the eGFR has been shown to be a poor measure for detection of rejection (8), in this study, we observed somewhat better performance of eGFR as an independent measure to detect AR, because 64 of 103 rejections in this study had clinical graft dysfunction. Nevertheless, eGFR measurement cannot help in the assessment of subclinical rejections, which calls for the development of more sensitive AR-specific assays. In this study, we assessed the performance of the urine score for diagnosis of both cAR and scAR and observed similar increases in the urine score during both AR types. An analysis of all the biopsy data highlights that the current use of the serum creatinine drift to guide use of a diagnostic biopsy, or the fixed time point sampling of the allograft by a protocol biopsy, is a weak diagnostic measure for AR. The diagnostic accuracy for the invasive biopsy, combining all clinically indicated and protocol biopsies, for diagnosis of AR in our study is low, at 28.5%. Even for clinically indicated biopsies, the accuracy remains low at 54%.

Although the stable samples all clustered below the detection threshold for rejection, we observed that in comparison with the scores in CKD patients, the stable renal transplant may already have some baseline renal injury. Thus, it appears that the stable allograft, although without rejection, may carry some baseline burden of renal impairment, possibly due to the impact of early ischemia reperfusion injury, exposure to nephrotoxic drugs such as calcineurin inhibitors (30), or subclinical alloimmune injury (31).

We confirmed that urine samples from biopsy-confirmed AR patients detected both TCMR and ABMR, highlighting the intrinsic overlapping molecular axes in both types of rejection, particularly the presence of monocyte activation and the coexistence of endothelial injury with cellular inflammation in both TCMR and ABMR. The Q score performed with a high degree of accuracy in the independent validation cohorts with high NPVs and PPVs considerably better than blood-based assays for rejection diagnosis (4, 6, 8, 18, 32).

The histology-derived Banff i and t scores are currently used in the clinical management of patients as a reflection of inflammation and tubular injury in the transplant. The correlations of the scaled urine score with these biopsy histology grades suggest that the quantitative nature of the score could enable noninvasive assessment of rejection severity and immunosuppression therapy choices. Thus, the actual value of the urine score may be able to drive decisions of immunosuppression intensification for rejection and close monitoring of treatment efficacy and rejection response and resolution. The clinically relevant quantitative nature of the score provides granularity for serial monitoring that may not be possible in blood cfDNA assays that only provide a single threshold for injury (6, 8).

In addition, we observed that if the urine score was below the quiescence threshold (<32; when a diagnosis of histological STA is highly likely), performance of an invasive biopsy for “rule-out” rejection was not necessary in 55% of patients. Conversely, the high sensitivity of the assay showed that we rarely missed patients who were having active rejection as confirmed by biopsy. In our dataset, 43% or 69 of 130 clinically indicated biopsies were quiescent, and 65% or 130 of 201 protocol biopsies were unnecessary invasive procedures. Thus, serial performance of the score in urine samples may be able to reduce patient morbidity from unnecessary transplant biopsies, clinical visits, and related health care costs.

The main focus of this paper was on biopsy-matched cross-sectional samples because the objective was to develop and validate a noninvasive urine score for graft rejection. Besides limitations in overall representation, the clinical utility of graft biopsy is limited by the subjectivity of biopsy reporting, which also emphasizes the potential clinical utility of the urine score even as an adjunct to the biopsy, to avoid pitfalls in the variances in reporting of renal biopsy specimens. This may be especially true in the conduct of multicenter clinical trials, where multiple local pathologists may be involved, and the urine score can provide an unbiased, quantitative readout of graft rejection.

Because urine was collected at protocol-timed intervals and the timing of rejection is unpredictable, serial urine sampling after transplant under our center protocols allowed for a review of the trajectory of AR in patients with paired urine samples collected anywhere from 1 week to 8 months before confirmation of rejection by biopsy. We observed earlier detection of rejection by the urine score in 35% of preAR samples, when biopsy histology was still negative for histological rejection. This may be explained by the fact that renal biopsy specimens represent small 2- to 3-mm sections of the total core and may not capture a focal phenomenon like allograft rejection. The ability to observe an allograft rejection risk profile over time, such as that made available through urine testing, highlights that the tempo and cause of rejection vary naturally among individuals. Because we only have a single urine collection before the later biopsy-confirmed AR episode in each of the 32 patients, it is possible that repeated urine sampling in preAR patients with initial low urine scores in this study might have shown an increase in the urine score with increased proximity to the time of biopsy AR diagnosis. The necessity to do serial and repeated monitoring with a noninvasive, rapid turnaround, sensitive, and specific assay may be further underscored by obtaining these additional data and may also provide the background for future studies testing the hypothesis that serial measurements of this urine assay may guide maintenance immunosuppression management.

Other limitations of the current study design include a limited assessment of longitudinal samples, especially those from stable patients, in whom the natural variation of the urine score could be assessed over time and within the same patient. It will be necessary to collect monthly samples from patients in ongoing clinical studies to assess this question over time. In addition, this study did not provide an assessment of the changes in the urine score over time after treatment of rejection. Although the assay in this study was performed at an academic center laboratory, the assay is being transitioned to run as an analytically validated assay in a College of American Pathologists–accredited, Clinical Laboratory Improvement Amendments (CLIA)–certified reference laboratory, where urine samples from prospective clinical trials will be able to address the clinical utility of this assay for generalized immune risk serial screening after transplantation.

In conclusion, the Q score offers the potential to be used as an immune monitoring tool to guide the use of immunosuppression, with the ultimate goal of controlling subclinical intragraft inflammation and thus prolonging graft survival. Further assessment of the clinical utility of this assay in prospective longitudinal renal transplant studies in both sensitized and unsensitized renal allograft recipients would help evaluate the potential of the assay to substitute for protocol renal transplant biopsies for the determination of clinical and subclinical rejection. Such studies would allow for early detection and proactive management of graft rejection for the improvement of both patient morbidity and graft survival.

MATERIALS AND METHODS

Study design

This was a batched analysis of urine samples prospectively collected between 2010 and 2018 from adult (18 to 65 years of age) and pediatric (1 to 18 years of age) kidney transplant recipients who had transplant surgeries at the Stanford University Medical Center, USA, the University of California at San Francisco (UCSF), USA, or the Instituto Nacional de Ciencias Medicas y Nutricion, Mexico. All urine samples that were biopsy-matched were subsequently selected for this study. Power calculation analyses determined that a sample of 8 from the positive group and 15 from the negative group would have 90% power to detect a difference of 0.28 between the AUC under the null hypothesis of 0.7 (using eGFR as the comparator in this dataset) and an AUC under the alternative hypothesis of 0.98 using a two-sided z test at a significance level of 0.05.
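
The power statement above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' procedure: the source does not say which variance formula or software produced the power estimate, so the Hanley and McNeil approximation for the standard error of an AUC is assumed here, and the function names are hypothetical. It is therefore not expected to reproduce the reported 90% exactly.

```python
from math import sqrt
from statistics import NormalDist


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley and McNeil approximation).

    Assumption: the source does not state which variance estimate was used.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return sqrt(variance)


def auc_test_power(auc0: float, auc1: float, n_pos: int, n_neg: int,
                   alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z test of H0: AUC = auc0 when AUC = auc1."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se0 = hanley_mcneil_se(auc0, n_pos, n_neg)  # spread under the null
    se1 = hanley_mcneil_se(auc1, n_pos, n_neg)  # spread under the alternative
    # Probability of crossing the rejection bound on the side of auc1;
    # the contribution of the opposite tail is ignored as negligible.
    return NormalDist().cdf((abs(auc1 - auc0) - z_crit * se0) / se1)


# Values taken from the text: null AUC 0.7, alternative AUC 0.98,
# 8 samples in the positive group and 15 in the negative group.
power = auc_test_power(auc0=0.70, auc1=0.98, n_pos=8, n_neg=15)
print(f"approximate power: {power:.2f}")
```

Under this approximation, power rises as either group grows, because both standard errors shrink with sample size.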

The studies were approved by the institutional review boards of all three institutions. All patients or their legal guardians provided written informed consent to participate in the research, in full adherence to the Declaration of Helsinki. The clinical and research activities being reported are consistent with the Principles of the Declaration of Istanbul as outlined in the Declaration of Istanbul on Organ Trafficking and Transplant Tourism.

Study populations and samples

Male and female, adult and pediatric patients receiving a kidney from related or unrelated living donors or from unrelated deceased donors were included. All patients were on an immunosuppressive regimen consisting of calcineurin inhibitors, mycophenolate mofetil, and steroids. Urine samples were collected before the performance of an indicated or protocol biopsy (at various prespecified time intervals for each study protocol) and were processed and stored in an existing biorepository before batch analysis. The selection of study samples was based on adequate urine volume being available and association of the sample with a paired biopsy with sufficient biopsy material for histological information. Healthy control urine samples were collected from volunteers who had good health, normal serum creatinine (SCr), no proteinuria, and no identifiable CKD risk factors. For details of patient demographics, see Table 1. The study was divided into different cohorts (Fig. 1). In the training set, the original six biomarkers from the study by Watson et al. (15), which developed a Kidney Injury Test score to noninvasively assess patients with CKD, were assessed in biopsy-paired urine samples obtained from kidney transplant recipients, with and without kidney transplant rejection. The score was applied to this dataset to assess renal transplant quiescence versus rejection and called the QiSant score or Q score. The optimal threshold for the Q score was determined in the training set to evaluate the score’s performance for transplant rejection diagnosis. The locked Q score was next applied to an independent validation set of samples with stable or AR phenotypes and subsequently to independent samples from a prediction set.
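
The train-then-lock workflow described above can be sketched as follows. This is an illustration under stated assumptions only: the source does not say which criterion defined the "optimal" threshold, so Youden's J statistic is assumed here; all scores and labels are invented toy values, and every function name is hypothetical.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when a score >= threshold is called rejection."""
    tp = fp = tn = fn = 0
    for score, is_ar in zip(scores, labels):
        called_ar = score >= threshold
        if called_ar and is_ar:
            tp += 1
        elif called_ar:
            fp += 1
        elif is_ar:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_threshold(scores, labels):
    """Pick the cutoff maximizing sensitivity + specificity - 1 (assumed criterion)."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        j = ratio(tp, tp + fn) + ratio(tn, tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def performance(scores, labels, threshold):
    """Sensitivity, specificity, PPV, and NPV at a fixed (locked) threshold."""
    tp, fp, tn, fn = confusion(scores, labels, threshold)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented toy data: True marks biopsy-confirmed AR, False marks STA.
train_scores = [10, 18, 25, 30, 35, 48, 60, 72]
train_labels = [False, False, False, False, True, True, True, True]
locked_threshold = youden_threshold(train_scores, train_labels)

# The threshold is chosen once on the training set and then left unchanged.
val_scores = [12, 28, 31, 40, 55]
val_labels = [False, False, True, True, True]
result = performance(val_scores, val_labels, locked_threshold)
print(locked_threshold, result)
```

Keeping threshold selection and evaluation in separate functions mirrors the point of a locked score: the validation data never influence the cutoff.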

Urine sample collection and processing

Urine samples were collected midstream in sterile containers and centrifuged at 2000g at 4°C for 30 min within 1 hour of collection. The supernatant was aliquoted, and pH was adjusted to 7.0 by adding 1 part 1 M tris-HCl to 10 parts urine supernatant. The urine was stored at −80°C in the Sarwal Lab Biorepository until further use. Samples were processed for the QiSant Assay biomarkers as described below. There were 364 unique urine samples from renal allograft recipients, with 332 cross-sectional samples, each of which was paired with a closed 18-gauge needle allograft biopsy. Thirty-two urine samples collected at protocol time points with paired STA biopsies were from patients who had stable graft function and were collected proximal to a biopsy-confirmed rejection episode.

Biopsy samples

All kidney biopsies were analyzed in a blinded manner by the local study pathologist at each site, and all biopsies were also reevaluated by a central UCSF pathologist (C.C.-O.) and were graded by the Banff classification for AR (17); intragraft C4d stains were performed to assess for ABMR. AR was defined, at minimum, by the following criteria: (1) TCMR consisting of either a tubulitis (t) score > 2 accompanied by an interstitial inflammation (i) score > 2 or vascular changes (v) score > 0; (2) C4d-positive ABMR consisting of positive DSAs with a glomerulitis (g) score > 0 or peritubular capillaritis (ptc) score > 0 or v > 0 with unexplained acute tubular necrosis/thrombotic microangiopathy (ATN/TMA) with C4d = 2; or (3) C4d-negative ABMR consisting of positive DSA with unexplained ATN/TMA with g + ptc ≥ 2 and C4d = 0 or 1. Normal or stable (STA) allografts were defined by an absence of substantial injury on the matched biopsy pathology and definitions of the inflammation or i score and the tubulitis or t score. Both chronic allograft injury with bAR and BKVN used standard pathology definitions as described by the Banff schema (17) on the paired biopsies from each individual urine sample.

Biomarker selection and measurement in urine samples

On the basis of legacy urine genomic, metabolomic, and proteomic studies conducted by our group (33–36), we selected biomarkers in urine that reflected kidney injury from different renal subcompartments as previously described (15). Briefly, cfDNA was selected because it reflects the total apoptotic burden of kidney injury (10, 37), and the methylated fraction was selected to further augment detection of renal parenchymal injury (38). Total protein was selected as a late marker of glomerular injury (39, 40). CXCL10 was selected as an established marker of both renal and transplant inflammation (14, 41, 42). Clusterin was selected as a marker of tubular injury (43, 44). Urine creatinine was used for data normalization and control for hydration status and diurnal variation (45, 46).

For microwell-based measurement of cfDNA, we developed a 5′ biotinylated oligonucleotide complementary chemiluminescent immunoprobe for the measurement of specific target cfDNA fragments, as previously described (15). Streptavidin–horseradish peroxidase (R&D Systems) and SuperSignal ELISA Femto Substrate (Thermo Fisher Scientific) were used for luminescent detection and quantitation. CXCL10, clusterin, and m-cfDNA were measured using custom-developed assays (NephroSant Inc.), the reagents for which are available for use by the scientific community for noncommercial purposes, under a material transfer agreement with the Regents, University of California. Total protein was measured using the Pierce Coomassie Plus (Bradford) Assay Kit (Thermo Fisher Scientific). Urinary creatinine was used to normalize the five other biomarkers for urinary concentration and hydration status and was measured using the Creatinine Assay Kit (BioAssay Systems). Microwell plate readings were measured using an iD3 Multi-Mode microplate reader (Molecular Devices). All assays were run in duplicate.
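
The duplicate measurement and creatinine normalization steps can be sketched as below. This is a minimal sketch under assumptions: the source states only that assays were run in duplicate and that urinary creatinine normalized the five other biomarkers, so averaging the duplicate wells and dividing by mean creatinine is an assumed form, and the units, values, and names are hypothetical.

```python
from statistics import mean


def normalize_to_creatinine(analyte_wells: dict, creatinine_wells: list) -> dict:
    """Average duplicate wells, then express each analyte per unit of creatinine.

    Assumption: a simple ratio to mean creatinine; the exact normalization
    formula is not given in the source.
    """
    creatinine = mean(creatinine_wells)
    if creatinine <= 0:
        raise ValueError("creatinine must be positive to normalize")
    return {name: mean(wells) / creatinine for name, wells in analyte_wells.items()}


# Invented duplicate readings in arbitrary units for one urine sample.
sample = {
    "cfDNA": [4.0, 6.0],
    "m-cfDNA": [1.0, 1.2],
    "CXCL10": [30.0, 34.0],
    "clusterin": [210.0, 190.0],
    "total_protein": [12.0, 14.0],
}
normalized = normalize_to_creatinine(sample, creatinine_wells=[1.5, 2.5])
print(normalized)
```

Dividing by creatinine is a common way to compare spot urine samples that differ in concentration, which matches the stated purpose of controlling for hydration status.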

Statistical analysis

Comparisons involving two groups were performed using the nonparametric Mann-Whitney U test. Comparisons involving three groups were performed using the nonparametric Kruskal-Wallis test with Dunn’s post hoc multiple comparisons correction. Correlations were determined using the Pearson correlation coefficient. To further refine the score for detection of biopsy-confirmed AR, we optimized the assay and the Q score for rejection diagnosis. For this purpose, we used 364 biopsy-matched kidney transplant samples from three transplant centers, with pediatric and adult allograft recipients, and performed random sampling to split these cohorts with clear outcomes of no rejection or AR into a training set and two independent validation sets (Table 1). The Q score was determined using a Bootstrap Random Forest ensemble model. All analyses were performed using either GraphPad Prism 8.3.0 (GraphPad) or JMP 14.3 (SAS Institute).
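
The two-group comparison and the ROC analysis are closely linked: the Mann-Whitney U statistic divided by the product of the group sizes equals the area under the ROC curve. The sketch below illustrates that relationship with invented scores. It is not the GraphPad Prism or JMP analysis used in the study, it omits the P value calculation, and the function names are hypothetical.

```python
def mann_whitney_u(x, y):
    """U statistic for group x versus group y, counting each tie as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def auc_from_u(positives, negatives):
    """AUC as the probability that a positive scores higher than a negative."""
    return mann_whitney_u(positives, negatives) / (len(positives) * len(negatives))


# Invented toy scores: AR samples versus STA samples.
ar_scores = [35, 48, 60, 72]
sta_scores = [10, 18, 25, 40]

u_stat = mann_whitney_u(ar_scores, sta_scores)
auc = auc_from_u(ar_scores, sta_scores)
print(u_stat, auc)
```

The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred samples; a rank-based formula would be the usual choice for larger data.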

SUPPLEMENTARY MATERIALS

stm.sciencemag.org/cgi/content/full/12/535/eaba2501/DC1

Fig. S1. Correlation matrix of biomarkers.

Table S1. Multivariate logistic regression of AR status as assessed by individual biomarkers.

Table S2. Biomarker log-likelihood ratios for the urine score.

Table S3. Multivariate regression of clinical and demographic variables with the urine score.

Data file S1. Urine score data.

REFERENCES AND NOTES

Acknowledgments: We are grateful for the help from physicians, clinical coordinators, research personnel, patients, and patient families. We thank E. Evangelou, R. Mani, T. Whitson, L. Lu, T. Schmeckpeper, and M. Nasr for their contributions to this paper. Funding: This work was supported by startup funds for the Sarwal Lab from the Department of Surgery, UCSF to M.M.S. Additional funding support for the study was provided by the U19 1U19AI128913 and R01 DK109720-02 grants to M.M.S. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Author contributions: Conceptualization: J.Y.C.Y., T.K.S., and M.M.S.; methodology: J.Y.C.Y., T.K.S., and M.M.S.; software: J.Y.C.Y.; validation: J.Y.C.Y. and M.M.S.; formal analysis: J.Y.C.Y. and M.M.S.; investigation: J.Y.C.Y., R.D.S., J.M.A.-G., J.M.L., and I.D.; resources: J.A., F.V., and M.M.S.; data curation: J.Y.C.Y., R.D.S., B.R., C.C.-O., and M.M.S.; writing—original draft preparation: J.Y.C.Y., R.D.S., and M.M.S.; writing—review and editing: R.D.S., J.Y.C.Y., and M.M.S.; visualization: J.Y.C.Y. and M.M.S.; supervision: T.K.S. and M.M.S.; project administration: T.K.S., F.V., J.A., and M.M.S.; funding acquisition: T.K.S. and M.M.S. Competing interests: The IP for this assay has been filed under WO patent WO 2018/035340 A1 titled “A novel immunoprobe-based method to assess organ injury status through a biofluid-based cfDNA assay” and is owned by the Regents, the University of California. M.M.S., J.Y.C.Y., and T.K.S. are founders of NephroSant Inc. (San Francisco, CA), which has licensed the IP for this assay from the Regents, the University of California. M.M.S. is on the FDA Science Board and consults or has recently consulted or received sponsored research funds from Bristol-Myers Squibb, Natera, Astellas Pharma, Genentech, and Jazz Pharmaceuticals. F.V. consults for eGenesis Bio, Natera, and Sanofi. J.Y.C.Y. and R.D.S. are consultants for NephroSant. 
The other authors declare that they have no competing interests. Data and materials availability: All data associated with this study are present in the paper or the Supplementary Materials. Collaborative studies on new patient samples and cohorts with the scientific community for noncommercial purposes are encouraged and will be facilitated by individual material transfer agreements with KIT Bio, doing business as NephroSant Inc.
