Urine NGAL as a biomarker for septic AKI: a critical appraisal of clinical utility—data from the observational FINNAKI study

Background Neutrophil gelatinase-associated lipocalin (NGAL) is released from kidney tubular cells under stress as well as from neutrophils during inflammation. It has been suggested as a biomarker for acute kidney injury (AKI) in critically ill patients with sepsis. To evaluate clinical usefulness of urine NGAL (uNGAL), we post-hoc applied recently introduced statistical methods to a sub-cohort of septic patients from the prospective observational Finnish Acute Kidney Injury (FINNAKI) study. Accordingly, in 484 adult intensive care unit patients with sepsis by Sepsis-3 criteria, we calculated areas under the receiver operating characteristic curves (AUCs) for the first available uNGAL to assess discrimination for four outcomes: AKI defined by Kidney Disease: Improving Global Outcomes (KDIGO) criteria, severe (KDIGO 2–3) AKI, and renal replacement therapy (RRT) during the first 3 days of intensive care, and mortality at day 90. We constructed clinical prediction models for the outcomes and used risk assessment plots and decision curve analysis with predefined threshold probabilities to test whether adding uNGAL to the models improved reclassification or decision making in clinical practice.

Results Incidences of AKI, severe AKI, RRT, and mortality were 44.8% (217/484), 27.7% (134/484), 9.5% (46/484), and 28.1% (136/484). Corresponding AUCs for uNGAL were 0.690, 0.728, 0.769, and 0.600. Adding uNGAL to the clinical prediction models improved discrimination of AKI, severe AKI, and RRT. However, the net benefits for the new models were only 1.4% (severe AKI and RRT) to 2.5% (AKI), and the number of patients needed to be tested per one extra true-positive varied from 40 (AKI) to 74 (RRT) at the predefined threshold probabilities.

Conclusions The results of the recommended new statistical methods do not support the use of uNGAL in critically ill septic patients to predict AKI or clinical outcomes.


Background
Neutrophil gelatinase-associated lipocalin (NGAL) has been studied extensively as a biomarker for detection and evolution of acute kidney injury (AKI) as well as outcome [1,2]. NGAL is a protein first found in neutrophil granules [3], but it is also synthesized in numerous human tissues in addition to kidney epithelium, e.g., respiratory tract, stomach, and colon. Not all in vivo functions of NGAL have been unraveled. It increases rapidly in serum and urine not only in conjunction with renal tubular injury, but also in bacterial infections, non-infectious systemic inflammatory response syndrome, and chronic and systemic diseases without bacterial infection [4]. Consequently, inflammation is considered a confounding factor hindering the routine use of NGAL as a biomarker of AKI in intensive care patients with sepsis [5,6,7]. In a recent meta-analysis, urine NGAL (uNGAL) predicted septic AKI with an area under the receiver operating characteristic curve (AUC) of 0.90 [8], but the individual studies were rather small, with sample sizes varying between 45 and 168. Besides, generalizability of the meta-analysis may be questioned since 65% of the sepsis patients were from Asia. Furthermore, currently used statistical methods have several shortcomings: AUCs are not very suitable for evaluating the incremental value of biomarkers [9] or assessing clinical usefulness [10]. Newer reclassification methods may even make useless biomarkers appear applicable [11]. Although there is an obvious need for better tools than urine output and serum creatinine for early detection and classification of AKI, the existing data on any kidney injury biomarker for AKI diagnosis, staging, prognosis, or treatment are inadequate [12].

We have previously tested the ability of uNGAL to predict AKI, renal replacement therapy (RRT), and 90-day mortality in a large non-selected cohort of 1042 adult intensive care patients [13]. Patients with sepsis comprised 46% of the study population. In comparison to the previous meta-analysis [8], this is by far the largest cohort of septic patients with uNGAL measurements. Since we did not report the septic patients separately, they could not be included in the meta-analysis [8]. We now extended our analyses to evaluate the usefulness of uNGAL in predicting AKI, RRT, and 90-day mortality in septic patients using more sophisticated statistical methods: risk assessment plot (RAP) [14] and decision curve analysis (DCA) [10]. Accordingly, we tested the hypothesis that uNGAL improves the performance of clinical risk models for AKI, RRT, and 90-day mortality in a homogeneous and clinically important group of critically ill septic patients using these new statistical methods. We are not aware of a similar detailed analysis of uNGAL or its clinical usefulness in the literature.

Patients
We analyzed the urine of septic patients of this FINNAKI NGAL-substudy [13]. The Ethics Committee of the Department of Surgery in Helsinki University Hospital gave a nationwide approval for the FINNAKI study [15] with a deferred consent policy, confirmed by a written consent from each patient or his/her proxy.

Data
The patients of the original study [13] were prospectively screened for sepsis defined by the American College of Chest Physicians/Society of Critical Care Medicine (ACCP/SCCM) criteria [16]. To increase the generalizability of the results, we now reclassified these patients using the recent Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3) criteria [17]. We included patients fulfilling the criteria during the first 3 days of intensive care.
We defined AKI according to Kidney Disease: Improving Global Outcomes (KDIGO) criteria including both plasma creatinine and urine output criteria [18]. Urine output was measured hourly and plasma creatinine daily. The last available plasma creatinine value from the preceding year up to 1 week before intensive care unit (ICU) admission was used as the baseline value. When not available, we estimated the baseline creatinine value using the Modification of Diet in Renal Disease equation assuming a glomerular filtration rate of 75 ml/min/1.73 m² [19]. We included data on AKI and RRT during the first 3 days of ICU stay, choosing the highest stage for the final KDIGO stage of each patient for the analyses. We obtained 90-day mortality data from the Finnish Population Register Centre.
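The back-calculation of a missing baseline creatinine can be sketched as follows. This is an illustration only: it assumes the four-variable MDRD equation with the original coefficient of 186 and creatinine expressed in mg/dl, neither of which is stated in the text, and the function names are hypothetical.

```python
def mdrd_egfr(creatinine_mg_dl, age_years, female, black=False):
    """Four-variable MDRD eGFR in ml/min/1.73 m^2.

    Assumption: original (non-IDMS) coefficient 186, creatinine in mg/dl.
    """
    egfr = 186.0 * creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.210
    return egfr


def baseline_creatinine(age_years, female, black=False, assumed_gfr=75.0):
    """Invert the MDRD equation: creatinine (mg/dl) that yields assumed_gfr."""
    factor = 186.0 * age_years ** -0.203
    if female:
        factor *= 0.742
    if black:
        factor *= 1.210
    # eGFR = factor * creatinine^-1.154  =>  creatinine = (eGFR / factor)^(-1/1.154)
    return (assumed_gfr / factor) ** (-1.0 / 1.154)
```

Feeding the estimated creatinine back into `mdrd_egfr` returns the assumed 75 ml/min/1.73 m², which is a quick way to check the inversion.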

Measurement of uNGAL
The urine samples were collected on ICU admission (0 h), at 12 and 24 h, and stored as described elsewhere [13]. The person who analyzed the samples in duplicate with NGAL ELISA Rapid Kit (BioPorto®, Gentofte, Denmark) was blinded to patient records. The validated enzyme-linked immunosorbent assay (ELISA) method has a good intra- and inter-assay precision [20] and a measurement range of 10–1000 ng/ml. For the analyses, values below 10 ng/ml were registered as the lowest measurable value (10 ng/ml) and values above 1000 ng/ml as the highest measurable value (1000 ng/ml). For predictive calculations, we chose the first available uNGAL measurement (0, 12 or 24 h) for each patient.
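The handling of out-of-range values and the choice of the first available sample described above can be sketched as below. The data layout (a dictionary keyed by sampling hour) and the function names are assumptions made for illustration, not the study's actual data structures.

```python
LOWER_NG_ML = 10.0    # lower limit of the assay's measurement range
UPPER_NG_ML = 1000.0  # upper limit of the assay's measurement range


def clamp_ungal(value_ng_ml):
    """Register out-of-range values at the nearest measurable limit."""
    return min(max(value_ng_ml, LOWER_NG_ML), UPPER_NG_ML)


def first_available_ungal(samples):
    """Return the earliest non-missing uNGAL among the 0, 12 and 24 h samples.

    `samples` (hypothetical layout) maps hours since ICU admission to a
    value in ng/ml, or to None when the sample is missing.
    """
    for hour in (0, 12, 24):
        value = samples.get(hour)
        if value is not None:
            return clamp_ungal(value)
    return None  # no sample available at any time point
```

For a patient without a 0-h sample, the 12-h value is used, and a reading below the measurement range is stored as 10 ng/ml.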

Statistical analyses
We tested four different outcomes: (1) AKI by original KDIGO classification (KDIGO stages 1–3), (2) "severe" AKI (KDIGO stages 2–3), (3) RRT, and (4) 90-day mortality. To simulate clinical decision making, we constructed clinical risk models for these outcomes using clinical variables available at the time of ICU admission. We tested associations of these variables with the outcomes using the Mann-Whitney U, chi-square or Fisher's exact test (with a two-sided p value), as appropriate. We conducted multivariable logistic regression analyses entering variables with the strongest associations (shown in Additional file 1: Table S1) simultaneously. We restricted the number of covariates to one per eight outcome events to avoid overfitting [21] and imputed missing values (Additional file 1: Table S1) as recommended [22]. To ensure that the assumptions for multivariable logistic regression were met, we checked the correlations between the variables and conducted multiple regression analysis to rule out multicollinearity (Additional file 2). We used the Hosmer-Lemeshow test to evaluate model goodness of fit. Thereafter, we added uNGAL to the clinical risk models and obtained new risk models for the four outcomes (Additional file 3). We calculated AUCs with 95% confidence intervals (CIs) for uNGAL alone, for the clinical risk models, and for the new risk models including uNGAL. To evaluate the predictive value of uNGAL, we calculated category-free net reclassification improvement (cfNRI) [23] and integrated discrimination improvement (IDI) [24], and drew RAPs [14] for each outcome. We describe these metrics in detail in Additional file 4: Statistical methods.
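For orientation, the two reclassification metrics named above can be sketched in a few lines. The code follows the commonly used definitions of cfNRI and IDI rather than the authors' actual software, which is not described in this section; the function names and the toy inputs are hypothetical.

```python
def cf_nri(old_risk, new_risk, outcome):
    """Category-free NRI.

    Net proportion of events whose predicted risk moves up, plus the net
    proportion of nonevents whose predicted risk moves down.
    """
    def net_up(pairs):
        up = sum(1 for old, new in pairs if new > old)
        down = sum(1 for old, new in pairs if new < old)
        return (up - down) / len(pairs)

    rows = list(zip(old_risk, new_risk, outcome))
    events = [(o, n) for o, n, y in rows if y == 1]
    nonevents = [(o, n) for o, n, y in rows if y == 0]
    return net_up(events) - net_up(nonevents)


def idi(old_risk, new_risk, outcome):
    """IDI: mean risk change in events minus mean risk change in nonevents."""
    def mean_change(flag):
        changes = [n - o for o, n, y in zip(old_risk, new_risk, outcome) if y == flag]
        return sum(changes) / len(changes)

    return mean_change(1) - mean_change(0)


# Toy data (illustrative only, not study data): risks from a reference model
# and from a model with an added marker, with the observed outcome (1 = event).
old = [0.2, 0.4, 0.6, 0.3]
new = [0.1, 0.5, 0.8, 0.2]
y = [0, 1, 1, 0]
```

In this toy example every event moves up and every nonevent moves down, so `cf_nri(old, new, y)` reaches its maximum of 2 and `idi(old, new, y)` is 0.25. Both functions expect at least one event and one nonevent.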
We also conducted DCAs [10] for the outcomes to illustrate the net benefit of adding uNGAL to the clinical prediction models. DCA plots net benefit against threshold probability. Net benefit describes the true-positive results gained after accounting for false-positive results, and it varies according to the chosen threshold probability, that is, the probability above which the patient is offered treatment (e.g., ICU admission). For example, with threshold probabilities of 0, 1, or 0.1, we would admit all patients, none of the patients, or those having a risk of ≥ 10%, respectively. A threshold probability is chosen according to the significance of false-negative versus false-positive results. A threshold probability of 0.1 means that we consider the harm of a false-negative result (denial of necessary ICU admission) 9 times ((1 − 0.1)/0.1) worse than that of a false-positive result (unnecessary ICU admission). For more serious outcomes, false-negative results are considered more harmful and the threshold is lowered. Accordingly, for AKI, severe AKI, RRT, and 90-day mortality, we prospectively chose threshold probabilities of 0.3, 0.2, 0.1, and 0.05, respectively. We calculated test trade-offs to determine the minimum number of patients to be tested per one extra true-positive classification [25]. Finally, we performed a sensitivity analysis excluding patients who did not have a 0-h urine sample.
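The quantities behind a decision curve can be sketched as follows, using the standard net benefit formula (true positives per patient minus false positives per patient weighted by the threshold odds). This is a minimal illustration with hypothetical function names and made-up inputs, not the authors' implementation.

```python
def harm_ratio(threshold):
    """How many times worse a false negative is judged than a false positive."""
    return (1.0 - threshold) / threshold


def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of treating every patient with predicted risk >= threshold."""
    n = len(outcome)
    pairs = list(zip(predicted_risk, outcome))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are down-weighted by the odds of the threshold.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Illustrative risks and outcomes (1 = event), not study data.
risks = [0.05, 0.15, 0.40, 0.80]
events = [0, 0, 1, 1]
```

With a threshold of 0.1, `harm_ratio(0.1)` gives 9, matching the example in the text. Evaluating `net_benefit` over a grid of thresholds for the clinical model and for the model with uNGAL would give the two curves that a DCA compares.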
We present the data as medians with interquartile ranges (IQRs) or absolute numbers (percentage with 95% CIs). Statistical analyses were conducted using SPSS

AUC
The AUCs for uNGAL predicting AKI, severe AKI, and RRT during the first 3 days in the ICU, and death by day 90, are shown in Table 2. Adding uNGAL to the clinical risk model yielded statistically significant model improvement for the outcomes AKI, severe AKI, and RRT (p < 0.05 for all), but not for 90-day mortality (Table 2).

cfNRI, IDI, and RAP
The combined cfNRI and IDI values indicate that the models changed in the right direction when uNGAL was added. The model improved most prominently for RRT (Table 2, Fig. 2c). RAPs showed variable effects with AKI and severe AKI (Fig. 2a, b), a more sustained effect with RRT (Fig. 2c), but negligible improvement in 90-day mortality prediction (Fig. 2d).

Decision curve analysis (DCA) and net benefit
At the pre-defined threshold probability of 0.3, there was a 2.5% (95% CI 0.2-4.6%) net benefit of adding uNGAL to the clinical AKI risk model (Fig. 3a). For severe AKI (threshold probability of 0.2) and RRT (threshold probability of 0.1), net benefits were 1.4% (0.4-4.1%, Fig. 3b) and 1.4% (0.1-2.8%, Fig. 3c), respectively. 90-day mortality prediction did not improve at threshold probability of 0.05 (Fig. 3d). The test trade-offs (minimum patient numbers to be tested for one extra true-positive, reciprocal of net benefit) were 40 for AKI, 71 for severe AKI, and 74 for RRT. Repeating the analyses excluding those 24 patients who did not have the 0-h sample did not change the results (see Additional file 5).

Discussion
In this extended statistical analysis comprising 484 critically ill septic patients, we found that uNGAL alone predicts AKI poorly, no better than a clinical prediction model using data available on ICU admission. Adding uNGAL to the clinical prediction model improved the prediction of AKI, severe AKI, and RRT slightly, but the magnitude of the improvement is not clinically meaningful. These findings contradict the conclusions of the recent meta-analysis [8] and do not support the usefulness of uNGAL in critically ill septic patients.

AUCs for uNGAL
In this study of sepsis patients, the first available uNGAL obtained in the ICU presented an AUC of 0.690 for AKI, resembling the AUC of 0.733 for the highest uNGAL of the first 24 h in 1042 non-selected ICU patients from the FINNAKI study [13]. Our result is in disagreement with the pooled AUC (0.90) of the 12 studies included in a recent meta-analysis evaluating performance of NGAL in septic patients [8]. Importantly, the individual studies in the meta-analysis by Zhang et al. [8] were small; the largest one enrolled 168 patients. Generalizability of the meta-analysis may be questioned since nearly two-thirds of a total of 1263 included study patients were from one country, China. Besides, two of the included studies (228 patients) were conducted in the emergency department, not in the ICU. Disease severity and prevalence of comorbidities like chronic kidney disease in those study populations may differ from ours. According to the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) criteria, the risk of bias was not dealt with properly in the majority of the individual studies [8]. Of the included studies, our results were comparable to the Danish study by Hjortrup and colleagues [26], which may result from similarities in case mix and care representing current clinical practice in high-income countries. In the present sepsis cohort, as in the original FINNAKI NGAL-substudy [13], the point estimate AUC for uNGAL seemed to be best for prediction of RRT (AUC 0.769). However, according to our results, uNGAL alone is not adequate to predict AKI, RRT, or 90-day mortality in septic patients.

Improvement of the clinical prediction model
Based on IDI values, adding uNGAL to clinical reference models resulted in improved prediction of the outcomes. IDI and cfNRI values are somewhat difficult to interpret, but RAPs illustrate these metrics in patients with and without the event revealing model improvement or worsening across different risks of the event. The curves showed that model improvement varied depending on the risk of the event.

Clinical usefulness of uNGAL
The widely used and easily interpreted receiver operating characteristic curve, summarized by the AUC, plots the true-positive rate (sensitivity) against the false-positive rate (1 − specificity) across consecutive cut-offs for the predicted risk. To guide decision making in clinical practice, a cut-off for a biomarker (or a decision threshold) is needed to divide patients into low- and high-risk groups, but such a value is not always reported in biomarker studies [8]. AUC enables comparison of the overall performance of different tests for the same condition but does not tell us the ability of a marker to add value to a pre-existing risk prediction model or, if such is lacking, to clinical judgment.
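How sensitivity and 1 − specificity at consecutive cut-offs combine into an AUC can be sketched as follows. The trapezoidal rule used here is one common way to obtain the area, the function names are hypothetical, and the marker values are invented for illustration.

```python
def roc_points(marker, outcome):
    """(false-positive rate, true-positive rate) at every observed cut-off."""
    n_pos = sum(outcome)
    n_neg = len(outcome) - n_pos
    points = [(0.0, 0.0)]
    # Lower the cut-off step by step; a patient is "positive" if marker >= cut.
    for cut in sorted(set(marker), reverse=True):
        tp = sum(1 for m, y in zip(marker, outcome) if m >= cut and y == 1)
        fp = sum(1 for m, y in zip(marker, outcome) if m >= cut and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(marker, outcome):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(marker, outcome)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Invented marker values (ng/ml) and outcomes (1 = event), not study data.
marker_values = [20, 50, 120, 400, 900]
had_event = [0, 0, 1, 0, 1]
```

Here five of the six event/nonevent pairs are ranked correctly by the marker, so `auc(marker_values, had_event)` equals 5/6. The single number says nothing about which cut-off to use or how much the marker adds to an existing model, which is the limitation discussed above.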
We used DCA graphs [10], which illustrate the net benefit of a model over a range of threshold probabilities of the event. DCA enables the comparison of the clinical model and the new model including NGAL. DCA graphs, especially those of AKI and severe AKI, show that the increase in net benefit after adding uNGAL to the clinical risk model varies over the range of clinically relevant threshold probabilities. The clinician chooses the optimal threshold probability, balancing the harm of a false-positive against that of a false-negative classification. We chose a decreasing threshold with increasing severity of the event. Test trade-offs between 40 (for AKI) and 74 (for RRT), indicating minimum patient numbers to be tested for one extra true-positive, are hardly acceptable as no specific preventive or curative treatment for AKI exists and the criteria for RRT initiation are still under investigation [27].

Limitations and strengths
Some obvious limitations of the present study need to be considered. First, as this was a post-hoc analysis of a subgroup of septic patients from a larger FINNAKI NGAL study [13], we were not able to influence sample size. Thus our results, especially the predictive value of uNGAL for RRT, must be interpreted with caution due to the small number of events. Furthermore, varying clinical practice in the use of RRT in different countries diminishes the generalizability of our findings. However, to the best of our knowledge, this is the largest multicenter cohort of consecutive intensive care patients studying uNGAL in sepsis. Second, although the study patients were originally screened using the former ACCP/SCCM criteria for sepsis, we now included only those with sepsis according to the recent Sepsis-3 definition. This may have led to exclusion of patients fulfilling Sepsis-3 criteria but not the former criteria. Third, some patients may have had existing AKI already at the time of measurement, a problem in all predictive biomarker studies. We performed a sensitivity analysis excluding those 24 patients who did not have a 0-h urine sample, but this did not change the results. Fourth, we did not normalize uNGAL for urinary creatinine [28]. Finally, the purpose of the presented clinical risk models was to enable evaluation of the incremental value of uNGAL, that is, what uNGAL adds to clinical reasoning. Importantly, no model should be used in clinical practice before independent external validation. Even though the results of the Hosmer-Lemeshow goodness of fit test and restricting the number of variables in the models did not suggest overfitting, it cannot be ruled out. Our study also has some strengths. To simulate clinical applicability and to perform a fair comparison, we limited the variables for the clinical risk models to those available on ICU admission.
We performed an extended statistical analysis to scrutinize the clinical usefulness of a suggested AKI biomarker using the most recently proposed statistical methods, including variable weighting of false-negatives and false-positives as recommended [25,29,30]. We consider that our detailed analysis of 484 patients adds significantly to the existing combined uNGAL data from 1263 septic patients [8].

Conclusions
We conclude that in critically ill adult sepsis patients, the performance of uNGAL alone was inadequate in predicting AKI, RRT, and 90-day mortality. The detailed statistical analyses do not support the clinical usefulness of uNGAL in this patient population.

The gray areas between the solid and the dashed lines represent IDI events (area between black lines) and IDI nonevents (area between red lines).
a Visually estimated from the curves, adding uNGAL to the clinical risk model improves separation of patients who will develop AKI when the risk of the event is more than ≈ 45%, and discrimination of patients who will not develop AKI when the risk of the event is less than ≈ 50%.
b With severe AKI, uNGAL added to the clinical risk model improves distinguishing KDIGO 2–3 patients when the risk of the event (= severe AKI) is more than ≈ 25% and helps separating those with KDIGO stage 0–1 when the risk of the event is less than ≈ 30%.
c Adding uNGAL to the clinical risk model improves the performance for assigning individuals that will end up with RRT when the risk of the event is lower than ≈ 40%, and enhances discrimination of those not ending up with RRT when the risk of the event is lower than ≈ 10%.
d Corresponding statistics in Table 2, RAPs for the clinical 90-day mortality risk model and for the new model with uNGAL added illustrate that uNGAL offers only minimal enhancement separating those who will die by day 90 when the risk of the event is > 40%