This study was a prospectively defined analysis of a registry-based data validation cohort, gathered from consecutively admitted patients to a surgical ICU of a tertiary university hospital in Brazil, from January 1, 2013, to December 31, 2016. Our electronic database is continuously fed with predefined clinical and laboratory information from every patient admitted to our surgical ICU. Patients were followed daily during their ICU stay and then tracked for their final hospital status as discharged or deceased. The target condition of interest was the death of any cause in ICU or hospital. Variables, coefficients, and equations used for the index tests (SOFA, APACHE II, and SAPS 3) calculations were based on original publications without any adjustment or updating and are available upon request [4,5,6, 8]. APACHE II, SAPS 3 and SOFA scores were calculated after the first day of ICU admission using data collected at the prespecified time frame. This study was a registry-based data analysis with outcomes and predictors available before the beginning of any form of statistical analysis. Therefore, the blindness of outcomes or predictors was not employed. We followed the standards for reporting diagnostic accuracy (STARD) statement and the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement recommendations for validation studies (Additional file 1: Figure S1) [17, 18].
We did not perform any formal statistical method for sample size calculation and evaluated all patients available in our database for enrollment. However, considering that more than 100 events were observed for intra-ICU mortality and more than 250 events for in-hospital mortality, we believe that our sample size is satisfactory.
Patients eligibility criteria for study enrollment were age 18 or above and admission to surgical ICU for postoperative recovery of an elective or urgent surgical procedure. Patient data were excluded only if the target condition information was missing. Noteworthy, there were no patient’s exclusions after application of eligibility criteria. Our eligibility criteria were restrictive, allowing only surgical patients enrollment. These criteria contrast with original development cohorts of SAPS 3 and APACHE II. The SAPS 3 cohort included the first ICU admission of patients aged 16 or more and excluded data from patients lacking information about any admission or discharge variables. The APACHE II cohort consecutively included ICU-admitted patients for a medical or surgical reason and excluded patients that were missing any admission variable information or submitted to a coronary artery bypass graft surgery. These inclusion criteria are in contrast with our sample that enrolled patients submitted to any surgical procedure and enrolled those who had admission data missing. We handled missing values in predictor variables with multiple imputations. This procedure was performed with SPSS version 22 using a linear regression model. The variables included in the multiple imputation model were intra-ICU and in-hospital mortalities, age, sex, type of surgery, SAPS 3, APACHE II, and SOFA scores. Ten imputed datasets were created, and areas under the receiver operating characteristic curve had their sensitivities and specificities averaged to generate the final curve used in our results.
Our ICU provides a mixed model of care with full-time intensivists, nurses, assistants, respiratory therapists, dietitians, and attending physicians. A minimum standardized level of care was provided, consisting of a daily checklist called ABCD-preV (Additional file 2: Table S1) [19], in order to minimize therapeutic variations inside the population that could change the probability of the outcome and biased the results.
We evaluated the predictive performance of the index tests in a cohort of general surgical patients by estimating their discrimination and calibration. Discrimination reflects the capacity of a prediction model to differentiate between those who do and do not develop the defined target condition during the study period. For the measurement of discrimination, we used the concordance index (C-index) statistic through the calculation of the area under the receiver operating characteristic curve (AUROC) with intra-ICU or in-hospital mortality as the binary endpoints. A value of 0.5 for AUROC signifies chance and means that the predictor in analysis cannot distinguish between a positive or an adverse outcome while a value of 1 represents perfect discrimination. Discrimination was classified according to AUROC values as follows: 0.90–1 excellent, 0.80–0.90 good, 0.70–0.80 fair, 0.60–0.70 poor and 0.50–0.60 fail [20]. The DeLong method was used to compare whether differences between different models AUROC’s were statistically significant [21]. Calibration reflects how well intra-ICU and in-hospital mortalities predicted by each model agree with the observed outcomes. This relation was shown graphically by clustering patients in tenths of predicted risk according to each model and plotting the expected against the observed number of cases. A smoothed line was drawn over the entire predicted probability range to augment the observed correlation. A well-calibrated model predicts over a line slope around 45°. The calibration plot also indicates the magnitude and direction of the model’s miscalibrations. For statistical analysis of the model’s predictive performance, we employed the Hosmer–Lemeshow goodness-of-fit test [22]. In an adequate sample size, results with p values higher than 0.05 indicate a good agreement between the model’s predicted probabilities and observed outcome rates.
Median follow-up was calculated for intra-ICU and in-hospital periods according to the reverse Kaplan–Meier survival function that uses the event indicator reversed and censoring becomes the outcome of interest.
A decision curve analysis was developed to describe and compare the clinical utility of tested models. Logistic regression was used to convert the model’s calculated values into predicted probabilities of death. Patients were defined as high risk if their intra-ICU or in-hospital mortality probabilities were higher than the prognostic model set probability threshold. Net benefit for different threshold values of each model was calculated according to Vickers et al. and compared to the possible clinical strategy of considering that all patients were positive for the outcome and treated them all and that all patients were negative for the outcome and received no treatment [13, 14].
Statistical analyses were performed using MedCalc version 18 and SPSS version 22. Continuous variables were reported as a mean and standard deviation or median and interquartile ranges whether they follow a normal distribution or not. Categorical variables were presented as count and proportion. Univariate analysis was performed using appropriated tests for continuous and categorical variables to assess association with mortality. Relative risks for mortalities were calculated after adjustment for illness severity. This procedure was performed using a case-control matching strategy with severity scores (SOFA, SAPS 3, and APACHE II) as specific criteria. A two-tailed p value of less than 0.05 was considered statistically significant.