Early prediction of noninvasive ventilation failure in COPD patients: derivation, internal validation, and external validation of a simple risk score

Background Early identification of noninvasive ventilation (NIV) failure is a promising strategy for reducing mortality in chronic obstructive pulmonary disease (COPD) patients. However, a risk-scoring system is lacking. Methods To develop a scale to predict NIV failure, 500 COPD patients were enrolled in a derivation cohort. Heart rate, acidosis (assessed by pH), consciousness (assessed by Glasgow coma score), oxygenation, and respiratory rate (HACOR) were entered into the scoring system. Another two groups of 323 and 395 patients were enrolled to internally and externally validate the scale, respectively. NIV failure was defined as intubation or death during NIV. Results Using HACOR score collected at 1–2 h of NIV to predict NIV failure, the area under the receiver operating characteristic curves (AUC) was 0.90, 0.89, and 0.71 for the derivation, internal-validation, and external-validation cohorts, respectively. For the prediction of early NIV failure in these three cohorts, the AUC was 0.91, 0.96, and 0.83, respectively. In all patients with HACOR score > 5, the NIV failure rate was 50.2%. In these patients, early intubation (< 48 h) was associated with decreased hospital mortality (unadjusted odds ratio = 0.15, 95% confidence interval 0.05–0.39, p < 0.01). Conclusions HACOR scores exhibited good predictive power for NIV failure in COPD patients, particularly for the prediction of early NIV failure (< 48 h). In high-risk patients, early intubation was associated with decreased hospital mortality.


Background
Noninvasive ventilation (NIV) increases alveolar ventilation and reduces the work of breathing in patients with acute exacerbation of chronic obstructive pulmonary disease (COPD) [1]. Consequently, it reduces respiratory rate, decreases PaCO 2 and improves the level of consciousness [2,3]. However, the failure rate ranges from 15 to 24% in COPD patients [4][5][6]. In contrast to patients who initially receive invasive mechanical ventilation, patients who initially receive NIV but subsequently experience NIV failure and then receive intubation are more likely to die in the hospital [7,8]. And delayed intubation is associated with increased hospital mortality [9][10][11]. Thus, the early identification of patients who cannot benefit from NIV is important.
Previous studies have reported that many variables can predict NIV failure in COPD patients [5,[12][13][14][15][16], including disease severity, heart rate, respiratory rate, consciousness, and arterial blood pH. However, no single variable can predict NIV failure well. A combination of several variables may increase predictive accuracy. Thus, we developed a simple score using several variables that are easily obtained at bedside to assess the efficacy of NIV in COPD patients.

Methods
This study was performed at three hospitals (the First Affiliated Hospital of Chongqing Medical University, the First Affiliated Hospital of Xi'an Medical University, and the People's Hospital of Changshou). At the First Affiliated Hospital of Chongqing Medical University, data were extracted from a prospectively collected database which was collected from June 2011 to June 2018. We randomly selected 500 patients to the derivation cohort and the rest 323 patients to the internal-validation cohort. At the First Affiliated Hospital of Xi'an Medical University and the People's Hospital of Changshou, data were retrospectively collected from January 2015 to June 2018 and March 2013 to June 2018, respectively. These data were from a total of 395 patients, who were enrolled in an external-validation cohort.
The ethics committee approved the study protocol. Informed consent was obtained from patients or their next of kin at the First Affiliated Hospital of Chongqing Medical University. As data were retrospectively collected at the other two hospitals, informed consent was waived. We enrolled COPD patients admitted to the respiratory ICU for NIV as follows: arterial blood pH less than 7.35, PaCO 2 more than 45 mmHg, and presence of dyspnea at rest assessed using accessory respiratory muscles or paradoxical abdominal breathing. However, patients who required emergency intubation or showed NIV intolerance were excluded. NIV intolerance was defined as refusal of NIV because of discomfort [17].
Patients were managed by the attending physicians, respiratory therapists, and nurses. The NIV was managed according to the protocols of each hospital. The face mask was the first choice for the interface to connect the ventilator to the patient. The size of the face mask was fitted to the face type. Bi-level positive airway pressure was used for the ventilation mode. The expiratory positive airway pressure was initially set at 4 cmH 2 O, and it was titrated to diminish the ineffective trigger. The inspiratory positive airway pressure was initially set at 8 cmH 2 O and then gradually increased to achieve the best control of dyspnea or to the tolerance of the patient. The fractional concentration of oxygen was selected to maintain the bedside oximeter (SpO 2 ) above 90% and the PaO 2 above 60 mmHg.
The efficacy of NIV was assessed during NIV intervention. The arterial blood pH, PaO 2 , PaCO 2 , PaO 2 /FiO 2 , GCS, respiratory rate, and heart rate were recorded. If respiratory failure was relieved, liberation from NIV was attempted. However, if the conditions of respiratory failure became progressively aggravated and reached the criteria for invasive mechanical ventilation, intubation was performed. The criteria for intubation were persistent respiratory distress with respiratory rate more than 35 breaths/min, failure to maintain a PaO 2 /FiO 2 above 100 mmHg, inability to correct respiratory acidosis, development of conditions necessitating intubation to protect the airway (coma or seizure disorders) or to manage copious tracheal secretions, hemodynamic instability without response to fluids and vasoactive agents, and respiratory or cardiac arrest. Nevertheless, intubation was left to physician discretion. NIV failure was defined as intubation or death during NIV [5]. Where NIV failure occurred within 48 h of NIV, it was defined as early NIV failure [18]. Where failure occurred after 48 h of NIV, it was defined as late NIV failure.

Statistical analysis
Data for 24 patients (2%) were unavailable for at least one time point. No imputation for missing data was performed. Data were analyzed using statistical software (SPSS 17.0; IBM Corp., Armonk, NY). Qualitative and categorical variables are reported as numbers and percentages, and the differences among groups were analyzed using the Chi-square and/or Fisher's exact tests. The differences between pairs of groups were analyzed using the unpaired Student's t-test or the Mann-Whitney U test, where appropriate. The differences among the three groups were analyzed using one-way ANOVA or the Kruskal-Wallis H test, where appropriate. The diagnostic accuracy of NIV failure was analyzed using area under the receiver operating characteristic curves (AUC). The cutoff value was based on a positive likelihood ratio of NIV failure of > 5 [19]. Differences that exhibited p values less than 0.05 were considered statistically significant.
The risk-scoring system was developed through the following steps. First, univariate analyses were used to identify variables associated with NIV failure collected at initiation and 1-2 h of NIV in the derivation cohort. Second, variables with p values less than 0.1 in univariate analyses were entered in stepwise multivariate logistic regression analyses to identify the independent risk factors associated with NIV failure. However, we omitted the heart rate collected before NIV as it was strongly correlated with that collected at 1-2 h of NIV and the ability to predict NIV failure was lower than that collected at 1-2 h of NIV [20]. At this step, a regression model was obtained to predict NIV failure. We evaluated this model for goodness of fit using the Hosmer-Lemeshow test (p > 0.05). Third, the risk-scoring system was created, following the method suggested by Sullivan et al. [21]. The variables in the final model were classified into clinically meaningful categories, and the median for each category was recorded. For each variable, a category bearing the lowest risk for NIV failure was set as the within-group reference, and it was assigned to the zero point. Each category was weighted (the difference between reference category multiplied by the β regression coefficient per unit increase). Then, one point was assigned to the category with the lowest weight, and this weight was set as the between-group reference. Finally, we calculated the value of that weight for the other category, divided by the between-group reference. This value was rounded off to the nearest integer as to create the assigned points. The sum of the points was the risk score.

Results
The NIV failure rate was 18.8%, 18.9% and 8.9% in derivation, internal-validation and external-validation cohorts, respectively (Additional file 1: Figure S1). The demographics of each cohort are summarized in Table 1. In the derivation cohort, we found that 14 variables collected at initiation and 1-2 h of NIV were associated with NIV failure in univariate analyses (Table 2). However, the heart rate, acidosis (assessed by pH), consciousness (assessed using the GCS), oxygenation, and respiratory rate were each independently associated with NIV failure. These five variables were used to develop a risk-scoring system to predict NIV failure. Following the weights for each variable, we assigned 3 points to heart rate, 8 points to acidosis, 11 points to consciousness, 2 points to oxygenation, and 3 points to respiratory rate (Table 3). We called the result the HACOR score, on a scale of a total of 27 points.
In the derivation, internal-validation, and externalvalidation cohorts, HACOR scores were much lower in the NIV-success patients than the NIV-failure ones at initiation, 1-2 h, 12 h, and 24 h of NIV (Fig. 1). Higher HACOR scores were associated with increased NIV failure (Fig. 2).
At 1-2 h of NIV, the AUC for the prediction of NIV failure was 0.90 and 0.89 in the derivation and internalvalidation cohorts, respectively (Table 4). Based on timeframe of admission, we divided the patients into four groups to internally validate the HACOR score once again (Additional file 2: Table S1). The AUC for the prediction of NIV failure was 0.87, 0.89, 0.93, and 0.86 in patients enrolled from 2011 to 2012, 2013 to 2014, 2015 to 2016, and 2017 to 2018, respectively. In external-validation cohort, the AUC for prediction of NIV failure was 0.71 (Table 4). For the prediction of early NIV failure, the AUC was 0.91, 0.96, and 0.83 in the derivation, internalvalidation, and external-validation cohorts, respectively. Comparisons in AUC among derivation, internal-validation, and external-validation cohorts are summarized in Additional file 3: Table S2. Comparison in AUC between HACOR score and risk chart developed by Confalonieri et al. [5] to predict NIV failure in external-validation cohort is summarized in Additional file 4: Figure S2.
A cut-off value of 5 was identified to distinguish the high or low risk of NIV failure assessed by HACOR score at 1-2 h of NIV (Table 4, Additional file 5: Table S3 and Additional file 6: Table S4). In patients with HACOR score ≤ 5, the NIV failure rate was 5.8% (Additional file 7: Figure S3). However, in those with HACOR score > 5, the NIV failure rate was 50.2%. At beginning of NIV, there were no differences between early and late intubation in patients at high risk of NIV failure (Table 5). However, early intubation was associated with decreased hospital mortality (unadjusted odds ratio = 0.15, 95% confidence interval 0.05-0.39, p < 0.01).

Discussion
This study developed a novel risk-scoring system, the HACOR score, for the prediction of NIV failure in COPD patients. This scale takes into account heart rate, acidosis, consciousness, oxygenation, and respiratory rate. Because the variables in the HACOR score are easily obtained using simple bedside measurements, it can serve as a rapid and convenient tool for predicting NIV failure. The HACOR score had good diagnostic accuracy for NIV failure when it was assessed at 1-2 h of NIV, and it was even better at predicting early NIV failure. In the high-risk patients identified by HACOR score, early intubation was associated with decreased hospital mortality.
It is advisable to attempt to predict NIV failure in COPD patients. However, multiple factors can cause NIV failure. Using a single variable, it is difficult to predict NIV failure with high accuracy. The combination of several variables can increase predictive power. Confalonieri et al. [5] reported a chart including APACHE II score, GCS, respiratory rate, and pH to predict NIV failure at initiation and 2 h of NIV in COPD patients. This chart had high accuracy in predicting NIV failure. However, the APACHE II score entails a large number of items and cannot be conveniently used to assess outcomes during an NIV intervention. Moreover, serum concentrations of creatinine and white blood cell counts are not always available within 1-2 h of NIV for every patient. Furthermore, no external validation was performed for that chart, and it remains unclear whether it can be extrapolated to other centers. Our scale (HACOR score) is simpler to apply to predict NIV failure in COPD patients. The HACOR score includes only heart rate, acidosis, consciousness, oxygenation, and respiratory rate for its prediction of NIV failure. Because these five variables are easily obtained at bedside, the HACOR score can be rapidly assessed. Furthermore, we performed internal and external validation of its predictive power. This increases the chance that the scale can be successfully extrapolated to other centers.
In COPD patients, NIV failure is highly associated with consciousness [22]. In our study, consciousness was the most relevant variable for predicting NIV failure, with a maximal score of 11 points. Arterial blood pH was the second most relevant variable, with a maximal score of 8 points. Heart rate, respiratory rate, and oxygenation were less relevant. Thus, the HACOR score reflects the different influences of these variables for the prediction of NIV failure. In addition, the sensitivity and specificity for predicting NIV failure were good at 1-2 h of NIV. As the HACOR score can accurately predict NIV failure in COPD patients and can easily be assessed at bedside, it can help clinical    . 1 HACOR scores in patients with NIV success and failure from initiation to 24 h of NIV. Pts patients, NIV noninvasive ventilation, HACOR heart rate, acidosis, consciousness, oxygenation, and respiratory rate practitioners as they make the decision to continue or stop NIV.
The overall mortality was 23%-27% in patients with NIV failure [7,8]. However, the mortality sharply increased (ranged from 50 to 80%) in patients with late NIV failure [12,18,23]. Therefore, identification of NIV failure and intubation earlier is a promising strategy to decrease hospital mortality. In our study, the HACOR score assessed at 1-2 h of NIV can identify the patients at high risk of NIV. Further, we performed a subgroup analysis and found that in these high-risk patients, early intubation was associated with decreased hospital mortality. Thus, the HACOR score is a potential tool for clinical physicians to identify NIV failure earlier. However, the high mortality in late failure group may be resulted from the high risk of mortality on baseline characteristics [18]. The clinical physicians should pay much attention on this issue when they use HACOR score to guide NIV management.
In a previous study, we showed that the HACOR score had high sensitivity and specificity for predicting NIV failure in patients with hypoxemic respiratory failure [19]. Although the variables for predicting NIV failure in that patient population are the same as in the COPD population, the weights for each variable are quite different. For example, although consciousness has the highest weight in both patient populations, oxygenation is the second most relevant variable in hypoxemic patients, and heart rate is the least relevant. These differences can be explained by the physiopathological differences between COPD and hypoxemic patients.
Among the three cohorts, we found an interesting result that patients in external-validation cohort not only had lowest pH and GCS but also had lowest NIV failure. However, they had lower proportion of underlying diseases, lower respiratory rate, lower heart rate, and lower APACHE II scores than those in other two cohorts. It may indicate that the population in external-validation  cohort is different. This may explain that the predictive power in external-validation cohort is not as good as those in derivation and internal-validation cohorts. In addition, the IPAP was similar but EPAP was lower in external-validation cohort. It means that the patients in external-validation cohort received higher support pressure. Thus, the lower respiratory rate, lower heart rate, lower APACHE II score and higher support pressure contribute much to the lower NIV failure in external-validation cohort.
This study has several limitations. First, the internalvalidation cohort was randomly selected. It may reduce the evidence strength. However, we divided the patients collected in the First Affiliated Hospital of Chongqing Medical University into four groups based on timeframe to internally validate the HACOR score once again. It still showed good predictive power of NIV failure in each group. Second, baseline state performance status and need for assistance with activities of daily living have been previously shown to be strongly associated with late NIV  [18]. However, we did not collect these data which may have reduced the accuracy of the HACOR score to predict late NIV failure. Third, we did not predefine how the provision of care was performed in each hospital. The diversity in population and provision of care may be two reasons for different predictive power between the three cohorts. However, diversity is common between different hospitals and different populations. In the external-validation cohort, the predictive power reached moderate to high though it was lower than that in derivation and internal-validation cohorts. Moreover, the external validation only came from two hospitals. The generalizability was limited. Further validation of HACOR score in different settings is needed in the future. Fourth, we found that early intubation was associated with decreased mortality in high-risk patients identified by HACOR score assessed at 1-2 h of NIV. However, this is a secondary analysis. To confirm this result, prospective randomized controlled trial was encouraged.

Conclusions
The HACOR score had high sensitivity and specificity for the prediction of NIV failure in COPD patients, particularly for early NIV failure. Higher scores indicate higher chances of NIV failure. As the variables for the HACOR score are easily obtained at bedside, it is convenient to use to assess the efficacy of NIV in COPD patients. In high-risk patients identified by HACOR score assessed at 1-2 h of NIV, early intubation is associated with decreased hospital mortality.