The end-expiratory occlusion test for detecting preload responsiveness: a systematic review and meta-analysis

Background: We performed a systematic review and meta-analysis of studies assessing the end-expiratory occlusion test (EEXPO test)-induced changes in cardiac output (CO) measured by any haemodynamic monitoring device, as indicators of preload responsiveness.
Methods: MEDLINE, EMBASE and Cochrane Database were screened for original articles. Bivariate random-effects meta-analysis determined the Area under the Summary Receiver Operating Characteristic (AUSROC) curve of EEXPO test-induced changes in CO to detect preload responsiveness, as well as pooled sensitivity and specificity and the best diagnostic threshold.
Results: Thirteen studies (530 patients) were included. Nine studies were performed in the intensive care unit and four in the operating room. The pooled sensitivity and the pooled specificity for the EEXPO test-induced changes in CO were 0.85 [0.77–0.91] and 0.88 [0.83–0.91], respectively. The AUSROC curve was 0.91 [0.86–0.94] with the best threshold of CO increase at 5.1 ± 0.2%. The accuracy of the test was not different when changes in CO were monitored through pulse contour analysis compared to other methods (AUSROC: 0.93 [0.91–0.95] vs. 0.87 [0.82–0.96], respectively, p = 0.62). Also, it was not different in studies in which the tidal volume was ≤ 7 mL/kg compared to the remaining ones (AUSROC: 0.96 [0.92–0.97] vs. 0.89 [0.82–0.95] respectively, p = 0.44). Subgroup analyses identified one possible source of heterogeneity.
Conclusions: EEXPO test-induced changes in CO reliably detect preload responsiveness. The diagnostic performance is not influenced by the method used to track the EEXPO test-induced changes in CO.
Trial registration: The study protocol was prospectively registered on PROSPERO: CRD42019138265.


Background
Over the last 20 years, many dynamic tests were developed and validated to predict whether a fluid bolus will increase cardiac output (CO) significantly [1]. They all consist in observing the effects on CO of variations in cardiac preload occurring under different circumstances. The variations of arterial pulse pressure and stroke volume induced by mechanical ventilation are very reliable indices of preload responsiveness [2,3], but they are strongly limited by the restricted conditions in which they can be used. Administering small amounts of fluid may predict the response to larger ones [4], but such "mini fluid challenges" require a very precise measurement of CO and, if repeated, may contribute to fluid overload. Passive leg raising reversibly mimics fluid infusion and detects preload responsiveness very reliably [5], but intra-abdominal hypertension is responsible for some false-negatives [6] and it is not very convenient to perform [7].
In this context, the transient interruption of mechanical ventilation at end-expiration was recommended 10 years ago for testing preload responsiveness through heart-lung interactions [8]. By interrupting the impediment to venous return induced by each mechanical insufflation, the expiratory hold allows the cardiac preload to augment, which, in case of preload responsiveness, leads to a significant increase of CO [9].
Several studies testing the diagnostic accuracy of the end-expiratory occlusion (EEXPO) test have been published after that first one, with different methods of CO measurement, durations of expiratory hold and clinical settings. A meta-analysis [10] has been performed with eight of these studies [8, 11–17]. However, it failed to include two studies [18, 19] on the reliability of the EEXPO test which had already been published. Moreover, no subgroup analysis was performed to look for factors of heterogeneity, although some of them might be significant. This might be the case, for instance, for the duration of the EEXPO or the technique used to monitor CO [10]. Finally, some additional studies [20–22] were published afterwards, and the additional patients may allow one to perform the subgroup analysis that had not been performed by Messina et al. [10]. Therefore, we conducted a new systematic review of all the studies testing the diagnostic accuracy of the EEXPO test. In particular, taking advantage of the large number of patients pooled, we aimed to look for factors influencing the reliability of the EEXPO test.

Clinical research question
The clinical research question was: What are the sensitivity and specificity of the EEXPO test to detect preload responsiveness when its effects are assessed on cardiac output?

PICO statement
The PICO statement was the following:
• P (patient, problem or population): surgical or critically ill patients under mechanical ventilation in whom the effect of volume expansion on CO needs to be predicted.
• I (intervention): EEXPO test performed by holding the patient's breath at the end of expiration during invasive mechanical ventilation and by measuring the induced changes in CO, measured by any available monitoring device.
• C (comparison, control or comparator): preload responsiveness defined as either a 10 to 15% increase in CO during volume expansion (250-500 mL of fluid in ≤ 30 min) or 10% during passive leg raising (PLR), measured by any available monitoring devices.
• O (outcomes): ability of the EEXPO test to detect preload responsiveness, defined in each study according to the pre-specified threshold of CO increase after either volume expansion or PLR.

Identification of records
Our aim was to identify all studies evaluating the ability of the EEXPO test to predict a significant increase in CO or surrogate (velocity time integral of the left ventricular outflow tract with echocardiography, blood velocity of the descending aorta with oesophageal Doppler) compared to the one induced by a subsequent volume expansion or by a PLR test. We included in our analysis only the studies that provided sensitivity, specificity and the area under the receiver operating characteristic curve (AUROC) of the EEXPO test with the corresponding diagnostic threshold. Moreover, only studies on adults that were published in full text or accepted for publication in indexed journals were included in our analysis. No language restriction was applied. We searched the US National Library of Medicine's MEDLINE database, the EMBASE database and the Cochrane Database of Systematic Reviews for relevant studies published from 1960 to 1st October 2019. We used the following medical subject headings and keywords: "end expiratory occlusion", "end expiratory", "volume expansion", "fluid challenge", "fluid administration", "fluid responsiveness", "preload responsiveness". The complete search strategy is reported in Additional file 1: Figure S1. We also looked for relevant articles cited in reviews, articles and editorials. The search was performed by two independent investigators (FG and RS) until no new record could be found. Conflicts regarding inclusion or exclusion of studies were resolved by consensus with a third investigator (XM). The meta-analysis was performed according to the PRISMA statement (http://www.prisma-statement.org). The study protocol was prospectively registered in PROSPERO (CRD42019138265-Submission 7th June 2019, approval 29th August 2019).

Data extraction
Using a standardised form, two investigators (FG and RS) independently extracted data from the selected studies, including demographic characteristics of the investigated population, ventilatory variables, the duration of the EEXPO test, the method used to assess its haemodynamic effects on CO or its surrogate, the amount and type of fluid infused and the duration of the infusion of volume expansion, when performed, as well as the criteria used to define preload responsiveness. Moreover, the number of true-positives, true-negatives, false-positives and false-negatives as well as sensitivity and specificity, the AUROC and the best EEXPO-induced increase in CO or surrogates able to detect preload responsiveness were collected.

Assessment of risk of bias in included studies
Two authors (FG and RS) independently assessed the overall quality of evidence at the outcome level according to the GRADE system [23]. Moreover, they assessed the risk of bias of the included studies by following the criteria specified in the QUADAS-2 scale [24]. For each criterion, the risk of bias was judged as high, low or unclear. Disagreements between the reviewers were resolved by consensus with a third investigator (XM). Then, as described elsewhere [5], points were given to each issue of the QUADAS-2 evaluation (three points for "high", two points for "unclear" and one point for "low") and their sum was calculated. "Overall higher" and "overall lower" risk of bias was defined with reference to the median of the risk bias of all studies [5].
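The scoring rule described above can be sketched as follows. This is only an illustration under stated assumptions: the study names and judgements are hypothetical, only the four QUADAS-2 risk-of-bias domains are scored, and a score strictly above the median is taken as "overall higher" (the text does not specify how ties with the median were handled).

```python
from statistics import median

# Points per judgement, as described in the text.
POINTS = {"high": 3, "unclear": 2, "low": 1}


def quadas2_score(judgements):
    """Sum the points over the QUADAS-2 items judged for one study."""
    return sum(POINTS[j] for j in judgements)


def classify_by_median(scores):
    """Label each study relative to the median score of all studies.

    Assumption: only scores strictly above the median count as
    "overall higher"; ties are labelled "overall lower".
    """
    cutoff = median(scores.values())
    return {
        study: "overall higher" if score > cutoff else "overall lower"
        for study, score in scores.items()
    }


# Hypothetical judgements for the four risk-of-bias domains
# (patient selection, index test, reference standard, flow and timing).
studies = {
    "study_A": ["low", "low", "unclear", "low"],
    "study_B": ["high", "unclear", "low", "low"],
    "study_C": ["low", "low", "low", "low"],
}
scores = {name: quadas2_score(j) for name, j in studies.items()}
labels = classify_by_median(scores)
```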

Statistical analysis

Study description
Study-specific sensitivity and specificity values have been computed considering a 0.5 continuity correction as indicated in the literature (Additional file 1: Figure S2). The 95% confidence intervals have also been calculated using the Wilson method [25]. A graphical representation of the data has been provided. Paired forest plots of sensitivity and specificity and 95% confidence ellipse plots have also been reported. The correlation of sensitivities and false-positive rates has been reported to investigate a possible threshold effect.
For the principal analysis, if more than one technique was used to assess the haemodynamic effects of the EEXPO test, we chose the one considered to be the most reliable: when both oesophageal Doppler and end-tidal carbon dioxide were used, we only considered oesophageal Doppler and when both echocardiography or oesophageal Doppler and calibrated pulse contour analyses were used, we considered only pulse contour analysis. Finally, for that analysis, in studies in which the EEXPO test was performed at different positive end-expiratory pressure (PEEP) or tidal volume levels, we selected the ones that provided the highest AUROC.
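As an illustration of the study-level computations described above, the following minimal sketch derives sensitivity and specificity with a 0.5 continuity correction and a Wilson 95% interval from a 2x2 table. The counts are hypothetical, the function names are ours, and applying the Wilson interval to the uncorrected counts is an assumption, since the text does not state which counts were used.

```python
import math


def sens_spec(tp, fn, fp, tn, correction=0.5):
    """Sensitivity and specificity after adding `correction` to every cell."""
    tp, fn, fp, tn = (x + correction for x in (tp, fn, fp, tn))
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical 2x2 table: true-positives, false-negatives,
# false-positives, true-negatives.
tp, fn, fp, tn = 18, 2, 3, 17
se, sp = sens_spec(tp, fn, fp, tn)
se_ci = wilson_interval(tp, tp + fn)  # interval around raw sensitivity 18/20
sp_ci = wilson_interval(tn, tn + fp)  # interval around raw specificity 17/20
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] even when a study reports a sensitivity or specificity close to 1, which is common in small diagnostic accuracy studies.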

Bivariate random-effect model
The bivariate random-effects model by Reitsma [26] was computed to estimate the area under the summary receiver operating characteristic (AUSROC) curve accounting for correlation between sensitivity and specificity. The model was estimated through a restricted maximum likelihood approach. In the bivariate model, the logit sensitivity and the logit specificity are assumed to be bivariate normal random variables across the studies considering also a variance and covariance matrix for the random-effect component. A bivariate version of the I² statistic was computed to investigate the presence of heterogeneity on sensitivity and specificity outcome, as indicated in the literature [27]. A value of I² ≥ 75% was considered as indicating a high heterogeneity [28].
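For reference, the bivariate model described above is commonly written as below. The notation is ours rather than the source's: i indexes studies, μ holds the pooled logit values, Σ is the between-study variance-covariance matrix and S_i is the within-study sampling covariance of study i. Some software parameterises the second component as the false-positive rate, which only changes its sign on the logit scale.

```latex
\begin{pmatrix}
  \operatorname{logit}(\mathrm{Se}_i) \\
  \operatorname{logit}(\mathrm{Sp}_i)
\end{pmatrix}
\sim \mathcal{N}\!\left(
  \begin{pmatrix} \mu_{\mathrm{Se}} \\ \mu_{\mathrm{Sp}} \end{pmatrix},\;
  \Sigma + S_i
\right),
\qquad
\Sigma =
\begin{pmatrix}
  \sigma_{\mathrm{Se}}^{2} & \sigma_{\mathrm{Se,Sp}} \\
  \sigma_{\mathrm{Se,Sp}} & \sigma_{\mathrm{Sp}}^{2}
\end{pmatrix},
\qquad
\operatorname{logit}(p) = \ln\frac{p}{1-p}
```

The pooled sensitivity and specificity are then obtained by back-transforming the estimated means, for example pooled Se = 1 / (1 + exp(−μ_Se)).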

Investigation of heterogeneity sources
The potential sources of heterogeneity were investigated considering a Reitsma bivariate random-effect meta-regression model. Separate meta-regression models were calculated, considering as covariates:
• operating room (OR).
• Risk of bias: "overall lower" vs. "overall higher", as described above.
The covariate effects on the sensitivity and false-positive rate were reported together with p values and 95% confidence intervals. The likelihood ratio test was carried out comparing a null model with a model with a covariate. A significant likelihood ratio test indicates that the covariate is a potential source of heterogeneity across studies. Publication bias was investigated using Deeks' test [29]. Statistical significance was set at a p value < 0.05. The analyses were performed using R 3.3.5 [30] with the mada package [31].
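The likelihood ratio comparison described above can be sketched as follows. This is a generic illustration, not the computation performed by the mada package: the log-likelihood values are hypothetical, and the choice of 2 degrees of freedom assumes that a single binary covariate adds one coefficient for sensitivity and one for the false-positive rate.

```python
import math


def likelihood_ratio_test(loglik_null, loglik_cov, df):
    """Return (statistic, p value) for a nested-model likelihood ratio test.

    The statistic 2 * (loglik_cov - loglik_null) is referred to a chi-square
    distribution with `df` degrees of freedom. Only df = 1 or 2 is handled,
    because those cases have simple closed-form tail probabilities.
    """
    stat = max(0.0, 2.0 * (loglik_cov - loglik_null))
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only supports df = 1 or 2")
    return stat, p_value


# Hypothetical log-likelihoods of the null model and the covariate model.
stat, p = likelihood_ratio_test(loglik_null=10.0, loglik_cov=12.0, df=2)
is_potential_source = p < 0.05  # significance level stated in the text
```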

Characteristics of the included studies
We identified 13 studies (530 patients) [8, 11–22] that reported the ability of the EEXPO test to assess preload responsiveness. The flowchart in Fig. 1 illustrates the study selection, and the main characteristics of the included studies are reported in Table 1. Nine studies [8, 11, 14–20] were performed in the ICU and four in the OR [12, 13, 21, 22]. In one study in the ICU [15], the EEXPO test was performed during prone positioning. All patients were mechanically ventilated with a tidal volume ranging between 5.8 mL/kg [20] and 8.2 mL/kg [12], with a median value of 6.95 mL/kg. In two studies [14, 21], the diagnostic ability of the EEXPO test was assessed under a tidal volume of 6 mL/kg and repeated at a tidal volume of 8 mL/kg. The PEEP level was set between 4 cmH2O [12] and 14 cmH2O [19], with a median value of 7 cmH2O. The results of the QUADAS-2 evaluation are reported in Additional file 1: Figure S3. Following the GRADE system, the overall quality of evidence for the included studies was assessed as very low (Additional file 1: Figure S4).

Risk of bias
When we divided the studies according to the global risk of bias, no significant difference was observed in AUSROCs between studies with overall lower [8, 11, 16, 18–20] and overall higher risk of bias (Additional file 1: Figure S5.6).

Sources of heterogeneity and publication bias
In the Reitsma bivariate random-effect meta-regression models, only the overall risk of bias emerged as a potential source of heterogeneity (p = 0.049) (Additional file 1: Figure S5.6). None of the other covariates was identified as a source of heterogeneity. According to the results of Deeks' test, we did not detect publication bias in the studies that evaluated the diagnostic performance of the EEXPO test (p = 0.864) (Additional file 1: Figure S6).

Discussion
This meta-analysis of 13 studies performed in 530 patients shows that the changes in CO induced by the EEXPO test reliably detect preload responsiveness with excellent sensitivity and specificity (0.85 and 0.88, respectively).
The EEXPO test is based on heart-lung interactions. During positive pressure ventilation, insufflation increases the intrathoracic pressure and right atrial pressure, impeding venous return [2]. It interrupts the increase in cardiac preload that occurred during exsufflation. EEXPO stops this cyclic impediment of venous return and allows cardiac preload to increase. Right cardiac preload increases first and, provided that the EEXPO is long enough to allow the transit of this increase through the pulmonary vasculature, it is followed by an increase in left cardiac preload. The interruption of ventilation also stops the cyclic compression of the pulmonary vasculature, which facilitates the transfer of the preload increase from the right to the left side. The transient increase in cardiac preload induced by the EEXPO test can be seen as a small "self-preload challenge" which might be used to assess preload responsiveness [9].
A number of studies have now tested the reliability of the EEXPO test. Many were positive but some of them showed contradictory results, which led us to perform a meta-analysis. Despite these discrepancies, we report that the AUSROC of the EEXPO test to detect preload responsiveness is high, comparable to the one reported in meta-analyses for pulse pressure variation [32] and the passive leg raising test [5], and higher than the one found for the respiratory variations in the inferior or superior vena cava [33]. The present meta-analysis confirms another one recently published by Messina et al. [10], which included five fewer studies [18–22]. Importantly, the novelty of our meta-analysis is that it allowed us to investigate some of the factors which may, in theory, alter the test reliability and which were not investigated in the former meta-analysis. First, no significant difference was observed between studies in which the duration of the respiratory hold was ≤ 15 s and studies in which it was longer, which suggests that a duration of 15 s is sufficient. In practice, this is an important point since not all ventilators allow respiratory holds ≥ 15 s.
Second, the level of PEEP might be theoretically important, since it is the level to which the airway pressure is reduced during EEXPO. However, in a previous study in which two levels of PEEP were compared in the same patients, the diagnostic accuracy of the EEXPO test was unchanged [19]. The present meta-analysis tends to confirm this, since the AUSROC was similar amongst studies with high or low PEEP levels. Nevertheless, both levels were defined according to the median value of PEEP levels, which was only 7 cmH2O. One should keep in mind that in theory, the haemodynamic effects of the EEXPO test should depend more on the respiratory driving pressure than on the PEEP alone, a hypothesis that remains to be tested. Of note, the worst reliability of the EEXPO test was reported by a study performed in prone positioning [15], in which the PEEP level was 8 cmH2O on average. Since there is no clear reason why prone positioning should change the reliability of the EEXPO test, and since this was reported in that single study, no clear conclusion about this point can be drawn without further investigations.
A third factor that might theoretically affect the EEXPO test reliability is the tidal volume. Two studies which compared tidal volumes of 6 and 8 mL/kg reported that diagnostic accuracy was good at 8 mL/kg but poorer at 6 mL/kg [14, 21]. However, even if they did not directly compare different tidal volume levels, some of the other studies which reported excellent diagnostic accuracy had included some patients with low tidal volume values, as indicated by the mean and standard deviation reported in their whole population. If the test reliability had been poor in these patients, the averaged reliability could not have been so good. In line with these studies, the present meta-analysis did not show any difference in AUSROC when studies were compared with respect to the median of reported tidal volumes. These conflicting results suggest that the question of whether the tidal volume actually influences the EEXPO test reliability is still unanswered.
A fourth and important issue is the method used for measuring the EEXPO-induced changes in CO. One advantage of the present meta-analysis is that it included studies using the devices that are the most used in the ICU nowadays [34]. As a matter of fact, the small threshold defining the test positivity may require precise CO monitoring devices. The least significant change of echocardiography [35] and oesophageal Doppler [20] is close to the diagnostic threshold of the EEXPO test. This is the reason why two studies performed with oesophageal Doppler [20] and echocardiography [16] have addressed this issue by combining the changes in CO induced by both end-expiratory and end-inspiratory holds. The present meta-analysis could not test the advantage of this strategy, which was evaluated in these two studies only. However, even though the precision of pulse contour analysis [36] is higher than that of the other tested methods, no significant difference emerged when it was used to track CO changes compared to other methods. One study assessed the EEXPO effects through the changes in end-tidal carbon dioxide [12]. Of note, this way of tracking the EEXPO-induced changes in CO has been questioned [37]. However, the fact that the diagnostic accuracy of the EEXPO test is not influenced by the CO monitoring method used is a strong argument in favour of the reliability of the test at the bedside. Finally, the reliability of the EEXPO test was excellent in both the ICU and OR settings, which is not surprising since there is no obvious reason why it should differ between them.
The heterogeneity of the included studies is one of the limitations of our meta-analysis. However, the meta-regression analysis investigated several possible sources of heterogeneity and identified one of them (Additional file 1: Figure S5). Another limitation is that the studies included were all single-centre and enrolled a relatively small number of patients. Nevertheless, the value of a meta-analysis lies precisely in merging such small studies in order to draw more solid conclusions. Some of the studies suffered from biases as assessed with the QUADAS-2 (Additional file 1: Figure S3). Nevertheless, to better investigate their role as possible causes of heterogeneity, we performed a pre-specified subgroup analysis by dividing the studies according to the global risk of bias: no difference was observed in the accuracy of the EEXPO test between studies with overall lower and higher risk of bias. We also evaluated the overall quality of evidence of the studies included in the meta-analysis according to the GRADE system, with an overall judgement of "very low" (Additional file 1: Figure S4). Nonetheless, we believe that these findings should be extensible to any sample of EEXPO test studies, considering their recurrent weaknesses related to small sample sizes, absence of power analysis and clinical heterogeneity. Finally, a large number of the included studies were performed by the same team, which had described the EEXPO test for the first time [8].