Patients screened for the HYPERION trial but found to have at least one non-inclusion criterion more often had good day-90 neurological outcomes than did patients included in HYPERION, overall and in both treatment arms. The proportion of non-included patients with good day-90 functional outcomes varied widely across centres, suggesting differences in patient care. For patients managed with TTM, the most common target temperature was 36 °C, followed fairly closely by 33 °C.
When choosing inclusion and non-inclusion criteria for an RCT, a balance should be sought between ensuring external validity by including a large proportion of screened patients and avoiding the inclusion of patients who are unlikely to benefit or likely to experience harm from the trial intervention. The TTM2 trial comparing 33 °C to normothermia with early treatment of fever found no benefits of hypothermia on either all-cause mortality during the trial or functional outcomes [3]. Interestingly, whereas 21% (584/2723) of screened patients were randomised in the HYPERION trial, this proportion was 66% (950/1431) in the TTM1 trial [6] and 44% (1900/4355) in the TTM2 trial [3]. In a prospective study, 30% of ICU patients met the selection criteria for only one of 15 frequently cited RCTs, and 52% met criteria for none of these trials [9].
Previous data on screening and eligibility for RCTs are scarce. Among critical-care patients, absence of inclusion criteria was more common than presence of non-inclusion criteria [9]. A study reported in 2015 [10] found that about half the patients who were both screened and eligible for trials in acute respiratory distress syndrome were not included. Moreover, RCT enrolment was associated with better outcomes than those of eligible patients who were not enrolled. Interestingly, in patients with acute respiratory distress syndrome, mortality was higher among non-enrolled than enrolled patients, in contradiction to our findings [11]. Despite the generally accepted adverse prognostic significance of most of the non-inclusion criteria used in HYPERION (e.g., moribund status, long no-flow and low-flow durations, and haemodynamic instability), functional outcomes were better in the non-included than in the included patients. Also, although the frequencies of good neurological outcomes were very low in the groups with long no-flow and low-flow durations (2.1% and 3.1%, respectively), they were not very different from the frequency in the normothermia group of the HYPERION trial (5.7%). A negative self-fulfilling prophecy effect may have occurred, with investigators tending not to include patients who they expected would die shortly after inclusion. In contrast to this expectation, mortality was lower in non-enrolled patients. Moreover, 3.1% of moribund patients had good day-90 functional outcomes. Outcomes varied across centres, with the proportion of patients having a favourable outcome ranging from 0 to 66% (Fig. 1). The centre with the highest proportion had only 9 patients, and the five centres with proportions smaller than 5% had relatively small sample sizes (34 to 66 patients).
These variations may be ascribable not only to differences in management, but also to differences in geographic characteristics influencing time to first-responder care and time to admission. Moreover, the selection of centres for invitation to participate in RCTs often depends on network membership, prior collaborations, and personal contacts, which may introduce bias [12, 13].
The use of selection criteria for RCTs limits the general applicability of the findings, which is a major concern. However, selection criteria improve the uniformity of the population, which is essential since a given intervention may benefit some patients but not others. For instance, short-duration intravenous antibiotic therapy is beneficial in patients with septic shock [14], but can be deleterious in those with other presentations [15]. A survey done in the UK assessed the experience of trial recruiters regarding the interpretation and application of eligibility criteria [16]. The main issues reported by the respondents were lack of clarity about what each inclusion and non-inclusion criterion meant, difficulty obtaining the investigations needed to assess eligibility within the required timeframe, and uncertainty about whether the criteria were necessary.
The limitations of our study include the retrospective design. Selection and classification bias can occur during medical chart review. More specifically, we were unable to study the subgroup of patients excluded for reasons outside the control of the trial designer. Most of these patients had vulnerability markers such as young age, absence of health insurance, and being under guardianship. Consequently, although the pathophysiology of cardiac arrest is probably similar in this vulnerable subgroup to that in included patients, the treatments and outcomes may differ in ways that might have biased the present study. The day-90 functional outcome was usually determined during a telephone interview, a method that may have overestimated good functional outcomes compared to blinded assessment by a neuropsychologist trained for this specific evaluation. Finally, when comparing patients who did vs. did not receive specific interventions, we were unable to adjust for acute illness severity at ICU admission as assessed by an appropriate score such as the CAHP [17] or OHCA [18], as the data needed to determine these scores were not consistently available in the medical charts.
Further investigations are needed to help translate clinical research findings to the real-life setting. In the specific case of cardiac arrest, whether lowering the body temperature or preventing fever is the more effective intervention should be determined. Finally, given the heterogeneity of cardiac-arrest patients, studies are needed to identify the subgroups most likely to benefit from specific interventions.