We found a relatively small number of RCTs in adult CCM in which the new intervention was associated with increased mortality compared with the control group. Most RCTs were academically sponsored and assessed a wide range of nearly always clinically available interventions. The intervention harm signal was remarkably wide: the absolute risk increase ranged from 2.6 to 29 % (mean 5 %), and the number needed to harm ranged from 3.5 to 38.5 (mean 20). The finding of significantly increased mortality was not explained by the source of sponsorship, geographical setting, lack of blinding, lack of DSMBs or early stopping rules, underlying conditions, inclusion criteria (which ranged widely) or the types of interventions (which also ranged widely). Few prior POP/Phase II RCTs of the particular intervention were done by the same group doing the pivotal/Phase III RCT. Although eight of ten RCTs did interim analyses and seven were stopped early, none used a response adaptive trial design.
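As a quick check on the figures above, the number needed to harm (NNH) is conventionally the reciprocal of the absolute risk increase (ARI) expressed as a proportion. The symbols below are introduced here for illustration only; they are not notation used by the reviewed RCTs.

```latex
\[
\mathrm{ARI} = p_{\text{intervention}} - p_{\text{control}}, \qquad
\mathrm{NNH} = \frac{1}{\mathrm{ARI}}
\]
\[
\mathrm{ARI} = 0.026 \;\Rightarrow\; \mathrm{NNH} = \frac{1}{0.026} \approx 38.5, \qquad
\mathrm{ARI} = 0.29 \;\Rightarrow\; \mathrm{NNH} = \frac{1}{0.29} \approx 3.4
\]
```

The second value is close to the reported 3.5; the small difference presumably reflects rounding of the underlying ARI (an assumption, since the unrounded values are not given here). The mean values are likewise consistent, because 1/0.05 = 20.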
Ironically, these large, pivotal RCTs moved forward because prior proof-of-principle (POP)/Phase II RCTs showed significant results for a surrogate marker chosen to predict success in pivotal Phase III RCTs. Thus, the POP RCTs were false-positive signals of success of pivotal/Phase III RCTs. The reasons for these false-positive signals could include inadequate surrogate markers, changes in dose, changes in usual care, changes in inclusion criteria and settings and/or the play of chance between the POP/Phase II and pivotal/Phase III RCTs. These examples of increased mortality in the intervention compared with the control groups in pivotal/Phase III RCTs, after positive POP/Phase II RCTs, highlight the need for more accurate surrogate markers (biomarkers and/or clinical markers) that predict success in pivotal/Phase III RCTs.
We have extended the prior review of increased mortality trials by Freeman et al. [20] who reviewed only human growth hormone, the NO inhibitor L-NMA and diaspirin-hemoglobin. Furthermore, Freeman and colleagues [20] concluded that “factors in the design and conduct of the clinical trial that led to this result be thoroughly discussed.” Ospina-Tascon et al. [21] reviewed methods and quality of multicenter RCTs that had mortality as a primary endpoint and found seven trials as of 2008 that found increased mortality in the intervention group; they concluded that relatively few RCTs “show a beneficial impact of the intervention on the survival of critically ill patients.” We have added to Ospina-Tascon et al. [21] by focusing on RCTs showing harm, and we have updated RCTs reported since 2008. The small number of RCTs in CCM, rigorous Data and Safety Monitoring Board processes, and/or robust proof-of-concept RCTs could explain the low number of harmful RCTs in CCM.
The interventions ranged widely and were nearly always clinically available, suggesting that adult CCM adopted new technologies without positive pivotal RCTs. Clinically available interventions included human growth hormone, methylprednisolone [17], increased oxygen delivery [10], a novel resuscitation fluid [12], high-frequency oscillation ventilation [16] and salbutamol. Three interventions were not available clinically: diaspirin cross-linked hemoglobin [14], the NO synthase (iNOS) inhibitor L-NMA [11] and TNF-α receptor [15].
An alternative interpretation is that the new interventions were unproven, that only large pivotal Phase III RCTs powered for mortality would prove efficacy, and that inevitably some interventions increase mortality in Phase III. Thus, some argue such RCTs are necessary—indeed critical—to guide clinical practice away from harmful interventions.
Finding increased mortality in the “conventional” group is essentially showing that the newer therapy is more effective, which is the goal of the vast majority of superiority design RCTs in critical care. We chose to search for and review papers in which the results were the opposite of the initial hypothesis, i.e., the “conventional” intervention was superior and the new therapy significantly increased mortality. Our contention is the same regardless of design: we aim to minimize harm from new interventions in future RCTs. We thus believe it does matter whether the increased mortality is in the “conventional” therapy group (i.e., showing more benefit with the new intervention) or in the new therapy group (i.e., the hypothesis that the new intervention was better is not merely unsupported; the exact opposite is shown). We did not exclude RCTs comparing contrasting strategies in which neither can be considered “new” or “conventional.”
Equipoise is a crucial issue for our study of RCTs showing increased mortality with the new intervention. If there was equipoise regarding interventions prior to RCTs (and there must be equipoise for ethical conduct and ethics approval of RCTs), then one might expect the number of RCTs showing increased mortality with the new intervention to be similar to the number showing increased mortality with the “conventional” intervention. It is possible that RCTs showing increased mortality with the “conventional” intervention are much more numerous than the inverse; however, the low number of RCTs in CCM (sepsis in particular) that found significantly lower mortality with the new intervention suggests that this is not the case.
It is also possible that harmful effects of interventions are more easily identified than beneficial effects, for example, in a subgroup. In a hypothetical placebo-controlled study of penicillin in sepsis, where beneficial effects in a small subgroup of patients would be difficult to prove, anaphylactic reactions would be well recognized. This heterogeneity of treatment effect in detection of benefit versus harm was reviewed elegantly in simulations of sepsis RCTs by Iwashyna et al. [22]. They show that positive RCTs (i.e., beneficial overall) could have buried in them subgroup(s) with consistent harm or little benefit such as low-risk patients who met enrollment criteria.
It is possible that some RCTs that found increased mortality with the “new intervention” were not published (especially before clinicaltrials.gov registration). If one supposes that the number of RCTs in which the new intervention increased mortality equals the number in which the old intervention increased mortality, then that supposition would indicate that the vast majority of RCTs were unable to show anything other than equivalence or non-inferiority.
Response adaptive trial design might decrease the risk of RCTs that show increased mortality in the intervention versus control group(s). RCTs with a response adaptive design adjust group randomization (intervention or control), and sometimes dose, as the RCT progresses by using ongoing interim results. This is scientifically sound when the algorithms for group/dose assignment are comprehensively pre-specified and investigators and sponsors cannot adjust ongoing randomization. Strengths of response adaptive trial design include more efficient assessment of efficacy, limited risk of RCTs that find potential harm (by decreasing the sample size compared with frequentist statistical trial design), decreased time and expense (i.e., more efficient futility rules), improved drug dose selection and earlier completion of negative RCTs. We do acknowledge that this approach could limit the enrollment of a sufficient number of patients to be highly confident about mortality differences.
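To make the idea of adjusting randomization from interim results concrete, the following is a minimal sketch of one common flavor of response adaptive randomization: a Bayesian Beta-Binomial model of mortality in each arm, with the allocation probability tied to the posterior probability that the intervention has lower mortality. The uniform Beta(1, 1) priors, the allocation floor and ceiling, the function names and the interim counts are all assumptions chosen for illustration; none of the reviewed RCTs used this algorithm.

```python
import random


def prob_intervention_better(deaths_i, n_i, deaths_c, n_c, draws=20000, seed=0):
    """Monte Carlo estimate of P(intervention mortality < control mortality).

    Assumes independent Beta(1, 1) priors on each arm's mortality rate
    (an illustrative choice, not taken from any reviewed trial).
    """
    rng = random.Random(seed)  # fixed seed so the rule is reproducible
    wins = 0
    for _ in range(draws):
        p_i = rng.betavariate(1 + deaths_i, 1 + n_i - deaths_i)
        p_c = rng.betavariate(1 + deaths_c, 1 + n_c - deaths_c)
        if p_i < p_c:
            wins += 1
    return wins / draws


def allocation_probability(p_better, floor=0.1, ceiling=0.9):
    """Probability of assigning the next patient to the intervention arm.

    The floor and ceiling (hypothetical values) are pre-specified so that
    neither arm is ever closed by the algorithm alone.
    """
    return min(ceiling, max(floor, p_better))


# Hypothetical interim look: 30/100 deaths on intervention, 20/100 on control.
p_better = prob_intervention_better(deaths_i=30, n_i=100, deaths_c=20, n_c=100)
alloc = allocation_probability(p_better)
print(f"P(intervention better) = {p_better:.3f}; next-patient allocation = {alloc:.2f}")
```

With interim data like these, where the intervention arm has more deaths, the rule shifts most new patients toward control, which is the mechanism by which such designs could limit exposure to a harmful intervention. Because the rule is a fixed function of the interim counts, it can be fully pre-specified, as the paragraph above requires.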
Response adaptive trial design often adjusts sample size to prevent underpowered RCTs, or excessively large RCTs when the treatment effect is larger than expected. Interim analyses of PROWESS-SHOCK led to an increased sample size because of lower than expected blinded mortality [23, 24]. Response adaptive trial designs are now used in proof-of-concept RCTs in critical care (e.g., l-carnitine [25]) and in a pivotal RCT of selepressin versus placebo in septic shock (https://clinicaltrials.gov/ct2/show/NCT02508649?term=selepressin&rank=1). The regulatory bodies (FDA and EMEA) have accepted this design recently [4] (https://clinicaltrials.gov/ct2/show/NCT02508649?term=selepressin&rank=1).
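The arithmetic behind a blinded sample size re-estimation can be sketched as follows. This is a textbook two-proportion calculation under assumed values, not the method actually used in PROWESS-SHOCK: the pooled mortality rates (30 % planned, 25 % observed), the 20 % relative risk reduction, the 1:1 allocation and the function name are all hypothetical.

```python
import math
from statistics import NormalDist


def per_group_sample_size(pooled_mortality, relative_risk_reduction,
                          alpha=0.05, power=0.80):
    """Patients per arm for a two-sided comparison of two proportions.

    Arm-level rates are backed out of the blinded pooled mortality,
    assuming 1:1 allocation and a fixed relative risk reduction.
    """
    p_control = 2 * pooled_mortality / (2 - relative_risk_reduction)
    p_interv = p_control * (1 - relative_risk_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_interv * (1 - p_interv)
    return math.ceil((z_alpha + z_beta) ** 2 * variance
                     / (p_control - p_interv) ** 2)


planned = per_group_sample_size(pooled_mortality=0.30, relative_risk_reduction=0.20)
revised = per_group_sample_size(pooled_mortality=0.25, relative_risk_reduction=0.20)
print(f"planned per arm: {planned}; revised per arm: {revised}")
```

Under these assumptions, a lower than expected pooled mortality shrinks the absolute difference the trial is trying to detect, so the required sample size rises, which is the direction of the adjustment described above.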
Patient safety in RCTs necessitates methods to decrease the risk of excess deaths due to new interventions [26]. Response adaptive trial design could decrease the risk that increased mortality is detected only after a large sample has been evaluated [27], by adjusting randomization to the intervention or control as the RCT progresses based on ongoing interim analyses. Some believe that Phase III mortality RCTs that stop early for efficacy overestimate treatment efficacy [27].
Most RCTs had secondary results that might explain the higher mortality in the intervention compared with the control groups: increased cardiovascular complications (tachyarrhythmias possibly caused by the interventions in two RCTs, dobutamine [10] and intravenous salbutamol infusion [9]; decreased cardiac output from excessive vasoconstriction due to NOS inhibition [11] or from decreased venous return secondary to high-frequency oscillation ventilation [16]), renal toxicity (hetastarch increased the risk of acute kidney injury [12, 18]) and hypoglycemia (intensive insulin significantly increased severe hypoglycemia [8]). The causes of excess mortality associated with human growth hormone and methylprednisolone [17] were not explained, although speculations were presented [13]. Removal of TNF-α by TNF-α receptor was hypothesized to be potentially deleterious [15]. Larger Phase II proof-of-concept RCTs could have clarified safety risks, thus modifying (e.g., drug dose, duration, specific safety signal monitoring) or avoiding pivotal Phase III RCTs.
All but one of the RCTs were preceded by proof-of-principle Phase II RCTs, and another pivotal RCT was preceded by a prior pivotal Phase III RCT [8] by the same group. We speculate that larger Phase II RCTs in CCM with dose-response adaptive trial designs could limit safety risk and increase the probability of technical success of Phase III RCTs.
Cardiology has improved outcomes through well-designed, large RCTs and emphasizes a model of cooperation between academia and industry [28]. Cardiology RCTs adjusted their design using lessons learned from earlier missteps. For example, the CAST trial [29, 30] enrolled patients at risk of ventricular arrhythmias and randomized them to the anti-arrhythmics moricizine, encainide or flecainide versus placebo. At the first interim analysis, the DSMB recommended stopping the encainide and flecainide arms (pooled mortality was higher than placebo). Ironically, a later editorial bemoaned that two “potentially efficacious” drugs could be removed from clinical usage, not mentioning the increased mortality [30]. CAST catalyzed rigorous, independent monitoring of RCTs [31] [e.g., Academic Research Organizations in independent DSMBs and regulatory guidance on DSMB function (www.fda.gov/OHRMS/DOCKETS/98fr/01d-0489-gdl0003.pdf)].
Our review of RCTs that found increased mortality in the intervention arm in CCM emphasizes caution in the design and monitoring of future RCTs in critically ill patients. Our review of Phase II proof-of-principle RCTs aligns with recent emphasis on Phase II RCTs [32] and complements other suggestions to improve chances of success in critical care RCTs [4].
The strengths of our analyses are wide inclusion criteria, careful screening of the adult CCM RCT literature, detailed evaluation of many aspects of RCT design and implementation and consideration of RCT- and intervention-specific risks—as opposed to design risks. Shortcomings are that we did not have access to original RCT data to model how the use of response adaptive trial design could have decreased risk of excessive intervention group mortality rates, we did not search for secondary publications that could have explained the causes of increased mortality in the intervention groups, and our findings may not apply to other fields (critically ill patients have increased risk of adverse events, have a high mortality and receive numerous interventions).