Outcomes of ICU patients with and without perceptions of excessive care: a comparison between cancer and non-cancer patients

Background Whether Intensive Care Unit (ICU) clinicians display unconscious bias towards cancer patients is unknown. The aim of this study was to compare the outcomes of critically ill patients with and without perceptions of excessive care (PECs) by ICU clinicians in patients with and without cancer. Methods This study is a sub-analysis of the large multicentre DISPROPRICUS study. Clinicians of 56 ICUs in Europe and the United States completed a daily questionnaire about the appropriateness of care during a 28-day period. We compared the cumulative incidence of patients with concordant PECs, treatment limitation decisions (TLDs) and death between patients with uncontrolled and controlled cancer, and patients without cancer. Results Of the 1641 patients, 117 (7.1%) had uncontrolled cancer and 270 (16.4%) had controlled cancer. The cumulative incidence of concordant PECs in patients with uncontrolled and controlled cancer versus patients without cancer was 20.5%, 8.1%, and 9.1% (p < 0.001 and p = 0.62, respectively). In patients with concordant PECs, we found no evidence for a difference in time from admission until death (HR 1.02, 95% CI 0.60–1.72 and HR 0.87, 95% CI 0.49–1.54) and TLDs (HR 0.81, 95% CI 0.33–1.99 and HR 0.70, 95% CI 0.27–1.81) across subgroups. In patients without concordant PECs, we found differences between the time from admission until death (HR 2.23, 95% CI 1.58–3.15 and 1.66, 95% CI 1.28–2.15), without a corresponding increase in time until TLDs (NA, p = 0.3 and 0.7) across subgroups. Conclusions The absence of a difference in time from admission until TLDs and death in patients with concordant PECs makes bias by ICU clinicians towards cancer patients unlikely. However, the differences between the time from admission until death, without a corresponding increase in time until TLDs, suggest prognostic unawareness, uncertainty or optimism in ICU clinicians who did not provide PECs, more specifically in patients with uncontrolled cancer. 
This study highlights the need to improve intra- and interdisciplinary ethical reflection and subsequent decision-making at the ICU. Supplementary Information The online version contains supplementary material available at 10.1186/s13613-021-00895-5.

Teno J.M. et al. (2013) [1] showed that 1 in 3 patients die shortly after ICU treatment, which may indicate that the level of care is not appropriate. A possible measure of overtreatment is the number of perceptions of excessive care (PECs) that patients receive from clinicians.
For this thesis, data from the 28-day observational multicenter DISPROPRICUS study have been used. The dataset contains patient, ICU, country, clinician and hospital characteristics. The main objective of this thesis was to study whether it is possible that clinicians discriminate between certain subgroups of patients. Discrimination is possible if 1) there is a significant difference between the subgroups in the proportion of patients with concordant PECs or in the rate of receiving those concordant PECs and 2) there is a significant difference between the subgroups in the proportion of patients with a treatment limitation decision (TLD) registration or in the rate of TLD registration.
In order to evaluate whether discrimination of patients may occur, cumulative incidence curves have been constructed for all subgroups for 1) the time from admission until receiving the 2nd PEC and 2) the time from receiving the 2nd PEC until TLD registration. By fitting cause-specific hazard models, hazard rates for different subgroups could be compared. Cumulative incidence curves for the time from receiving the 2nd PEC until death or combined endpoint have been studied as well (although no extra information about possible discrimination can be obtained from these). To adjust for background characteristics, inverse propensity score weighting has been applied and weighted cumulative incidence curves were constructed. The propensity score is defined as the estimated conditional probability for a patient to belong to his own subgroup given the patient's characteristics.

The results based on the unweighted and weighted cumulative incidence curves are almost identical. No significant difference in cause-specific hazard rate of TLD registration was detected between any of the subgroups. The cause-specific hazard rate of receiving concordant PECs was significantly higher for older patients and for patients with hematological cancer in comparison with the patients in the other subgroups. No significant difference in cause-specific hazard rate of receiving concordant PECs between the surgery subgroups was detected. Based on the unweighted cumulative incidence curves, the cause-specific hazard rate of receiving concordant PECs was significantly higher for patients with active cancer than for patients with not active cancer or patients without cancer. However, this difference was not detected based on the weighted cumulative incidence curves. Overall, it could be concluded that discrimination of patients based on age, cancer type, cancer status or surgery type by clinicians doesn't seem plausible.

Introduction
An intensive care unit (ICU) is a department in a hospital that is specialised in treating patients with life-threatening conditions. These are, for example, patients that went through major surgery, had a severe accident, have cardiac problems, severe pneumonia or sepsis, are in a coma, are paralyzed, etc. In order to be able to treat these patients, well-educated and trained doctors and nurses are needed.
Life-supporting therapy should only be provided to a patient when the following two conditions are met. Firstly, the patient and his relatives should be well informed about the treatment and the associated risks and the level of care must be in accordance with the patient's wishes and preferences. Secondly, the treatment intensity should remain proportional to the expected result.
However, Teno J.M. et al. (2013) [1] showed that 1 in 3 patients die shortly after ICU treatment, which raises the question whether the level of care given to patients is too high (overtreatment) or too low (undertreatment).
In this study, the focus lies on possible overtreatment of patients. Several risks are associated with overtreatment. Firstly, if too much care is given, it is possible that unnecessary suffering is added to the patient and the family members. Secondly, healthcare providers also want the level of care they are providing to be appropriate. If they see that their patients die during the ICU-stay or shortly after, they may feel like they have failed and their motivation may decrease which may eventually lead to burn-outs. Thirdly, as ICUs are costly because of high technology and highly specialized personnel, it is important that the units work efficiently. Deciding how much money should be invested in the last years of life, knowing that the costs increase as the duration of care increases, is a difficult ethical issue.
An indication of possible overtreatment of patients is the number of perceptions of excessive care (PECs) they receive from the clinicians. The multicenter study by D.D. Benoit et al. (2018) showed that the probability that a patient is dead, not at home or has a poor quality of life, one year after ICU admission, is much higher when the patient received a PEC from two or more clinicians independently of each other (concordant PECs) than when the patient didn't receive two or more PECs [2]. In order to be able to answer all these questions, the following items were studied for all subgroups: 1) the number of received PECs, 2) the predictive value of PECs with regard to the patient's condition after one year, 3) cumulative incidence curves (for the time from admission until the 2nd PEC, for the time between the 2nd PEC and death or combined endpoint and for the time between the 2nd PEC and TLD) and 4) the (cause-specific) hazard rates of receiving concordant PECs, of TLD registration and of dying. To adjust for background characteristics, inverse propensity score weighting was applied and weighted cumulative incidence curves could be constructed. The propensity score is defined as the estimated conditional probability for a patient to belong to his own subgroup given the patient's characteristics.
This thesis has been split up in 8 parts. A description of the dataset is given in chapter 2. The theoretical background of the applied methods has been discussed in chapter 3. Chapter 4 explains how discrimination is defined and how it can be evaluated based on the available data.
A univariate description of the data has been presented in chapter 5. The risk of death within 28 days, the risk of death within 1 year, the risk of reaching the combined endpoint and the risk of TLD for the different subgroups have been discussed in chapter 6. Finally, the cumulative incidence curves, the weighting process and the weighted cumulative incidence curves have been discussed in chapters 7, 8 and 9.

The study in [2], which was based on the same data as the current study, showed that the ICU mortality and length of stay in the average (-) climate differ from the other climates. It was concluded that the attending physicians in average (-) climates included patients in the study in a dissimilar way to physicians in good, average (+) and poor climates and that selection bias was therefore present (see App Table 1). In order to avoid problems due to this selection bias, it was decided for the current study to exclude all patients that were admitted at an ICU with an average (-) ethical climate (120) and to do the study based on the available information of the remaining 1641 patients.

Some important variables
In order to get a better feel of the data, some important variables will be discussed here. When a patient is in a really bad condition, it is possible that doctors decide to stop or to limit treatment. The dataset contains a binary variable with value 1 if a TLD has been registered and 0 otherwise.
In this study, a distinction was made between patients who did or did not receive a PEC by two or more clinicians independently from each other (concordant PECs). Therefore a binary variable has been created with value 1 if the patient has received concordant PECs and 0 otherwise.
A final series of variables indicate a time period: time from admission until receiving the 2nd PEC, time between receiving the 2nd PEC and death or CEP and time between receiving the 2nd PEC and TLD. Those variables will be used to construct cumulative incidence curves.
The dataset contains many more variables, but they will not be discussed in further detail here.

Time, event, censoring and censoring assumptions
Survival analysis is the analysis of data for which the outcome variable of interest is the time until a certain event occurs (also called "the survival time"). Very often, survival analysis has to take censoring into account. Censoring occurs when there is some information about an individual's survival time, but the exact survival time is unknown. In most cases, right-censoring will be present: the observed survival time is shorter than the true survival time (see Figure 1). There are generally three reasons why right-censoring occurs:
- The individual does not experience the event before the end of the study period.
- An individual is lost to follow up during the study period.
- An individual withdraws from the study before the end of the study period.
Common assumptions about censoring are random censoring and independent censoring:
- Random censoring means that the individuals who are censored at time t are representative for the individuals who remained at risk at time t with respect to their survival time. Subjects at risk are subjects who have not experienced an event (yet) and who are not censored (yet). Assuming random censoring therefore means assuming that the event rate at time t is equal to the event rate at time t given that censoring has not occurred yet [4].
- Independent censoring means that, within any subgroup of interest, the individuals who are censored at time t are representative for the individuals in that subgroup who remained at risk at time t with respect to their survival time. Independent censoring is actually equal to random censoring within any subgroup of interest [3].

Survivor function and hazard function
The random variable for an individual's time-to-event is denoted by T. An observed value or a value of interest for the time-to-event is denoted by t. Two possible ways to describe time-to-event data are the survivor function S(t), with focus on the event not happening and the hazard function h(t), with focus on the event happening.
The survivor function gives the probability that an individual doesn't experience the event before a specific time t and is described as:

\[ S(t) = P(T > t) \]
In this study however, the focus lies on the hazard function, which is described as:
\[ h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \]
The hazard function, also called the conditional event rate, gives the instantaneous rate at which the event occurs at time t, per time unit, given that the event didn't occur before time t [3].

Cox proportional hazard model

a) The model
The hazard function can also be described by making use of the Cox proportional hazard model:
\[ h(t \mid x_1, \ldots, x_p) = h_0(t) \exp(\beta_1 x_1 + \ldots + \beta_p x_p) \]
where x_1, ..., x_p are the predictors, \beta_1, ..., \beta_p the regression coefficients and h_0(t) the baseline hazard. This baseline hazard is a function of the time t. The Cox proportional hazard model is a semi-parametric model as it doesn't make any assumptions about the form of the baseline hazard function but it does assume a parametric form for the effect of the predictors on the hazard (they enter the model linearly on the log scale and the coefficients do not depend on time). The Cox proportional hazard model can be fitted to the data and coefficient estimates can be obtained by maximizing the partial likelihood function. In the absence of tied event times, the partial likelihood function is described as follows:
\[ L(\beta) = \prod_{i:\, \delta_i = 1} \frac{\exp(\beta^\top x_i)}{\sum_{l \in R(t_i)} \exp(\beta^\top x_l)} \]
where \delta_i = 1 if the event was observed for subject i and R(t_i) is the set of subjects at risk at the event time t_i. This partial likelihood function only takes the likelihoods for the observed event times into consideration. These depend however on the number of subjects at risk, which will decrease as time goes by (subjects who experience the event or subjects who are censored are no longer at risk).

The advantage of the semi-parametric model is that no (possibly wrong) assumption about the form of the baseline hazard needs to be made. The disadvantage however is that the resulting estimates may not be as efficient as the maximum likelihood estimates when a complete parametric model would be used [3].
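As a minimal illustration of how the partial likelihood only uses the observed event times and their risk sets, the following Python sketch evaluates the log partial likelihood for a single covariate. The function name `cox_partial_loglik` and its arguments are hypothetical and not part of the original analysis (which used R); the sketch assumes there are no tied event times.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood of a one-covariate Cox model (assumes no tied event times).

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the subject was censored
    x      : covariate values (e.g. a 0/1 group indicator)
    """
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored subjects only contribute through the risk sets
        # Risk set at t_i: everyone whose observed time is at least t_i
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk)
        loglik += beta * x[i] - math.log(denom)
    return loglik
```

With beta = 0 every subject in a risk set is equally likely to be the one experiencing the event, so each event contributes minus the log of the risk set size.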

b) Hazard ratio and proportional hazard assumption
In order to compare the hazard rate or conditional event rate between two subjects A and B with \( x_A = (x_{A1}, x_{A2}, \ldots, x_{Ap}) \) and \( x_B = (x_{B1}, x_{B2}, \ldots, x_{Bp}) \) respectively as sets of predictors, the hazard ratio between these two subjects can be estimated as follows:
\[ \widehat{HR}_{A/B} = \frac{\hat{h}_0(t) \exp\left(\sum_{j=1}^{p} \hat{\beta}_j x_{Aj}\right)}{\hat{h}_0(t) \exp\left(\sum_{j=1}^{p} \hat{\beta}_j x_{Bj}\right)} = \exp\left(\sum_{j=1}^{p} \hat{\beta}_j (x_{Aj} - x_{Bj})\right) \]
As can be seen in the formula above, the baseline hazard is cancelled out and the estimated hazard ratio \( \widehat{HR}_{A/B} \) is not a function of the time t. The hazard rate for subject A is a constant multiple of the hazard rate for subject B. A Cox proportional hazard model can be fitted to the data using the function "coxph()" from the package "survival" in R. With this function, the coefficient estimates, their standard errors, the results of the Wald tests and the result of the log rank test can be obtained [3] [5] [6].

c) Assessing the proportional hazard assumption: scaled Schoenfeld residuals
If the proportional hazard assumption for the Cox model is not met, the coefficient estimates may be biased. Therefore, it is important to check whether the assumption is reasonable before using the model. This can be done by analysing the scaled Schoenfeld residuals, which are calculated at each event time t for each covariate j. Under proportional hazards these residuals show no systematic relationship with time, so the assumption can be assessed by testing whether there is a significant relationship between the scaled Schoenfeld residuals and time. When plotting the scaled Schoenfeld residuals against time, no trend should be observed. A non-zero slope of the smoothing spline fit indicates violation of the assumption [6] [7] [8].
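For reference, the form in which the (scaled) Schoenfeld residuals are commonly stated in the survival analysis literature is sketched below. The notation is an assumption made here for illustration and is not taken from the original text: R(t_i) is the risk set at event time t_i, d is the total number of observed events, and \hat{\beta} is the vector of estimated Cox coefficients.

```latex
% Schoenfeld residual of covariate j for the subject i with an event at t_i:
% observed covariate value minus its risk-set weighted average
r_{ij} = x_{ij} - \frac{\sum_{l \in R(t_i)} x_{lj} \exp(\hat{\beta}^\top x_l)}
                       {\sum_{l \in R(t_i)} \exp(\hat{\beta}^\top x_l)}

% Scaled Schoenfeld residual (vector over all covariates)
r_i^{*} = d \, \widehat{\mathrm{Var}}(\hat{\beta}) \, r_i

% Approximate link with a time-varying coefficient
E\left[ r_{ij}^{*} \right] + \hat{\beta}_j \approx \beta_j(t_i)
```

Under this approximation, a flat trend of the scaled residuals over time corresponds to a coefficient that does not change over time, which is what the proportional hazard assumption requires.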

d) Aalen's additive hazard model
If the proportional hazard assumption doesn't hold and the Cox proportional hazard model can't be used, Aalen's additive hazard model may be a good alternative. The hazard function here is described as:
\[ h(t \mid x_1, \ldots, x_p) = \beta_0(t) + \beta_1(t) x_1 + \ldots + \beta_p(t) x_p \]
One of the advantages of this model is that it allows for the regression coefficients βj(t) to change over time. As no assumption has been made about the functional forms of these regression functions βj(t), Aalen's additive model is a non-parametric model.
In order to interpret β1(t), the hazard rate has been described for two individuals who have the same values for all covariates except for covariate x1 (a one unit difference):
\[ h(t \mid x_1 + 1, x_2, \ldots, x_p) - h(t \mid x_1, x_2, \ldots, x_p) = \beta_1(t) \]
The coefficient β1(t) can be interpreted as the change in hazard rate if there is a unit increase in covariate x1 when all other covariates are kept constant.
Imagine now that only one covariate xA is present in the model. xA has a value 1 or 0 if the individual belongs to group A or B respectively. The hazard functions for both individuals are described as:
\[ h_A(t) = \beta_0(t) + \beta_A(t), \qquad h_B(t) = \beta_0(t) \]
The hazard ratio between these two is then:
\[ HR_{A/B}(t) = \frac{\beta_0(t) + \beta_A(t)}{\beta_0(t)} = 1 + \frac{\beta_A(t)}{\beta_0(t)} \]
Testing if βA(t) = 0 is equivalent to testing if the hazard ratio is equal to one (the same test as was done with the Cox proportional hazard model) [9].
Aalen's additive hazard model can be fitted to the data using the function "aalen()" from the package "timereg" in R.

Dealing with competing risks

a) Cause-specific hazard function
In previous sections, it was always assumed that there was only one event that could occur. However, this is not always the case. When several different events are possible, but only one of these events can occur for a subject, then these events are called "competing events".
A general approach for analysing competing risk data is to estimate hazard rates and hazard ratios for the event of interest and to treat the competing events as censored. The cause-specific hazard function is here defined as:
\[ h_c(t) = \lim_{\Delta t \to 0} \frac{P(t \le T_c < t + \Delta t \mid T_c \ge t)}{\Delta t} \]
Tc is a random variable and denotes the time to event c. The cause-specific hazard function hc(t) gives the instantaneous rate of event c at time t, given no event occurred before time t. The Cox cause-specific proportional hazard model is described as:
\[ h_c(t \mid x_1, \ldots, x_p) = h_{0c}(t) \exp(\beta_{1c} x_1 + \ldots + \beta_{pc} x_p) \]

b) Independence assumption
When, within any subgroup of interest, individuals who are censored at time t are representative of all individuals in that subgroup who remained at risk, then censoring is independent.
A complication when studying competing risk data is that there are different types of censoring. First, there is the usual censoring in case the event didn't occur before the end of the study, if a subject withdraws from the study or if a subject is lost to follow up. In addition, when there are competing events, only one event at a time can be studied and the competing events are also considered as censoring. Although the subjects who experienced a competing event are censored, no independent censoring assumption is needed for them as all the necessary information is available: it is known that the subject experienced a competing event and the time at which the event occurred is also known [3].

c) Cumulative incidence
A good approach for analysing competing risk data is to calculate cumulative incidences and to construct cumulative incidence curves. The incidence of an event at time t is an estimate for the marginal probability of that event occurring at time t when competing events are present. The incidence can be estimated as follows: 1.
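The usual way of estimating a cumulative incidence in the presence of competing events can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis's own implementation: the function name `cumulative_incidence` and the coding of censoring as cause 0 are hypothetical. At each event time, the cause-specific hazard (events of that cause divided by the number at risk) is multiplied by the probability of being event-free just before that time, and these products are accumulated.

```python
def cumulative_incidence(times, causes):
    """Cumulative incidence per cause with competing events; cause 0 means censored.

    Returns a dict mapping each cause to a list of (time, cumulative incidence).
    """
    event_causes = sorted({c for c in causes if c != 0})
    cif = {c: 0.0 for c in event_causes}
    curves = {c: [] for c in event_causes}
    surv = 1.0  # probability of no event of any type just before the current time
    for t in sorted({s for s, c in zip(times, causes) if c != 0}):
        n_risk = sum(1 for s in times if s >= t)
        d_all = 0
        for c in event_causes:
            d_c = sum(1 for s, k in zip(times, causes) if s == t and k == c)
            cif[c] += surv * d_c / n_risk  # event-free so far, then event c at t
            d_all += d_c
        surv *= 1.0 - d_all / n_risk
        for c in event_causes:
            curves[c].append((t, cif[c]))
    return curves
```

Without censoring, the estimate reduces to the simple proportion of subjects who experienced each event.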

Log rank test
In order to compare the hazard functions between two or more groups of subjects, the nonparametric log rank test can be executed. The null hypothesis and alternative hypothesis are: ....
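For two groups, the log rank statistic compares the observed number of events in one group with the number expected if both groups shared the same hazard. The following Python sketch is an illustration only; the function name `logrank_test` and its arguments are hypothetical, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test; events are coded 1 (event) or 0 (censored).

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({s for s, e in zip(times, events) if e == 1}):
        n = sum(1 for s in times if s >= t)                          # at risk overall
        n_a = sum(1 for s, g in zip(times, in_a) if s >= t and g)    # at risk in group A
        d = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        d_a = sum(1 for s, e, g in zip(times, events, in_a)
                  if s == t and e == 1 and g)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * (n - n_a) * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value
```

In a competing risks setting, the competing events would be passed with event code 0, in line with treating them as censored for the cause-specific analysis.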

Application in study
For this study, three events were of interest: receiving concordant PECs, TLD registration after receiving concordant PECs and reaching the combined endpoint after receiving concordant PECs. When analysing the time to event, censoring and possible competing risks were taken into account.
For the time from admission until receiving concordant PECs:
- Right censoring occurs when a patient didn't receive concordant PECs before the end of the study (after 28 days).
- There are two competing risks:
  1) The patient can be discharged from the ICU before he received concordant PECs.
  2) The patient may have died before he received concordant PECs.
- For 42 patients who did not receive concordant PECs, the discharge day or time of death was unknown. These patients were considered as censored at the first day of the study.

For the time from receiving concordant PECs until TLD registration:
- Right censoring occurs when a patient didn't have a TLD registration before the end of the study.
- There are two competing risks:
  1) The patient can be discharged from the ICU before a TLD was registered (the discharge day for all patients with concordant PECs was known).
  2) The patient may have died before a TLD was registered.
For the time from receiving concordant PECs until reaching the combined endpoint:
- There are no competing events.
In order to compare the hazard rates of the events between different subgroups, Cox (cause-specific) proportional hazard models were fitted to the data. Only one predictor was included in the models: the group indicator variable (= the variable responsible for the division of the patients in the age, cancer type, cancer status and surgery subgroups).

Experimental data versus observational data
A study can be based on experimental data from a randomised trial. The individuals participating in the study are randomly divided over two or more groups and a certain treatment is assigned to each group. Because of the randomization, there are no confounding issues.
Randomized trials can however not always be carried out due to practical or ethical reasons.
A lot of studies are therefore based on observational data, as was the case for this study.
Information about patients, hospitals and clinicians was obtained during a certain time span.
Based on patient characteristics, different subgroups were formed. The goal was to study the link between group membership and the outcome variable CEP. However, because no randomization process is involved, there may be confounding issues. For example, if there are more male patients in the first group than there are in the second group, then the association between group membership and CEP may be influenced by gender. In order to correct for systematic differences in background characteristics between the groups (confounders), inverse propensity score weighting was applied.

Propensity score
The basic idea of propensity score methods is to replace, for each patient in the study, all confounding background characteristics by one summarizing characteristic, called the propensity score. The propensity score for a patient is the probability that that patient belongs to a certain group g given his set of covariates x:
\[ P(\text{group} = g \mid X = x) \]
The first step towards estimating the propensity scores is to find out which of the observed variables may be confounders. In this study, confounding variables are variables that are associated with the outcome variable CEP and with group membership. As CEP is a binary variable, logistic regression can be used to find variables that are associated with CEP. Next, multinomial logistic regression can be used to find out which of those variables are also associated with group membership. Finally, a multinomial logistic regression model, with the confounding variables as predictors, is used to estimate the propensity score [14] [15].

Inverse propensity score weighting
The weight for an individual is obtained by taking the inverse of the propensity score for the group to which the individual belongs:
\[ w_i = \frac{1}{P(\text{group} = g_i \mid X = x_i)} \]
where g_i is the group to which individual i belongs and x_i is his set of covariates. By applying inverse propensity score weighting a pseudo-population is created in which the covariates and the exposure variable are independent and therefore no confounding is present (a property that is also expected under randomization).
The reason for choosing the inverse of the propensity score as weight for each patient is that the inverse probability weighted mean equals the counterfactual mean, assuming conditional exchangeability holds [16].
This equality can be described as follows:
\[ E\left[ \frac{I(\text{group} = g) \, Y}{P(\text{group} = g \mid X = x)} \right] = E\left[ Y^{g} \right] \]
where \( I(\text{group} = g) \) has value 1 if a subject belongs to group g and has value 0 otherwise, Y is the binary outcome variable, \( P(\text{group} = g \mid X = x) \) gives the probability for a subject to belong to group g given his set of covariates x (= the propensity score) and finally \( Y^{g} \) is the counterfactual outcome variable Y (outcome that would have been observed if the subject would have belonged to group g). Assuming conditional exchangeability means in the current study that, given the same set of covariates, patients in different subgroups are exchangeable (the same outcome is expected if group membership should be different). Because of the equality
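The inverse probability weighted mean described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in the study: the function name `ipw_mean` is hypothetical, and the propensity scores are assumed to have been estimated beforehand (for example with a multinomial logistic regression model).

```python
def ipw_mean(outcomes, groups, own_propensities, target_group):
    """Inverse probability weighted mean of a binary outcome for one group.

    outcomes         : observed outcome Y (0 or 1) per patient
    groups           : observed group label per patient
    own_propensities : estimated P(group = own group | covariates) per patient
    target_group     : the group g whose counterfactual mean is estimated
    """
    n = len(outcomes)
    total = 0.0
    for y, g, p in zip(outcomes, groups, own_propensities):
        if g == target_group:   # indicator I(group = g)
            total += y / p      # weight 1/p for the patient's own group
    return total / n
```

Patients with a small probability of belonging to their own group receive a large weight, because each of them stands in for many similar patients who ended up in another group.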

Disadvantages of inverse propensity score weighting
The use of propensity scores also has some limitations or disadvantages. Firstly, it can only be used to adjust for the observed confounding variables, in contrast to randomization, which tends to balance out the distribution of all observed and unobserved variables. Secondly, if the (multinomial) logistic regression model used to estimate the propensity scores is not correctly specified, the results may be biased. Lastly, in some cases very large weights are obtained, which can cause an imbalance of the covariates between different groups again. A possible ad hoc solution is trimming the extremely high weights at the 95th percentile of the weights [19].

4 Discrimination

Indications of discrimination
The original meaning of the word "discrimination" is "making a distinction". Over the course of the years however, the word acquired a more negative connotation in social and legal contexts. The meaning of the word changed into "making an unjustified distinction" (e.g. discrimination based on race or religion). In this study, the word "discrimination" is used in its original meaning.
Based on the available information for this study, it is impossible to conclude if clinicians do or do not discriminate between patients based on their age, cancer type, cancer status or surgery type. Additional information is needed (interviews with clinicians, the opinion of anthropologists, …). Based on the available dataset, it is only possible to look for some trends that would be expected if there was discrimination.
Imagine that a subgroup of patients was discriminated (concerning the level of care) by clinicians, then the following two trends should be observed:
Trend A: The proportion of patients who received concordant PECs or the rate of receiving those concordant PECs is significantly higher/lower in the discriminated subgroup than in the other subgroups of patients.
Trend B: After receiving concordant PECs, the patients in the discriminated subgroup are treated differently than the patients in the other subgroups. The proportion of patients with a TLD registration or the rate at which TLDs were registered is significantly higher/lower for the discriminated subgroup of patients than for the other subgroups of patients.
So, if there is discrimination concerning the level of care, trend A and trend B should be observed. However, it is not because trend A and trend B are observed that it can be concluded that there is discrimination. Observing both trends only means that discrimination may be possible.

How to detect indications of discrimination?
In order to check if trend A was observed, the following things were studied:
- Differences between subgroups in proportion of patients who received concordant PECs (see univariate analysis, section 5)
- Unweighted and weighted cumulative incidence curves for the time from admission until receiving concordant PECs (see sections 7 and 9).
In order to check if trend B was observed, the following things were studied:
- Differences in risk of TLD registration by the end of the study between subgroups (see section 6.5)
- Unweighted and weighted cumulative incidence curves for the time from receiving concordant PECs until TLD registration (see sections 7 and 9).
The cumulative percentage of patients receiving concordant PECs was calculated relative to the total number of patients in the subgroup. The cumulative percentage of patients with a TLD registration was calculated relative to the number of patients with concordant PECs in that subgroup.
As the point of interest lies in the possible difference in care after receiving concordant PECs, patients with a TLD registration before receiving concordant PECs were not considered for the cumulative incidence curves to check for the presence of trend B.
Hazard rates and cumulative incidences for different subgroups were compared by:
- visually analysing the cumulative incidence curves,
- estimating the cause-specific hazard ratios (based on the Cox proportional hazard model which only included one predictor: the group-indicator variable) and the associated 95 % confidence intervals,
- executing a log rank test.
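As an illustration of how a hazard ratio and its confidence interval follow from a fitted Cox coefficient, a minimal Python sketch is given below. It assumes a Wald-type interval on the log scale; the function name `hazard_ratio_ci` and the inputs (coefficient estimate and standard error, as reported by a Cox fit) are hypothetical.

```python
import math
from statistics import NormalDist

def hazard_ratio_ci(beta_hat, se, level=0.95):
    """Hazard ratio exp(beta_hat) with a Wald confidence interval.

    The interval is built on the log scale and then exponentiated.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95 % interval
    return (math.exp(beta_hat),
            math.exp(beta_hat - z * se),
            math.exp(beta_hat + z * se))
```

An interval that contains 1 corresponds to a Wald test that does not reject equal hazard rates at the matching significance level. A very small number of events in the reference group leads to a large standard error and therefore a very wide interval.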

Cause-specific hazard ratios are used to compare two groups at a time. Therefore, a reference group within each type of subgroups has been chosen:

Differences between age subgroups
The results of all statistical tests that compare the age subgroups (< 75 year old and ≥ 75 year old) have been presented in App Table 2 and App Table 3. In what follows, a selection of the differences will be discussed.
There are significantly more males present in the group with younger patients than in the group with older patients (60. 6

Differences between cancer type subgroups
The results of all statistical tests that compare the cancer type subgroups (no cancer, hematological cancer, solid tumor) have been presented in App Table 4 and App Table 5. In what follows, a selection of the differences will be discussed.

Differences between cancer status subgroups
The results of all statistical tests that compare the cancer status subgroups (no cancer, active cancer and not active cancer) have been presented in App Table 6 and App Table 7. In what follows, a selection of the differences will be discussed. PECs and that these differences between the three groups were significant (p-value = 0.0002).
The results of the pairwise comparisons have been presented in App

Differences between surgery subgroups
The results of all statistical tests that compare the surgery subgroups (no surgery, scheduled surgery and unscheduled surgery) have been presented in App Table 8 and App Table 9. In what follows, a selection of the differences will be discussed.
For all groups, it appears that patients go to a university hospital rather than to a public hospital.

6 Risk of death within 28 days, death within one year, CEP and TLD

Goal and method
Within the subgroups, a distinction was made between patients who did and did not receive concordant PECs. Fisher's exact tests were applied to investigate whether there was a significant difference between the different subgroups in risk of death within 28 days, risk of death within 1 year, risk of reaching the combined endpoint and risk of TLD registration by the end of the study. It was investigated whether the same trend could be observed between patients with and without concordant PECs. All the percentages mentioned above are expressed relative to the total number of patients in the subgroup (including the patient whose situation after 28 days was unknown). All the percentages mentioned above are expressed relative to the total number of patients in the subgroup (including the patient whose situation after 1 year is unknown). When looking at the patients with concordant PECs, it could not be concluded that there was a significant difference in risk of reaching the combined endpoint between the subgroups (p-values between 0.1164 and 0.8856).
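For the simplest case of two subgroups and a binary outcome, the two-sided Fisher's exact test can be sketched in Python as follows. This is an illustrative sketch only: the function name `fisher_exact_two_sided` is hypothetical, and comparisons between three subgroups (such as the cancer type subgroups) would need the generalisation of the test to larger tables, which is not shown here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two subgroups, columns are outcome yes / no.
    The p-value sums the probabilities of all tables with the same margins
    that are at most as likely as the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k "yes" outcomes in the first subgroup
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with p_obs are counted
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` compares a subgroup with 3 of 4 patients reaching the outcome against a subgroup with 1 of 4.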

Risk of combined endpoint
All the percentages mentioned above are expressed relative to the total number of patients in the subgroup (including the patients for whom it was unknown if they reached the combined endpoint or not).

Remark about patients without concordant PECs
For the patients without concordant PECs, whose level of care is appropriate according to the clinicians, it could be concluded that the mortality (risk of death within 28 days, risk of death  Table 10). The cumulative incidence curves for the time between receiving the 2nd PEC and TLD for the age subgroups have been presented in Figure 3.  Table 29). It was also observed that most TLDs were registered within 14 days after receiving concordant PECs.
Figure 3: Comparison of age subgroups - cumulative incidence curves for time between receiving 2nd PEC and TLD.

Cancer type subgroups
The cumulative incidence curves for the time from admission until receiving the 2nd PEC for the cancer type subgroups have been presented in Figure 4.

The cumulative incidence curves for the time between receiving the 2nd PEC and TLD have been presented in Figure 5. By the end of the study, 29.

Cancer status subgroups
The cumulative incidence curves for the time from admission until receiving the 2nd PEC for the cancer status subgroups have been presented in Figure 6. At the end of the study, 9.  The cumulative incidence curves for the time between receiving the 2nd PEC and TLD for the cancer status subgroups have been presented in Figure 7.

Surgery subgroups
The cumulative incidence curves for the time from admission until receiving the 2nd PEC for the surgery subgroups have been presented in Figure 8. At the end of the study, 10  The cumulative incidence curves for the time between receiving the 2nd PEC and TLD have been presented in Figure 9. First, it should be noted that there is only one patient with a TLD registration in the reference group (patients with scheduled surgery). This is why the 95 % confidence intervals of the cause-specific hazard ratios are very wide. Due to this issue, the focus lies on the visual interpretation of the cumulative incidence curves. The steepness of the cumulative incidence curves for patients without surgery and patients with unscheduled surgery seems similar, which may indicate that the rate of TLD registration is similar as well. At the end of the study, only 10.0 % of the patients with scheduled surgery had a TLD registration, in contrast with 27.9 % for patients without surgery and 32.4 % for patients with unscheduled surgery (see App Table 32).
Figure 9: Comparison of surgery subgroups - cumulative incidence curves for time between receiving 2nd PEC and TLD.

First indication of discrimination?
An overview of the observed differences between the subgroups in the proportion of patients with concordant PECs, the proportion of patients with a TLD registration, the cause-specific hazard rate of receiving concordant PECs and the cause-specific hazard rate of TLD registrations (which have been discussed in previous sections) is shown in Table 3.
Although there is a significant difference between some subgroups in the proportion of patients with concordant PECs or in the cause-specific hazard rate of receiving those concordant PECs, a significant difference was never observed in the proportion of patients with a TLD registration or in the cause-specific hazard rate of those TLD registrations. Therefore, it can be concluded that the observations do not point towards discrimination (trend A was observed, but not trend B; see section 4.1). As there were no competing events for dying or for reaching the combined endpoint, an ordinary Cox proportional hazard model was used (instead of a cause-specific model) to compare hazard rates between the different subgroups. As there is no information available about the cause of death of the patients (whether they died because there was an unofficial treatment limitation decision or whether they died due to their underlying disease trajectory), possible differences in mortality rate between different subgroups cannot be considered as an indication of discrimination.

Age subgroups
The cumulative incidence curves for the time between receiving the 2nd PEC and death or CEP for the age subgroups have been presented in Figure 10.

Cancer type subgroups
The cumulative incidence curves for the time between receiving the 2nd PEC and death or CEP for the cancer type subgroups have been presented in Figure 11. After one year, 86.

Cancer status subgroups
The cumulative incidence curves for the time between receiving the 2nd PEC and death or CEP for the cancer status subgroups have been presented in Figure 12.

Surgery subgroups
The cumulative incidence curves for the time between receiving the 2nd PEC and death or CEP for the surgery subgroups have been presented in Figure 13.

Goal and method
As was explained in section 3.2, inverse propensity score weighting was applied in order to correct for systematic differences in background characteristics between the groups (confounding). The following steps need to be followed in order to obtain the weights for each patient for the age subgroups:
1. Find the variables that are related to the outcome variable CEP.
2. Find out which of those variables are also related with age.
3. Calculate the weights.
These steps will be further explained in the following sections.

Variables related with CEP
The variable CEP was converted to a binary variable with value 1 if the patient was dead, not at home or had a poor QOL and value 0 if the patient was still alive, at home and had a good QOL after one year. The 339 patients for whom the situation after one year was unknown were not taken into consideration.
The first step towards obtaining the weight for each patient was to look for the variables that are related with the binary variable CEP. This was done by executing a forward, backward and both-way stepwise logistic regression (significance level set to 0.10) with CEP as the outcome variable and patient, ICU and hospital characteristics as predictors. Only the variables that contain information about the patient known at the moment of admission to the ICU have been considered. Five of these variables contained 125 missing values and were therefore not included. All stepwise procedures led to the same model, which contains 16 variables. An overview of these variables, the coefficient estimates and the p-values indicating the significance is presented in Table 4.

Variables related with age
The variable age was converted to a binary variable with the value 1 for patients younger than 75 years old and the value 0 for patients who are 75 years or older. The next step towards obtaining the weight for each patient was to find out which of the variables related with CEP (see Table 4), and which of the possible two-way interactions between those variables, are also related with the binary variable age. This was done by applying a both-ways stepwise multinomial logistic regression (significance level set to 0.20). This resulted in a model that contains 17 variables and two-way interactions. Although the gender of the patient, the geographical region of the hospital and the ethical climate of the ICU did not seem to be significantly related with CEP, it does seem important to balance out their effect as well. Therefore, they are also included. An overview of the selected variables and interactions, their coefficient estimates and p-values of significance is presented in Table 5.

Propensity scores and weights for age
The next step was to use the multinomial logistic regression model to predict the probability that a patient is younger than 75 years old, given his set of covariates. All variables presented in Table 5 were used as predictors in this model. The weight for a patient is the inverse of the probability to belong to their own age group (inverse of the propensity score). Writing $X_i$ for the covariates of patient $i$:

Weight for a patient younger than 75 years:
$$w_i = \frac{1}{P(\text{age} < 75 \mid X_i)}$$
Weight for a patient of 75 years or older:
$$w_i = \frac{1}{1 - P(\text{age} < 75 \mid X_i)}$$
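The weight definition (inverse of the estimated probability of belonging to one's own group) can be sketched in a few lines of Python. This is an illustration only, not the code used for the analysis: the function names, the group labels and the propensity values are hypothetical, and the nearest-rank percentile used for the optional trimming step is an assumption, since the text does not say how the percentile was computed.

```python
import math


def nearest_rank_percentile(values, q):
    """q-th percentile (0 < q <= 100) using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def inverse_probability_weights(groups, propensities, trim_percentile=None):
    """Weight = 1 / estimated probability of the patient's own group.

    groups: one group label per patient.
    propensities: one dict per patient, mapping each label to the
        estimated probability of belonging to that group.
    trim_percentile: if given, weights above this percentile are capped
        at the percentile value to limit the influence of extreme weights.
    """
    weights = [1.0 / p[g] for g, p in zip(groups, propensities)]
    if trim_percentile is not None:
        cap = nearest_rank_percentile(weights, trim_percentile)
        weights = [min(w, cap) for w in weights]
    return weights


# Hypothetical propensity scores for three patients and two age groups
groups = ["young", "old", "young"]
propensities = [
    {"young": 0.8, "old": 0.2},
    {"young": 0.5, "old": 0.5},
    {"young": 0.25, "old": 0.75},
]
weights = inverse_probability_weights(groups, propensities)
trimmed = inverse_probability_weights(groups, propensities, trim_percentile=50)
```

Because each patient's dictionary can hold any number of labels, the same function also covers subgroup variables with three levels.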

Propensity scores and weights for cancer type, cancer status and surgery
In the previous section it was explained how the propensity scores and weights for the age subgroups were obtained. A similar method was used to calculate the weights when the cancer type, cancer status and surgery subgroups are considered. The main difference is that the patients are now divided into three groups instead of two. The three groups are referred to as group A, group B and group C to keep the explanation general.
Multinomial logistic regression models were built in a similar way as explained in section 8.4.
The final models were used to estimate three propensity scores for each patient (estimated probability to belong to group A, group B and group C, given his set of covariates). The single weight for each patient could then be calculated as follows, with $X_i$ the covariates of patient $i$:

Weight for a patient in group A:
$$w_i = \frac{1}{P(\text{group A} \mid X_i)}$$
Weight for a patient in group B:
$$w_i = \frac{1}{P(\text{group B} \mid X_i)}$$
Weight for a patient in group C:
$$w_i = \frac{1}{P(\text{group C} \mid X_i)}$$

Trimming of the weights at the 95th percentile was again applied to avoid extremely large weights that may have a big influence on the results.

9 Weighted cumulative incidence curves

Calculation of cumulative incidence
In section 8 it was explained how the weights for age, cancer status, cancer type and surgery were obtained for each patient. An overview of the ranges of the weights has been presented in App Table 34. The next step is to use these weights to construct the weighted cumulative

Age subgroups
The weighted cumulative incidence curves for the time from admission until receiving the 2nd PEC for the age subgroups have been presented in Figure 14. If every young patient would be replaced by an older patient with the same characteristics (same values for the predictors mentioned in Table 5

The weighted cumulative incidence curves for the time between receiving the 2nd PEC and TLD have been presented in Figure 15. These results coincide with the results that were found based on the unweighted cumulative incidence curves.

Cancer type subgroups
The weighted cumulative incidence curves for the time from admission until receiving the 2nd PEC for the cancer type subgroups have been presented in Figure 16. If all patients had hematological cancer, 14.6 % would have received concordant PECs, in contrast with only 9.2 % if no one had cancer and 11.6 % if all patients had a solid tumor. As the proportional hazard assumption was not met (see App Figure 16), Aalen's additive model was fitted instead.
It could be concluded that the rate of receiving concordant PECs is significantly higher for patients with hematological cancer than for patients without cancer (p-value < 0.001). The weighted cumulative incidence curves for the time between receiving the 2nd PEC and TLD for the cancer type subgroups have been presented in Figure 17. These results also coincide with the results that were found based on the unweighted cumulative incidence curves.

Cancer status subgroups
The weighted cumulative incidence curves for the time from admission until receiving the 2nd PEC for the cancer status subgroups have been presented in Figure 18. If all patients had active cancer, 20.4 % of them would have received concordant PECs by the end of the study.
This is in contrast with 8.9 % if none of the patients had cancer and 9.2 % if all patients had not active cancer. As the proportional hazard assumption was not met (see App Figure 19), Aalen's additive model was fitted instead. No significant difference in the rate of receiving concordant PECs was detected (p-values = 0.088 and 0.234). However, when visually analysing the cumulative incidence curves, it appears that patients with active cancer receive concordant PECs more rapidly than patients without cancer or patients with not active cancer (which was also concluded based on the unweighted cumulative incidence curves). The weighted cumulative incidence curves for the time between receiving the 2nd PEC and TLD have been presented in Figure 19. These results coincide with the results that were found based on the unweighted cumulative incidence curves.
Figure 19: Comparison of cancer status subgroups - weighted cumulative incidence curves for time between receiving 2nd PEC and TLD.

Surgery subgroups
The weighted cumulative incidence curves for the time from admission until receiving the 2nd PEC for the surgery subgroups have been presented in Figure 20. At the end of the study,  These results coincide with the results that were found based on the unweighted cumulative incidence curves.

First indication of discrimination?
For the age and cancer type subgroups, a significant difference was detected in the cause-specific hazard rate of receiving concordant PECs, but no significant difference was observed in the cause-specific hazard rate of TLD registration. For the cancer status and surgery subgroups, no significant difference in cause-specific hazard rate of receiving PECs and of TLD registration was detected. Therefore, it can be concluded that the observations do not point towards discrimination (trend A was observed, but not trend B; see section 4.1).

9.7 Difference in mortality rate between subgroups

9.7.1 Age subgroups
The weighted cumulative incidence curves for the time between receiving the 2nd PEC and death or CEP for the age subgroups have been presented in Figure 22. If every young patient would have been replaced by an older patient with the same characteristics (same values for the predictors mentioned in . These results coincide with the results that were found based on the unweighted cumulative incidence curves.

Cancer type subgroups
The weighted cumulative incidence curves for the time between receiving the 2nd PEC and death or CEP for the cancer type subgroups have been presented in Figure 23. which subgroup is considered. These results coincide with the results that were found based on the unweighted cumulative incidence curves.
Figure 23: Comparison of cancer type subgroups - weighted cumulative incidence curves for time between receiving 2nd PEC and death or CEP.

Cancer status subgroups
The weighted cumulative incidence curves for the time between receiving the 2nd PEC and death or CEP for the cancer status subgroups have been presented in Figure 24. These results coincide with the results that were found based on the unweighted cumulative incidence curves.
Figure 24: Comparison of cancer status subgroups - weighted cumulative incidence curves for time between receiving 2nd PEC and death or CEP.

Surgery subgroups
The weighted cumulative incidence curves for the time between receiving the 2nd PEC and death or CEP for the surgery subgroups have been presented in Figure

When studying the mortality (risk of death within 28 days, risk of death within 1 year and risk of reaching CEP) of patients with concordant PECs, no significant difference could be detected between the age, cancer type and cancer status subgroups. For the surgery subgroups, the risk of death within 28 days and death within 1 year was lower for patients with a scheduled surgery.
When looking at the mortality of patients without concordant PECs (whose clinicians think the level of care is appropriate), it appeared that the mortality is significantly higher for older patients, for patients with hematological cancer, for patients with active cancer and for patients without surgery. This may be an indication that more patients in those subgroups should have received concordant PECs and the level of care may not have been appropriate.

Cumulative incidence curves were constructed for all subgroups for 1) the time from admission until receiving the 2nd PEC and 2) the time from receiving the 2nd PEC until treatment limitation decision (TLD) registration. By fitting cause-specific hazard models, hazard rates of different subgroups could be compared. Cumulative incidence curves for the time from receiving the 2nd PEC until death or combined endpoint were constructed as well (although these do not give extra information about possible discrimination). To adjust for background characteristics, inverse propensity score weighting was applied and weighted cumulative incidence curves were constructed. The propensity score was defined as the estimated conditional probability for a patient to belong to his own subgroup given the patient's characteristics.
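To make the construction of a weighted cumulative incidence curve concrete, the sketch below implements a weighted Aalen-Johansen type estimator in plain Python. It is an illustration under stated assumptions, not the software used for this report: the function name, the event coding (0 = censored, 1 = event of interest such as receiving the 2nd PEC, 2 = a competing event) and the example data are all hypothetical. Setting every weight to 1 gives the unweighted curve.

```python
def weighted_cumulative_incidence(times, events, weights, cause=1):
    """Weighted cumulative incidence of `cause` with competing events.

    times:   follow-up time per patient
    events:  0 = censored, any other integer = event type
    weights: positive weight per patient (e.g. inverse propensity weights)
    Returns a list of (time, cumulative incidence) at each event time.
    """
    event_times = sorted({t for t, e in zip(times, events) if e != 0})
    event_free = 1.0  # probability of no event of any type before time t
    cif = 0.0
    curve = []
    for t in event_times:
        # Weighted number of patients still under observation at t
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        # Weighted number of events of the cause of interest at t
        d_cause = sum(
            w for ti, e, w in zip(times, events, weights)
            if ti == t and e == cause
        )
        # Weighted number of events of any type at t
        d_any = sum(
            w for ti, e, w in zip(times, events, weights)
            if ti == t and e != 0
        )
        cif += event_free * d_cause / at_risk
        event_free *= 1.0 - d_any / at_risk
        curve.append((t, cif))
    return curve


# Hypothetical data: four patients, unit weights
example_curve = weighted_cumulative_incidence(
    times=[1, 2, 3, 4], events=[1, 2, 1, 0], weights=[1.0, 1.0, 1.0, 1.0]
)
```

Running the function once per subgroup and plotting the returned points as a step function gives curves that can be compared visually, in the way the figures in this report are read.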
The following could be concluded based on the unweighted as well as on the weighted cumulative incidence curves. No significant difference in cause-specific hazard rate of TLD registration was detected between any of the subgroups. The cause-specific hazard rate of receiving concordant PECs was significantly higher for older patients and for patients with hematological cancer in comparison with the patients in the other subgroups. No significant difference in cause-specific hazard rate of receiving concordant PECs between the surgery subgroups was detected. Based on the unweighted cumulative incidence curves, the cause-specific hazard rate of receiving concordant PECs was significantly higher for patients with active cancer than for patients with not active cancer or patients without cancer. However, this difference was not detected based on the weighted cumulative incidence curves. Overall, it could be concluded that the observations do not point towards discrimination of patients based on age, cancer type, cancer status or surgery type.
One of the biggest limitations of this study is that no definite conclusion about discrimination can be formulated. Additional information will be necessary. In the current dataset, there is, for example, no information about the cause of death of patients. It is unknown whether patients died due to an unofficial treatment limitation decision or due to their underlying disease trajectory. If this information were available, differences in the mortality rate between subgroups could also be seen as an indication of discrimination.
Besides that, additional qualitative information (e.g. from interviewing clinicians or asking the opinions of anthropologists) would also be useful. It should also be mentioned that the number of patients with concordant PECs is probably underestimated, as patients who were admitted prior to the study period and patients who remained in the ICU after the end of the study (and could have received concordant PECs during their unobserved ICU stay) were excluded from the analysis [2]. Another issue in this study was the considerable amount of missing data (e.g. for 339 of the 1641 patients it is unknown whether they reached the combined endpoint or not).
Finally, as no clear discriminatory attitude of clinicians was detected in this study, further research based on the same data does not need to distinguish between the different types of subgroups.

A.1 Selection bias in average (-) ethical climate

A.2.2 Comparison of the number of PECs
App Table 10: Differences in number of PECs between age subgroups.
App Table 11: Differences in number of PECs between cancer type subgroups.

A.3.1 Risk of death within 28 days

A.3.2 Risk of death within 1 year

A.4.1 Summary
All hazard ratios mentioned in sections 7 and 9 were obtained via the (cause-specific) Cox proportional hazard models that were fitted to the data. The proportional hazard assumptions have been checked again by plotting the Schoenfeld residuals as a function of time (see App Figure 1 to App Figure 12 for the unweighted cases and App Figure 13 to App Figure 24 for the weighted cases). For the unweighted model to estimate the cause-specific hazard rate of receiving concordant PECs for the cancer type subgroups and for the weighted models to estimate the cause-specific hazard rate of receiving concordant PECs for the cancer type and cancer status subgroups, the Schoenfeld residuals did vary as a function of time and the proportional hazard assumption did not hold (p-values were 0.0066, 0.0105 and 0.0156 respectively). In all other cases the proportional hazard assumption did hold (p-values between 0.0502 and 0.9292).

A.4.2 Age subgroups
App Figure 1: Schoenfeld residuals for age-coefficient for time from admission until receiving 2nd PEC.
App Figure 2: Schoenfeld residuals for age-coefficient for time between receiving 2nd PEC and death or CEP.

A.5.1 Goodness-of-fit tests for multinomial regression models
As was explained in section 8, multinomial logistic regression models have been built to estimate the propensity scores for all patients. The goodness of fit of these models has been assessed by the "Le Cessie-van Houwelingen-Copas-Hosmer" test. The resulting p-values of the different tests have been presented in App Table 33. No evidence for a lack of fit for any of the models has been detected.

A.5.2 Range of weights