Code:
stcox i.chemotherapy##c.risk_score
The complication is that clinicians typically use this risk score by categorizing it into low, medium, and high based on certain cutoffs. These cutoffs do not have a strong scientific basis, but they have become entrenched in clinical practice, so our target audience would prefer to see our results using the categorized version.
Code:
stcox i.chemotherapy##i.risk_category
The difficulty arises when we run this analysis on our multiply imputed data. As you might imagine, the probability of recurrence is very low in the low level of risk_category, and lower still when chemotherapy has been given. Our data set is modest in size (N = 920), and the number of non-censored observations in which the risk category is low and chemotherapy was given drops to zero in several of the imputations. (In those where it does not drop to zero, it ranges from 1 to about 7.) In the imputations that have this "empty cell," the regression coefficient becomes, in effect, negative infinity. (Actual numerical values are more like -1050, but you get the idea.) Because the Rubin's rules point estimate is the average of the per-imputation coefficients, the multiply imputed regression coefficient is also, in effect, negative infinity.
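For anyone who wants to check for the same situation in their own data, here is a minimal sketch of how the event counts per cell could be listed for each imputation. It assumes the data have been -mi set- and declared as survival data with -mi stset-, so that _d marks the non-censored observations; chemotherapy and risk_category are the variable names used in the models above.

```stata
* Sketch only: assumes the data are mi set and mi stset, so that _d == 1
* flags non-censored observations (recurrences).
* -mi xeq- with no numlist runs the command on m = 0 (the original data)
* and then on each imputation m = 1, ..., M.
mi xeq: tabulate risk_category chemotherapy if _d == 1
```

Any imputation whose table shows a zero in the low-risk, chemotherapy cell is one where the interaction coefficient for that cell cannot be estimated and drifts toward negative infinity.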
I am sorely tempted to simply exclude from the analysis those imputations that have a zero in that cell. But really, that's unprincipled, and I'm searching for a better way. I thought of perhaps going Bayesian to regularize things using an informative prior, but my audience is not fond of Bayesian statistics. (And I've never tried to use -mi estimate- with the Bayesian commands. Can that even be done?) Is there a penalized maximum likelihood estimator for these models, and one that runs with -mi estimate-? Any other ideas?