Hi all,

On p. 296 of the Stata 16 MI manual, the following is stated:

The MI predictions should be treated as a final result; they should not be used as intermediate results in computations. For example, MI estimates of the linear predictor cannot be used to compute residuals as is done in non-MI analysis. Instead, completed-data residuals should be calculated for each imputed dataset, and these can be obtained by using the mi xeq: command. For example,
. mi xeq: regress ...; predict resid, r
I took this instruction as accurate when generating random effects from a 'mixed' model, which essentially required me to run a full set of models over the mi data again, after first running them within the mi estimate command. This took a LONG time.

It got me thinking as to whether the Stata instruction is correct.

If the dependent variable is not imputed, and so it is constant across all imputation sets, my understanding is that the correct residual can be generated by simply using the data within _mi_m = 0.
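
To make the comparison concrete, here is a rough sketch of the two routes for a plain regress model. It is illustrative only: the variable names y, x1, x2 and the file name miest are hypothetical, and it assumes the data are already mi set in flong style (so that _mi_m exists and variables created inside mi xeq are kept), with y complete and x1 imputed.

```stata
* Sketch only: y, x1, x2 and miest are hypothetical names.
* Assumes flong-style mi data, y not imputed, x1 imputed.

* (a) One residual per observation from the pooled (MI) linear predictor
mi estimate, saving(miest, replace): regress y x1 x2
mi predict xb_mi using miest
generate double resid_mi = y - xb_mi if _mi_m == 0

* (b) The manual's route: completed-data residuals, one set per imputation
mi xeq: regress y x1 x2; predict double resid_m, residuals
```

Route (a) fits the model once through mi estimate and reuses the saved results, whereas route (b) refits the model in every imputed dataset, which is where the extra run time comes from.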

And that the approach in the Stata manual would only be required if:
  • you need to do further analysis using the residuals, where the final result requires separate estimates within each imputation, e.g. I needed to generate level-2 average residuals in my two-level mixed model.
or
  • the dependent variable is imputed; and
  • you want to generate a standard error for the residual.
In the case that the dependent variable is not imputed, the standard error of the residual would be the same as that for the predicted xb value, with this standard error inclusive of both the imputation error and the 'model error'.

Do people agree? If so, could the language in the mi manual be slightly reworded?

Regards,

Andrew