Hello,

My question is more econometric in nature than purely about Stata:
When regression models are used for multiple imputation (e.g. with Stata's mi impute commands), how much of an issue is endogeneity in the covariates used for imputation?

My reasoning is as follows:
Endogeneity undoubtedly results in biased coefficient estimates. This is a problem if we are interested in the causal effect of variable x on y, i.e. in the actual population parameters.
However, when we impute a variable y from a set of covariates x_i, some of which are not strictly exogenous, it should be correlation rather than causality that helps us impute y, right?
To get a good prediction y_hat, the direction of causality should be of secondary interest at most. If, for instance, a high y tends to come with a high x (no matter the direction of the effect), observing a high x should still hint that we can expect a high y. I could not find adequate literature on this topic. Endogeneity could indeed cause problems if the imputed y is in turn used for further causal analysis on other variables (e.g. in 2SLS models), but if the goal is only to gain some insight into y, it should be fine, I think.
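
To make the prediction argument concrete, here is a minimal simulation sketch in Python (standard library only). Everything in it is an assumption chosen for illustration: the data-generating process y = beta*x + u with x = e + u, the parameter values, and the helper names simulate, ols_fit and mse. It only compares point predictions and does not reproduce what mi impute actually does (which draws imputations rather than plugging in fitted values).

```python
import random


def simulate(n=20000, beta=1.0, seed=12345):
    """Draw (x, y) pairs where x is endogenous: x shares the error u with y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)   # structural error in the y equation
        e = rng.gauss(0, 1)   # independent variation in x
        x = e + u             # x is correlated with u, hence endogenous
        y = beta * x + u      # structural (causal) equation
        xs.append(x)
        ys.append(y)
    return xs, ys


def ols_fit(xs, ys):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the linear prediction intercept + slope * x."""
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


true_beta = 1.0
xs, ys = simulate(beta=true_beta)
ols_intercept, ols_slope = ols_fit(xs, ys)

# Under this assumed process the OLS slope converges to 1.5, not the causal 1.0.
mse_ols = mse(xs, ys, ols_intercept, ols_slope)
# Prediction using the true causal coefficient (intercept 0, slope beta).
mse_causal = mse(xs, ys, 0.0, true_beta)

print("OLS slope:", round(ols_slope, 3), "vs causal beta:", true_beta)
print("MSE with OLS fit:", round(mse_ols, 3))
print("MSE with causal beta:", round(mse_causal, 3))
```

In this setup the OLS slope is biased as an estimate of the causal effect, yet the biased fit predicts y better than the true causal coefficient does, because it also exploits the correlation between x and the error term.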

I would love to hear some opinions on this, and I may well be wrong here...