Dear All,

I have exactly the same question as the one addressed in an old Stata FAQ on the Durbin–Wu–Hausman test:
https://www.stata.com/support/faqs/s...-hausman-test/

I have copied and pasted the FAQ here:
____________________________________________________________
Before estimating the following simultaneous equations,
z = a0 + a1*x1 + a2*x2 + epsilon1 (1)
y = b0 + b1*z + b2*x3 + epsilon2 (2)
one should decide whether it is necessary to use an instrumental variable, i.e., whether a set of estimates obtained by least squares is consistent or not.

Davidson and MacKinnon (1993) suggest an augmented regression test (DWH test), which can easily be formed by including the residuals of each endogenous right-hand-side variable, as a function of all exogenous variables, in a regression of the original model. Back to our example, we would first perform a regression
z = c0 + c1*x1 + c2*x2 + c3*x3 + epsilon3 (3)
get residuals z_res, then perform an augmented regression:
y = d0 + d1*z + d2*x3 + d3*z_res + epsilon4 (4)
If d3 is significantly different from zero, then OLS is not consistent.
____________________________________________________________
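
To make the two-step procedure quoted above concrete, here is a minimal sketch on simulated data. It is only an illustration under stated assumptions: it uses Python with NumPy rather than Stata, the coefficient values and the shared shock `u` are made up, the helper `ols` is hypothetical, and the standard errors are the conventional OLS ones with no adjustment for a generated regressor.

```python
import numpy as np

def ols(y, X):
    """Hypothetical helper: OLS coefficients, conventional SEs, residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov)), resid

# Simulated data (all numbers are assumptions, not from the FAQ).
rng = np.random.default_rng(0)
n = 2000
x1, x2, x3 = rng.normal(size=(3, n))
u = rng.normal(size=n)             # shared shock: makes z endogenous in (2)
eps1 = u + rng.normal(size=n)
eps2 = u + rng.normal(size=n)
z = 1.0 + 0.8 * x1 - 0.5 * x2 + eps1   # equation (1)
y = 2.0 + 1.5 * z + 0.7 * x3 + eps2    # equation (2)

const = np.ones(n)

# Equation (3): regress z on all exogenous variables, keep the residuals.
_, _, z_res = ols(z, np.column_stack([const, x1, x2, x3]))

# Equation (4): augmented regression of y on z, x3 and z_res.
beta, se, _ = ols(y, np.column_stack([const, z, x3, z_res]))
d3 = beta[3]
t_d3 = d3 / se[3]

print(f"d3 = {d3:.3f}, t-statistic = {t_d3:.2f}")
```

Because eps1 and eps2 share the shock `u` in this simulation, z is endogenous by construction, so the t-statistic on z_res should come out far from zero. Note that the sketch simply assumes x1 and x2 are valid instruments; it does not test that assumption.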

My question is this: the usual Durbin–Wu–Hausman test requires declaring instruments for z. In the example above, these must be x1 and x2. However, in my case, z is a generated regressor, and x1 and x2 stand for a long list of variables, including many dummies, entering as in equation (1).

When I test the endogeneity of z in equation (2), do I need to show that x1 and x2 are all uncorrelated with epsilon2 (as the definition of an instrument requires), or can I simply follow the procedure in the FAQ quoted above?



Best,
Zhaohui