Dear all,

First of all, I have searched and read many posts here but could not find an existing solution.

I am working with xtabond2 to run a two-step system GMM estimation. I have read Roodman (2009) and Prof. Sebastian Kripfganz's presentation slides, but my case is somewhat unusual, so these materials do not resolve all of my questions.

To clarify, I do not have a lagged dependent variable on the right-hand side of the equation. I run GMM estimation as a robustness check because I have to address endogeneity but cannot find suitable external instrumental variables.

I have more than 600,000 observations in total, spanning 22 years. My core predictor is a macro-level variable (i.e. the yearly difference △Xt, △Xt-1, △Xt-2, etc.) and the dependent variable is a micro-level variable (i.e. individual choice). In my OLS and fixed-effects models, I find a U-shaped (convex) relationship, so I want to add the squared term of my core predictor to the GMM estimation. But when I specify it as a GMM-style instrument, the Hansen test always rejects (p-value well below 0.25, around 0.01 most of the time). I tried every position in which it could be placed and found that, by treating it as exogenous and putting it in the IV-style instruments, I obtain statistically significant results and a decent Hansen test p-value (>0.40).

1. My first question: I treat the core predictor as endogenous and put it in the GMM-style instruments with its second and higher-order lags (lags 2 to 21). Given this, can I treat its squared term as exogenous?

2. The Arellano-Bond test rejects the null until AR(6). Is it still okay for me to include lags 1-5 as instruments? Since I don't have a lagged dependent variable in the model, I am unsure whether the Arellano-Bond test still applies to my case.

3. From Prof. Sebastian Kripfganz's slides, I learned that dummy variables are usually treated as exogenous and put in the IV-style instruments with the level option. But what about interaction terms between endogenous or predetermined variables and dummies? If the Hansen test and the difference-in-Hansen tests all pass (p-values comfortably above 0.25), is it justifiable to treat the interaction terms as exogenous?

Lastly, I have tried running my specification with the xtdpdgmm command, but because the number of observations is quite large, I could not obtain results even after waiting more than 30 minutes. Is there any way to speed up xtdpdgmm?

Here is my code:
Code:
xtabond2 migrate i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
c.L.gap_jobdiff3ex##c.L.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu i.yr2-yr22 , ///
gmmstyle(gap_jobdiff3ex, lag(2 .) orthogonal collapse) ///
gmmstyle(gap_ppden gap_enterprise gap_unemploy , lag(1 .) collapse) ///
ivstyle(gap_highedu gap_med) ///
ivstyle(c.L.gap_jobdiff3ex#c.L.gap_jobdiff3ex i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome i.yr2-yr22 , eq(level)) ///
small twostep artests(6) cluster(dest_code)
Note: i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome are time-invariant variables. I am aware that including them imposes a stronger assumption on the estimation.

Thanks for any comments!