I'd like to estimate a two-stage least squares regression manually: run the first stage, then run the second stage with the predicted X. When doing this, the second-stage standard errors need to be adjusted to account for the predicted X being a generated regressor.
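Written out, the non-clustered adjustment used in the code further down amounts to rescaling the second-stage SE by a ratio of root mean squared errors. This is only a sketch under homoskedastic errors; the symbols ($w_i$ for the fixed-effect dummies, $k$ for the number of estimated coefficients, $\hat e_i$ and $\tilde e_i$ for the two residuals) are introduced here for illustration and are not taken from the posts mentioned.

```latex
% Second-stage residual (uses the fitted \hat{x}_i) and
% "corrected" residual (same coefficients, actual x_i):
\tilde e_i = y_i - \hat x_i \hat\beta - w_i'\hat\gamma ,
\qquad
\hat e_i = y_i - x_i \hat\beta - w_i'\hat\gamma

% Rescaled standard error, with N - k the second-stage residual df:
\widehat{\mathrm{SE}}_{\text{adj}}(\hat\beta)
  = \widehat{\mathrm{SE}}_{\text{2nd stage}}(\hat\beta)
    \times
    \frac{\sqrt{\sum_i \hat e_i^{2} / (N-k)}}
         {\sqrt{\sum_i \tilde e_i^{2} / (N-k)}}
```

In the code, the numerator corresponds to sqrt(csse/dfr) and the denominator to the second-stage e(rmse).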
I did so using code from a Stata.com post and a Kit Baum Statalist post. With clustering, however, the degrees-of-freedom adjustment isn't quite right and I can't figure out how to fix it. I can get close, but my manual SEs still don't match those from ivregress (although they do match when I don't cluster).
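For context, a cluster-robust variance has a sandwich form in which the residuals enter the middle term cluster by cluster, not only through a single RMSE. The following is a sketch in notation introduced here for illustration: $\hat Z$ stacks the fitted X and the dummies, $\hat e$ is the residual computed with the actual X and the second-stage coefficients, $g$ indexes clusters, and $q$ is a finite-sample factor left generic because its exact form depends on the command and options.

```latex
% Cluster-robust sandwich for the manual second stage,
% evaluated with the corrected residuals \hat e:
\widehat{V}_{\text{cluster}}
  = q \,(\hat Z'\hat Z)^{-1}
    \Big( \sum_{g} \hat Z_g' \,\hat e_g \hat e_g' \,\hat Z_g \Big)
    (\hat Z'\hat Z)^{-1}
```

A clustered second-stage regression builds the middle term from its own residuals (computed with the fitted X) rather than from $\hat e$, so a single scalar rescaling of its SE may not reproduce this expression exactly, whatever degrees of freedom are used.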
Can someone correct my SE adjustment so that the manual two-stage procedure recovers the SEs from ivregress?
Below is code that (1) generates an illustrative dataset, (2) produces the accurate correction without clustering (very similar to the code in the posts mentioned above, but slightly more automated), (3) produces an "almost accurate" correction with clustering, and (4) repeats the whole thing with reghdfe in addition to reg. The reghdfe version has to be done slightly differently because of differences in how predict works; I'm including it because I thought it might be useful to others.
Thank you,
Mitch
Code:
set seed 30819

******************
/* DATA SETUP */
******************
clear

* Generate an unbalanced panel of 1000 observations
set obs 1000
gen n = _n
gen i = ceil(100*runiform())
sort i n
by i: gen t = _n
*tab i, sort
*tab t

* Create need for i fixed effects
gen temp1 = rnormal()
egen a = mean(temp1), by(i)
drop temp1

* Create need for t fixed effects
gen temp2 = t + rnormal()
egen d = mean(temp2), by(t)
drop temp2

gen u1 = rnormal()
gen u2 = rnormal()*2
gen v = rnormal()
gen z = rnormal()

gen x = a + d + z/3 + u1 + v
gen y = a + d + x/2 + u1 + u2

**********************************
/* REG VERSION WITH DUMMIES */
/* NO CLUSTERING */
**********************************

* First stage
qui: reg x z i.i i.t
predict xfit, xb

* Second stage
qui: reg y xfit i.i i.t
local st2se = _se[xfit]
local st2rmse = `e(rmse)'
di `e(df_r)'
local dfr = `e(df_r)'
di `dfr'

* Getting "corrected" residuals as true X's and IV-estimated coefficients
replace xfit = x
predict cst2e, resid

* Getting "corrected" sum of squared errors
gen cst2e2 = cst2e^2
qui: sum cst2e2
local csse = `r(sum)'
di `csse'

* Original SE's with no correction for simulated variables
di `st2se'

* Actual IV standard errors
qui: ivregress 2sls y i.i i.t (x = z)
di _se[x]

* Manually calculated/adjusted SE's
di `st2se'*(sqrt(`csse'/`dfr')/`st2rmse')

* Actual IV standard errors with small sample correction
qui: ivregress 2sls y i.i i.t (x = z), small
di _se[x]

drop xfit cst2e cst2e2

**********************************
/* REG VERSION WITH DUMMIES */
/* WITH CLUSTERING */
**********************************

* Note: Clustering matters
qui: reg x z i.i i.t
di _se[z]
qui: reg x z i.i i.t, cluster(i)
di _se[z]

* First stage
qui: reg x z i.i i.t, cluster(i)
predict xfit, xb

* Second stage
qui: reg y xfit i.i i.t, cluster(i)
local st2se = _se[xfit]
local st2rmse = `e(rmse)'
di `e(df_r)'
local dfr = (`e(N)' - `e(df_m)' - `e(df_r)')
di `dfr'

* Getting "corrected" residuals as true X's and IV-estimated coefficients
replace xfit = x
predict cst2e, resid

* Getting "corrected" sum of squared errors
gen cst2e2 = cst2e^2
qui: sum cst2e2
local csse = `r(sum)'
di `csse'

* Original SE's with no correction for simulated variables
di `st2se'

* Actual IV standard errors
qui: ivregress 2sls y i.i i.t (x = z), cluster(i)
di _se[x]

* Manually calculated/adjusted SE's
di `st2se'*(sqrt(`csse'/`dfr')/`st2rmse')

* Actual IV standard errors with small sample correction
qui: ivregress 2sls y i.i i.t (x = z), cluster(i) small
di _se[x]

drop xfit cst2e cst2e2

**********************
/* REGHDFE VERSION */
/* NO CLUSTERING */
**********************

* First stage
qui: reghdfe x z, absorb(i t, savefe) resid
predict xfit, xbd

* Second stage
qui: reghdfe y xfit, absorb(i t, savefe) resid
predict st2e, resid
local st2se = _se[xfit]
local st2rmse = `e(rmse)'
local dfr = `e(df_r)'

* Getting "corrected" residuals as true X's and IV-estimated coefficients
* You have to do this in a strange way because predict doesn't work if you change the X values
gen cst2e = st2e + (xfit - x)*_b[xfit]

* Getting "corrected" sum of squared errors
gen cst2e2 = cst2e^2
qui: sum cst2e2
local csse = `r(sum)'

* Original SE's with no correction for simulated variables
di `st2se'

* Actual IV standard errors
qui: ivreghdfe y (x = z), absorb(i t)
di _se[x]

* Manually calculated/adjusted SE's
di `st2se'*(sqrt(`csse'/`dfr')/`st2rmse')

drop xfit st2e cst2e cst2e2

**********************
/* REGHDFE VERSION */
/* WITH CLUSTERING */
**********************

* First stage
qui: reghdfe x z, absorb(i t, savefe) resid cluster(i)
predict xfit, xbd

* Second stage
qui: reghdfe y xfit, absorb(i t, savefe) resid cluster(i)
predict st2e, resid
local st2se = _se[xfit]
local st2rmse = `e(rmse)'
local dfr = (`e(N)' - `e(df_m)' - `e(df_r)')

* Getting "corrected" residuals as true X's and IV-estimated coefficients
* You have to do this in a strange way because predict doesn't work if you change the X values
gen cst2e = st2e + (xfit - x)*_b[xfit]

* Getting "corrected" sum of squared errors
gen cst2e2 = cst2e^2
qui: sum cst2e2
local csse = `r(sum)'

* Original SE's with no correction for simulated variables
di `st2se'

* Actual IV standard errors
qui: ivreghdfe y (x = z), absorb(i t) cluster(i)
di _se[x]

* Manually calculated/adjusted SE's
di `st2se'*(sqrt(`csse'/`dfr')/`st2rmse')