I'm trying a fixed effects regression with panel data where I cluster across two dimensions. Following the discussion here, I tried two-way clustering in the two ways and got non-identical results. This has left me confused as to which method is the correct one. For demonstration, I use a sample data given by
Code:
use "http://www.stata-press.com/data/r14/nlswork.dta", clear
Code:
use "http://www.stata-press.com/data/r14/nlswork.dta", clear reghdfe ln_wage hours age i.race i.year,a(idcode) vce (cluster occ_code year) (dropped 548 singleton observations) note: 2bn.race is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = > 1.0e-09) note: 3bn.race is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = > 1.0e-09) (MWFE estimator converged in 1 iterations) Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied. warning: missing F statistic; dropped variables due to collinearity or too few clusters note: 2.race omitted because of collinearity note: 3.race omitted because of collinearity HDFE Linear regression Number of obs = 27,775 Absorbing 1 HDFE group F( 16, 12) = . Statistics robust to heteroskedasticity Prob > F = . R-squared = 0.6555 Adj R-squared = 0.5948 Number of clusters (occ_code) = 13 Within R-sq. = 0.1066 Number of clusters (year) = 15 Root MSE = 0.3031 (Std. Err. adjusted for 13 clusters in occ_code year) ------------------------------------------------------------------------------ | Robust ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hours | .0007912 .0013155 0.60 0.559 -.002075 .0036573 age | .0123773 .008567 1.44 0.174 -.0062886 .0310433 | race | black | 0 7.23e-10 0.00 1.000 -1.57e-09 1.57e-09 other | 0 1.85e-09 0.00 1.000 -4.04e-09 4.04e-09 | year | 69 | .074031 .0289008 2.56 0.025 .0110616 .1370005 70 | .0475113 .0335979 1.41 0.183 -.0256922 .1207148 71 | .0867244 .0357381 2.43 0.032 .0088578 .1645911 72 | .085603 .0452419 1.89 0.083 -.0129706 .1841766 73 | .0889179 .0482453 1.84 0.090 -.0161995 .1940353 75 | .0790484 .0628044 1.26 0.232 -.0577907 .2158875 77 | .110587 .0820362 1.35 0.203 -.0681546 .2893286 78 | .133743 .0908718 1.47 0.167 -.0642497 .3317356 80 | .1167207 .097275 1.20 0.253 -.0952233 .3286646 82 | .1137017 .1244713 0.91 0.379 -.1574979 .3849013 83 | .1261455 .1254465 1.01 0.334 -.1471791 .39947 85 | .151498 .1362975 1.11 0.288 -.1454688 .4484649 87 | .1422716 .1548657 0.92 0.376 -.1951518 .479695 88 | .1837026 .1757012 1.05 0.316 -.1991175 .5665227 | _cons | 1.181475 .1358746 8.70 0.000 .8854294 1.47752 ------------------------------------------------------------------------------ Absorbed degrees of freedom: -----------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. Coefs | -------------+---------------------------------------| idcode | 4149 0 4149 | -----------------------------------------------------+ clear
Method 2:
Code:
use "http://www.stata-press.com/data/r14/nlswork.dta", clear egen occ_year=group(occ_code year) areg ln_wage hours age i.race i.year,a(idcode) cluster(occ_year) note: 2.race omitted because of collinearity note: 3.race omitted because of collinearity Linear regression, absorbing indicators Number of obs = 28,323 F( 16, 175) = 17.61 Prob > F = 0.0000 R-squared = 0.6648 Adj R-squared = 0.5980 Root MSE = 0.3031 (Std. Err. adjusted for 176 clusters in occ_year) ------------------------------------------------------------------------------ | Robust ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hours | .0007912 .0005501 1.44 0.152 -.0002944 .0018768 age | .0123773 .0152226 0.81 0.417 -.0176661 .0424208 | race | black | 0 (omitted) other | 0 (omitted) | year | 69 | .074031 .0427223 1.73 0.085 -.0102861 .1583482 70 | .0475113 .05194 0.91 0.362 -.0549982 .1500208 71 | .0867244 .0607844 1.43 0.155 -.0332405 .2066893 72 | .085603 .0725549 1.18 0.240 -.0575924 .2287983 73 | .0889179 .0877736 1.01 0.312 -.0843132 .262149 75 | .0790484 .113218 0.70 0.486 -.1444001 .3024969 77 | .110587 .1410838 0.78 0.434 -.1678578 .3890318 78 | .133743 .1569636 0.85 0.395 -.1760424 .4435283 80 | .1167207 .1853747 0.63 0.530 -.2491371 .4825784 82 | .1137017 .2147943 0.53 0.597 -.310219 .5376224 83 | .1261455 .2315117 0.54 0.587 -.3307688 .5830597 85 | .151498 .2606477 0.58 0.562 -.3629195 .6659155 87 | .1422716 .2896411 0.49 0.624 -.4293677 .7139108 88 | .1837026 .2959214 0.62 0.536 -.4003315 .7677367 | _cons | 1.179466 .2917784 4.04 0.000 .6036086 1.755324 -------------+---------------------------------------------------------------- idcode | absorbed (4697 categories) clear
Can anyone suggest which amongst these two is the correct way to do two-way clustering and why?
Thanks,
0 Response to 2 way clustering in fixed effects panel data regression
Post a Comment