I'm running a fixed-effects regression on panel data and want to cluster the standard errors along two dimensions. Following the discussion here, I tried two-way clustering in two different ways and got non-identical results: the slope coefficients are the same, but the standard errors differ. This has left me unsure which method is correct. For demonstration, I use the following sample data:
Code:
use "http://www.stata-press.com/data/r14/nlswork.dta", clear
Method 1:
Code:
use "http://www.stata-press.com/data/r14/nlswork.dta", clear
* Two-way clustering on occ_code and year, with idcode fixed effects absorbed
reghdfe ln_wage hours age i.race i.year, absorb(idcode) vce(cluster occ_code year)
(dropped 548 singleton observations)
note: 2bn.race is probably collinear with the fixed effects (all partialled-out values are close to zero; tol =
> 1.0e-09)
note: 3bn.race is probably collinear with the fixed effects (all partialled-out values are close to zero; tol =
> 1.0e-09)
(MWFE estimator converged in 1 iterations)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
warning: missing F statistic; dropped variables due to collinearity or too few clusters
note: 2.race omitted because of collinearity
note: 3.race omitted because of collinearity
HDFE Linear regression                            Number of obs   =     27,775
Absorbing 1 HDFE group                            F(  16,     12) =          .
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.6555
                                                  Adj R-squared   =     0.5948
Number of clusters (occ_code) =         13        Within R-sq.    =     0.1066
Number of clusters (year)     =         15        Root MSE        =     0.3031
                         (Std. Err. adjusted for 13 clusters in occ_code year)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hours |   .0007912   .0013155     0.60   0.559     -.002075    .0036573
         age |   .0123773    .008567     1.44   0.174    -.0062886    .0310433
             |
        race |
       black |          0   7.23e-10     0.00   1.000    -1.57e-09    1.57e-09
       other |          0   1.85e-09     0.00   1.000    -4.04e-09    4.04e-09
             |
        year |
          69 |    .074031   .0289008     2.56   0.025     .0110616    .1370005
          70 |   .0475113   .0335979     1.41   0.183    -.0256922    .1207148
          71 |   .0867244   .0357381     2.43   0.032     .0088578    .1645911
          72 |    .085603   .0452419     1.89   0.083    -.0129706    .1841766
          73 |   .0889179   .0482453     1.84   0.090    -.0161995    .1940353
          75 |   .0790484   .0628044     1.26   0.232    -.0577907    .2158875
          77 |    .110587   .0820362     1.35   0.203    -.0681546    .2893286
          78 |    .133743   .0908718     1.47   0.167    -.0642497    .3317356
          80 |   .1167207    .097275     1.20   0.253    -.0952233    .3286646
          82 |   .1137017   .1244713     0.91   0.379    -.1574979    .3849013
          83 |   .1261455   .1254465     1.01   0.334    -.1471791      .39947
          85 |    .151498   .1362975     1.11   0.288    -.1454688    .4484649
          87 |   .1422716   .1548657     0.92   0.376    -.1951518     .479695
          88 |   .1837026   .1757012     1.05   0.316    -.1991175    .5665227
             |
       _cons |   1.181475   .1358746     8.70   0.000     .8854294     1.47752
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
      idcode |      4149           0        4149     |
-----------------------------------------------------+
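For reference, the adjustment named in the warning above belongs to the two-way cluster-robust variance estimator of Cameron, Gelbach & Miller, which is usually written as below. The notation is illustrative and not taken from the reghdfe output: $\hat V_{\text{occ}}$ and $\hat V_{\text{year}}$ stand for the one-way cluster-robust variance matrices for each dimension, $\hat V_{\text{occ}\cap\text{year}}$ for the one clustered on occupation-by-year cells, and small-sample correction factors are left out.

```latex
\hat V_{\text{two-way}}
  = \hat V_{\text{occ}}
  + \hat V_{\text{year}}
  - \hat V_{\text{occ}\,\cap\,\text{year}}
```

Because one term is subtracted, the resulting matrix is not guaranteed to be positive semi-definite in finite samples, which appears to be the situation the warning reports.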
Method 2:
Code:
use "http://www.stata-press.com/data/r14/nlswork.dta", clear
* One cluster ID per occ_code-by-year cell, then one-way clustering on that ID
egen occ_year = group(occ_code year)
areg ln_wage hours age i.race i.year, absorb(idcode) vce(cluster occ_year)
note: 2.race omitted because of collinearity
note: 3.race omitted because of collinearity
Linear regression, absorbing indicators           Number of obs   =     28,323
                                                  F(  16,    175) =      17.61
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.6648
                                                  Adj R-squared   =     0.5980
                                                  Root MSE        =     0.3031
                             (Std. Err. adjusted for 176 clusters in occ_year)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hours |   .0007912   .0005501     1.44   0.152    -.0002944    .0018768
         age |   .0123773   .0152226     0.81   0.417    -.0176661    .0424208
             |
        race |
       black |          0  (omitted)
       other |          0  (omitted)
             |
        year |
          69 |    .074031   .0427223     1.73   0.085    -.0102861    .1583482
          70 |   .0475113     .05194     0.91   0.362    -.0549982    .1500208
          71 |   .0867244   .0607844     1.43   0.155    -.0332405    .2066893
          72 |    .085603   .0725549     1.18   0.240    -.0575924    .2287983
          73 |   .0889179   .0877736     1.01   0.312    -.0843132     .262149
          75 |   .0790484    .113218     0.70   0.486    -.1444001    .3024969
          77 |    .110587   .1410838     0.78   0.434    -.1678578    .3890318
          78 |    .133743   .1569636     0.85   0.395    -.1760424    .4435283
          80 |   .1167207   .1853747     0.63   0.530    -.2491371    .4825784
          82 |   .1137017   .2147943     0.53   0.597     -.310219    .5376224
          83 |   .1261455   .2315117     0.54   0.587    -.3307688    .5830597
          85 |    .151498   .2606477     0.58   0.562    -.3629195    .6659155
          87 |   .1422716   .2896411     0.49   0.624    -.4293677    .7139108
          88 |   .1837026   .2959214     0.62   0.536    -.4003315    .7677367
             |
       _cons |   1.179466   .2917784     4.04   0.000     .6036086    1.755324
-------------+----------------------------------------------------------------
      idcode |   absorbed                                   (4697 categories)
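The two runs above differ in more than the clustering: reghdfe drops 548 singletons (27,775 + 548 = 28,323, the areg sample) and the two commands are different estimators. A like-for-like comparison could be sketched roughly as follows. It is only a sketch and assumes that the installed version of reghdfe accepts the keepsingletons option; both models are then fitted by the same command on the same sample, so only the vce() specification changes.

```stata
* Sketch only: same command, same sample, two clustering choices.
* Assumption: the installed reghdfe supports the keepsingletons option.
use "http://www.stata-press.com/data/r14/nlswork.dta", clear
egen occ_year = group(occ_code year)

* (a) two-way clustering: by occ_code and by year
reghdfe ln_wage hours age i.race i.year, absorb(idcode) ///
    vce(cluster occ_code year) keepsingletons
estimates store twoway_cl

* (b) one-way clustering on occ_code-by-year cells, as in Method 2
reghdfe ln_wage hours age i.race i.year, absorb(idcode) ///
    vce(cluster occ_year) keepsingletons
estimates store cell_cl

* Side-by-side coefficients and standard errors
estimates table twoway_cl cell_cl, se
```

Any gap in standard errors that remains between (a) and (b) would then come from the clustering structure alone, not from the sample or the command.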
Can anyone suggest which of these two is the correct way to do two-way clustering, and why?
Thanks,