First, heartfelt New Year wishes in advance to all. I wish everyone a prosperous New Year.
I am working with a cross-country dataset in which the lowest-level units are firms. Firms are grouped into industries, and the broadest level is the country. I have 22 countries, 18 industries, 17,252 firms, and 22 years.
With panel data I usually cluster at a single level, namely the firm level. However, some articles cluster at both the firm and year levels in a cross-country setup.
What does double clustering (by firm and year) mean?
As far as I know, clustering in a panel context accounts for correlation of the residuals within units. For instance, if the regression residuals are likely to be correlated within an industry, one should cluster the standard errors by industry. But with double clustering by firm and year, does it make sense to cluster the standard errors within each unique firm-year pair?
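For reference, here is a sketch of what two-way clustering computes under the Cameron, Gelbach & Miller construction, as far as I understand it. The \hat{V} symbols are notation introduced here and do not come from any Stata output. On this reading, the estimator does not cluster within unique firm-year pairs; it combines three one-way cluster-robust variance estimates:

```latex
\hat{V}_{\text{firm},\,\text{year}}
  \;=\; \hat{V}_{\text{firm}}
  \;+\; \hat{V}_{\text{year}}
  \;-\; \hat{V}_{\text{firm}\,\cap\,\text{year}}
```

Here \hat{V}_{firm} allows arbitrary correlation within a firm across years, and \hat{V}_{year} allows correlation across firms within the same year. The last term, clustered on the firm-year intersection, subtracts what the first two count twice. With one observation per firm-year, that intersection term reduces to the ordinary heteroskedasticity-robust variance.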
Similarly, I have seen in a post that clustering with fewer than 30 clusters is not advisable (https://www.statalist.org/forums/for...72#post1603472). Does this also apply to double clustering, where my number of years is below 30?
Code:
. xtset id year

Panel variable: id (unbalanced)
 Time variable: year, 1999 to 2020, but with gaps
         Delta: 1 unit

. reghdfe dividends risk roa_w size_w lev_w sg_w cash_ta1_w tangib_w age mb_w, absorb(id year) cluster (id )
(dropped 1846 singleton observations)
(MWFE estimator converged in 8 iterations)

HDFE Linear regression                            Number of obs   =     92,159
Absorbing 2 HDFE groups                           F(   9,  10505) =     192.48
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.3821
                                                  Adj R-squared   =     0.3024
                                                  Within R-sq.    =     0.0373
Number of clusters (id)      =     10,506         Root MSE        =     0.1669

                                (Std. err. adjusted for 10,506 clusters in id)
------------------------------------------------------------------------------
             |               Robust
   dividends | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        risk |   .0089675    .003735     2.40   0.016     .0016462    .0162888
       roa_w |  -.7158403   .0192201   -37.24   0.000    -.7535152   -.6781653
      size_w |   .0051734   .0023954     2.16   0.031      .000478    .0098688
       lev_w |  -.0614293   .0088244    -6.96   0.000    -.0787268   -.0441318
        sg_w |  -.0029462   .0003515    -8.38   0.000    -.0036352   -.0022572
  cash_ta1_w |  -.0693444    .010555    -6.57   0.000    -.0900342   -.0486545
    tangib_w |  -.0245404   .0092626    -2.65   0.008    -.0426969    -.006384
         age |   .0165564   .0036146     4.58   0.000     .0094712    .0236417
        mb_w |  -.0006307   .0001642    -3.84   0.000    -.0009526   -.0003089
       _cons |   .2522908   .0239573    10.53   0.000     .2053299    .2992517
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          id |     10506       10506           0    *|
        year |        21           0          21     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. reghdfe dividends risk roa_w size_w lev_w sg_w cash_ta1_w tangib_w age mb_w, absorb(id year) cluster (id year )
(dropped 1846 singleton observations)
(MWFE estimator converged in 8 iterations)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.

HDFE Linear regression                            Number of obs   =     92,159
Absorbing 2 HDFE groups                           F(   9,     20) =      94.49
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.3821
                                                  Adj R-squared   =     0.3024
Number of clusters (id)      =     10,506         Within R-sq.    =     0.0373
Number of clusters (year)    =         21         Root MSE        =     0.1669

                               (Std. err. adjusted for 21 clusters in id year)
------------------------------------------------------------------------------
             |               Robust
   dividends | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        risk |   .0089675   .0122294     0.73   0.472    -.0165426    .0344776
       roa_w |  -.7158403    .046722   -15.32   0.000    -.8133007   -.6183799
      size_w |   .0051734    .004472     1.16   0.261     -.004155    .0145018
       lev_w |  -.0614293     .01249    -4.92   0.000     -.087483   -.0353757
        sg_w |  -.0029462   .0006372    -4.62   0.000    -.0042754    -.001617
  cash_ta1_w |  -.0693444   .0108852    -6.37   0.000    -.0920505   -.0466382
    tangib_w |  -.0245404   .0096214    -2.55   0.019    -.0446104   -.0044705
         age |   .0165564   .0060575     2.73   0.013     .0039207    .0291922
        mb_w |  -.0006307   .0002081    -3.03   0.007    -.0010648   -.0001967
       _cons |   .2522908   .0807704     3.12   0.005     .0838068    .4207749
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          id |     10506       10506           0    *|
        year |        21          21           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
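A quick check of the arithmetic in the two outputs above. It assumes, as the reported F(9, 20) suggests, that the two-way run takes its residual degrees of freedom from the smaller cluster count minus one. G denotes a number of clusters and the critical values are rounded t quantiles; both are notation added here rather than Stata output:

```latex
\begin{align*}
\mathrm{df}_{\text{one-way}} &= G_{\text{id}} - 1 = 10506 - 1 = 10505,
  & t_{0.975,\,10505} &\approx 1.960 \\
\mathrm{df}_{\text{two-way}} &= \min(G_{\text{id}},\, G_{\text{year}}) - 1 = \min(10506,\, 21) - 1 = 20,
  & t_{0.975,\,20} &\approx 2.086 \\
\text{CI}_{\text{risk, two-way}} &= 0.0089675 \pm 2.086 \times 0.0122294 \approx [-0.0165,\; 0.0345]
\end{align*}
```

The two-way intervals are therefore wider for two reasons: the standard errors are larger, and the reference distribution has heavier tails because it is tied to the 21 year clusters. That is where the concern about a small number of clusters shows up.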
Any thoughts or suggestions would be helpful, as this is for my general learning.