First, heartfelt advance New Year wishes to all. I wish everyone a prosperous New Year.
I am working with a cross-country dataset in which the lowest-level units are firms. Firms are grouped into industries, and the broadest level is the country. I have 22 countries, 18 industries, 17,252 firms, and 22 years.
For panel data I usually cluster at a single level, namely the firm. However, some articles cluster at both the firm and year levels in a cross-country setup.
What does double clustering (firm and year) mean?
As far as I know, clustering in a panel context accounts for correlation of the errors within units. For instance, if the regression residuals are likely to be correlated within an industry, one should cluster the standard errors by industry. But with double clustering by firm and year, does it make sense to cluster the standard errors within each unique firm-year pair?
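For reference, here is a sketch of what two-way clustering computes, assuming reghdfe follows the Cameron, Gelbach and Miller construction named in its own warning message in the second regression. The symbols are my notation, not something taken from the output:

```latex
\hat{V}_{\text{firm, year}}
  = \hat{V}_{\text{firm}}
  + \hat{V}_{\text{year}}
  - \hat{V}_{\text{firm} \cap \text{year}}
```

Each term is a one-way cluster-robust variance estimate. The last term clusters on the firm-year intersection; with one observation per firm-year, each intersection cluster is a single observation, so that term is essentially the heteroskedasticity-robust estimate. Read this way, double clustering does not mean clustering within unique firm-year pairs. It allows correlation within a firm across years and within a year across firms. Because the formula subtracts one matrix from the sum of two others, the result need not be positive semi-definite.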
Similarly, I have seen in a post that clustering with fewer than 30 clusters is not advisable (https://www.statalist.org/forums/for...72#post1603472). Does this apply to double clustering, where my number of years is below 30?
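As a rough check on how the year dimension enters, the second regression below reports F(9, 20) and "Std. err. adjusted for 21 clusters in id year". That seems consistent with the reference distribution being based on the smaller of the two cluster counts minus one. A sketch, where the G symbols are my own labels for the cluster counts shown in the output:

```latex
\text{df} = \min\left(G_{\text{firm}},\, G_{\text{year}}\right) - 1
          = \min(10506,\, 21) - 1
          = 20
```

If that reading is right, inference under double clustering here rests on 21 year clusters, so the concern about having few clusters would appear to carry over to the year dimension.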
Code:
. xtset id year

Panel variable: id (unbalanced)
 Time variable: year, 1999 to 2020, but with gaps
         Delta: 1 unit

. reghdfe dividends risk roa_w size_w lev_w sg_w cash_ta1_w tangib_w age mb_w, absorb(id year) cluster (id )
(dropped 1846 singleton observations)
(MWFE estimator converged in 8 iterations)

HDFE Linear regression                            Number of obs   =     92,159
Absorbing 2 HDFE groups                           F(   9,  10505) =     192.48
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.3821
                                                  Adj R-squared   =     0.3024
                                                  Within R-sq.    =     0.0373
Number of clusters (id)      =     10,506         Root MSE        =     0.1669

                                (Std. err. adjusted for 10,506 clusters in id)
------------------------------------------------------------------------------
             |               Robust
   dividends | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        risk |   .0089675    .003735     2.40   0.016     .0016462    .0162888
       roa_w |  -.7158403   .0192201   -37.24   0.000    -.7535152   -.6781653
      size_w |   .0051734   .0023954     2.16   0.031      .000478    .0098688
       lev_w |  -.0614293   .0088244    -6.96   0.000    -.0787268   -.0441318
        sg_w |  -.0029462   .0003515    -8.38   0.000    -.0036352   -.0022572
  cash_ta1_w |  -.0693444    .010555    -6.57   0.000    -.0900342   -.0486545
    tangib_w |  -.0245404   .0092626    -2.65   0.008    -.0426969    -.006384
         age |   .0165564   .0036146     4.58   0.000     .0094712    .0236417
        mb_w |  -.0006307   .0001642    -3.84   0.000    -.0009526   -.0003089
       _cons |   .2522908   .0239573    10.53   0.000     .2053299    .2992517
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          id |     10506       10506           0    *|
        year |        21           0          21     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. reghdfe dividends risk roa_w size_w lev_w sg_w cash_ta1_w tangib_w age mb_w, absorb(id year) cluster (id year )
(dropped 1846 singleton observations)
(MWFE estimator converged in 8 iterations)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.

HDFE Linear regression                            Number of obs   =     92,159
Absorbing 2 HDFE groups                           F(   9,     20) =      94.49
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.3821
                                                  Adj R-squared   =     0.3024
Number of clusters (id)      =     10,506         Within R-sq.    =     0.0373
Number of clusters (year)    =         21         Root MSE        =     0.1669

                               (Std. err. adjusted for 21 clusters in id year)
------------------------------------------------------------------------------
             |               Robust
   dividends | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        risk |   .0089675   .0122294     0.73   0.472    -.0165426    .0344776
       roa_w |  -.7158403    .046722   -15.32   0.000    -.8133007   -.6183799
      size_w |   .0051734    .004472     1.16   0.261     -.004155    .0145018
       lev_w |  -.0614293     .01249    -4.92   0.000     -.087483   -.0353757
        sg_w |  -.0029462   .0006372    -4.62   0.000    -.0042754    -.001617
  cash_ta1_w |  -.0693444   .0108852    -6.37   0.000    -.0920505   -.0466382
    tangib_w |  -.0245404   .0096214    -2.55   0.019    -.0446104   -.0044705
         age |   .0165564   .0060575     2.73   0.013     .0039207    .0291922
        mb_w |  -.0006307   .0002081    -3.03   0.007    -.0010648   -.0001967
       _cons |   .2522908   .0807704     3.12   0.005     .0838068    .4207749
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          id |     10506       10506           0    *|
        year |        21          21           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

Any thoughts or suggestions would be helpful, as this is for my general learning.