I have panel data covering 763 firms over 15 years, taken from an industry consortium. I want to estimate how changes in memberships across competing industry consortia, the number of simultaneous affiliations, the role within the focal consortium, and the provision of a platform product (time-invariant) affect a firm's product certifications. So the basic model would look like this:
productcerts_t = beta0 + beta1 * changemem_t-1 + beta2 * simulmem_t-1 + beta3 * role_t-1 + beta4 * platform + controls
While the model is rather straightforward, I am facing the issue that firms must be members in order to certify products. I therefore included a dummy variable member_t and its interactions with all other variables except role, since role already requires member_t to be 1. However, in a more complete model with all control variables this causes multicollinearity, and the interactions produce a large set of results. The model then looks like this:
productcerts_t = beta0 + beta1 * changemem_t-1 + beta2 * simulmem_t-1 + beta3 * role_t-1 + beta4 * platform + beta5 * member_t + beta6 * member_t * changemem_t-1 + beta7 * member_t * simulmem_t-1 + beta8 * member_t * platform + controls
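For reading the coefficients of this specification, it may help to write out the implied effects. This is only a sketch that follows from the equation above, with time subscripts omitted: for each interacted regressor, the effect for members is the main effect plus the interaction coefficient, and for non-members it is the main effect alone. Because platform is a dummy, its line is a difference rather than a derivative.

```latex
\begin{aligned}
\frac{\partial\,\mathrm{productcerts}}{\partial\,\mathrm{changemem}} &= \beta_1 + \beta_6\,\mathrm{member},\\
\frac{\partial\,\mathrm{productcerts}}{\partial\,\mathrm{simulmem}} &= \beta_2 + \beta_7\,\mathrm{member},\\
\Delta_{\mathrm{platform}}\,\mathrm{productcerts} &= \beta_4 + \beta_8\,\mathrm{member}.
\end{aligned}
```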
I was wondering whether there is a more elegant way that yields consistent results. Intuitively, I thought about filtering the observations, excluding all records where member_t == 0, and running a pooled OLS with time dummies and standard errors clustered on id. But I am not sure whether that is an appropriate approach.
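A minimal sketch of how the two approaches relate, on synthetic data (not the consortium data). It assumes a single regressor x standing in for the explanatory variables. When every term, including the intercept, is interacted with the member dummy, the member-specific slope (main effect plus interaction) equals the slope from OLS on the member == 1 subsample. The helper `ols`, the variable names, and the data-generating values are all hypothetical and serve only as illustration.

```python
import random

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        # Partial pivoting for numerical stability
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Synthetic data: the coefficients below are arbitrary illustration values.
random.seed(0)
rows = []
for _ in range(400):
    member = 1 if random.random() < 0.3 else 0
    x = random.gauss(0, 1)
    y = 0.2 + 0.5 * x + member * (0.4 - 1.3 * x) + random.gauss(0, 1)
    rows.append((member, x, y))

# Fully interacted model on all observations: const, x, member, member*x
X_full = [[1.0, x, float(m), m * x] for m, x, _ in rows]
y_full = [y for _, _, y in rows]
b_full = ols(X_full, y_full)

# Filtered model: only observations with member == 1
X_sub = [[1.0, x] for m, x, _ in rows if m == 1]
y_sub = [y for m, _, y in rows if m == 1]
b_sub = ols(X_sub, y_sub)

const_full = b_full[0] + b_full[2]   # intercept for members
slope_full = b_full[1] + b_full[3]   # slope for members
const_sub, slope_sub = b_sub

print("member slope, interacted model:", slope_full)
print("member slope, filtered model:  ", slope_sub)
```

The equality is exact only when the model is fully interacted. If some regressors (for instance the time dummies) are not interacted with the member dummy, the filtered and interacted estimates need not coincide. The sketch also says nothing about standard errors.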
Here are some results I computed:
1) pooled OLS with interactions and clustered standard errors
Code:
. reg productcerts i.member##c.L1.changemem i.member##c.L1.simulmem L1.role i.member##i.platform i.year, cluster(id)

Linear regression                               Number of obs     =     10,682
                                                F(21, 762)        =       5.34
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0789
                                                Root MSE          =     2.1651

                                   (Std. Err. adjusted for 763 clusters in id)
---------------------------------------------------------------------------------------
                      |               Robust
         productcerts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
             1.member |   .4415453   .0704454     6.27   0.000     .3032552    .5798353
                      |
            changemem |
                  L1. |   .5532723   .5693309     0.97   0.331    -.5643709    1.670915
                      |
  member#cL.changemem |
                   1  |  -1.356654   .5850375    -2.32   0.021    -2.505131   -.2081776
                      |
             simulmem |
                  L1. |   .2238948   .1329231     1.68   0.093    -.0370441    .4848337
                      |
   member#cL.simulmem |
                   1  |   -.125708   .2387898    -0.53   0.599    -.5944721     .343056
                      |
                 role |
                  L1. |   3.496489   1.625868     2.15   0.032     .3047766    6.688201
                      |
           1.platform |   .1912549   .0916024     2.09   0.037      .011432    .3710779
                      |
      member#platform |
                 1 1  |    1.72839   .6695502     2.58   0.010     .4140079    3.042772
                      |
                 year |
                2007  |   .0248361    .064558     0.38   0.701    -.1018965    .1515687
                2008  |  -.0068504   .0425496    -0.16   0.872    -.0903788     .076678
                2009  |   .0131293   .0814032     0.16   0.872    -.1466718    .1729304
                2010  |  -.0718353   .0605614    -1.19   0.236    -.1907222    .0470516
                2011  |   .0194899   .0722105     0.27   0.787    -.1222652     .161245
                2012  |  -.0246552    .063305    -0.39   0.697    -.1489281    .0996177
                2013  |   .0549405   .0778698     0.71   0.481    -.0979243    .2078054
                2014  |  -.0239332    .068818    -0.35   0.728    -.1590286    .1111623
                2015  |   .1155241   .1268944     0.91   0.363      -.13358    .3646283
                2016  |   .1556162   .0833659     1.87   0.062     -.008038    .3192703
                2017  |   .2129104   .1003894     2.12   0.034     .0158378     .409983
                2018  |   .0882369   .0852473     1.04   0.301    -.0791104    .2555843
                2019  |   .2275257   .1756277     1.30   0.196    -.1172458    .5722973
                      |
                _cons |  -.0412679   .0527809    -0.78   0.435    -.1448811    .0623454
---------------------------------------------------------------------------------------
2) pooled OLS model with filtered observations, excluding records where member_t == 0
Code:
. reg productcerts c.L1.changemem c.L1.simulmem L1.role i.platform i.year if member, cluster(id)

Linear regression                               Number of obs     =      3,189
                                                F(17, 762)        =       4.06
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0612
                                                Root MSE          =     3.7748

                                   (Std. Err. adjusted for 763 clusters in id)
-------------------------------------------------------------------------------
              |               Robust
 productcerts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    changemem |
          L1. |  -.8704477    .463063    -1.88   0.061    -1.779478     .038583
              |
     simulmem |
          L1. |   .0518086   .2144131     0.24   0.809    -.3691018    .4727191
              |
         role |
          L1. |   3.817154   1.762396     2.17   0.031     .3574265    7.276881
              |
   1.platform |   1.894842   .6702957     2.83   0.005     .5789965    3.210687
              |
         year |
        2007  |   .0040985   .5051178     0.01   0.994    -.9874892    .9956862
        2008  |  -.1097982   .3155404    -0.35   0.728      -.72923    .5096335
        2009  |  -.1420047    .441294    -0.32   0.748    -1.008301    .7242916
        2010  |  -.3944345    .415614    -0.95   0.343    -1.210319      .42145
        2011  |    .057463   .4554769     0.13   0.900    -.8366756    .9516016
        2012  |  -.2025696   .4350108    -0.47   0.642    -1.056531    .6513922
        2013  |   .1460105   .4491477     0.33   0.745    -.7357033    1.027724
        2014  |  -.0861011   .4196607    -0.21   0.837    -.9099294    .7377273
        2015  |   .3009069   .4922737     0.61   0.541    -.6654667    1.267281
        2016  |   .3548497   .4243433     0.84   0.403    -.4781711     1.18787
        2017  |   .3556781   .4230937     0.84   0.401    -.4748896    1.186246
        2018  |   .1858268   .4198763     0.44   0.658    -.6384248    1.010078
        2019  |   .5825402   .5546162     1.05   0.294    -.5062169    1.671297
              |
        _cons |   .3286396   .4078965     0.81   0.421    -.4720947    1.129374
-------------------------------------------------------------------------------
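Compared with the member-specific effects implied by the first model (main effect plus interaction), the second model shows only slight changes in the coefficient estimates. For example, the effect of lagged changemem for members is about -0.80 (0.553 - 1.357) in the first model versus -0.870 in the second, platform is about 1.92 (0.191 + 1.728) versus 1.895, and lagged role is 3.50 versus 3.82.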
Code:
. quietly: xtreg productcerts i.member##c.L1.changemem i.member##c.L1.simulmem L1.role i.member##i.platform i.year
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        productcerts[id,t] = Xb + u[id] + e[id,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
               product~s |   5.079473       2.253769
                       e |     4.0121       2.003023
                       u |   .6070492       .7791336

        Test:   Var(u) = 0
                             chibar2(01) =  1353.13
                          Prob > chibar2 =   0.0000
Further, the Breusch-Pagan LM test favors a model with random effects over pooled OLS. The Hausman test and suest cannot be run on the data/models.
I would appreciate any suggestions on how to handle this problem. Would you recommend sticking with an RE/FE model and using the interactions? Is it legitimate, under some assumptions, to filter observations for pooled OLS? Or is there another approach?
Best,
Sven