I have panel data (2 periods; N = 2,093), and because the Breusch-Pagan Lagrange multiplier test (xttest0) suggested that a random-effects panel regression is not appropriate for my data, I ran a pooled regression with clustered standard errors, vce(cluster period).
(A fixed-effects panel regression is also not possible, because important independent variables are time-invariant.)
However, I would be interested in how the effect of one of my independent variables on the dependent variable changes over the two periods. Therefore, I was wondering if it is okay to include some kind of interaction term for time in the pooled regression?
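Written out, a pooled specification with a time interaction might look like the sketch below. The notation is assumed for illustration and is not from any output: $x_{it}$ stands for the variable of interest, $d_t$ is a dummy for period 2, and $\mathbf{z}_{it}$ collects the remaining regressors.

```latex
\[
  y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 d_t
         + \beta_3 \,(x_{it} \times d_t)
         + \mathbf{z}_{it}'\boldsymbol{\gamma} + \varepsilon_{it},
  \qquad d_t = \mathbf{1}[t = 2]
\]
\[
  \frac{\partial y_{it}}{\partial x_{it}} =
  \begin{cases}
    \beta_1            & \text{in period 1 } (d_t = 0),\\
    \beta_1 + \beta_3  & \text{in period 2 } (d_t = 1).
  \end{cases}
\]
```

Under this reading, $\beta_3$ is the change in the effect of $x$ between the two periods, and $\beta_2$ is the shift in the intercept.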
For example, if this were the output of the pooled regression without any interaction terms:
Code:
. regress y compuls wage unempl fp smok empl avage hhsize commbl pop, vce(cluster period)

Linear regression                               Number of obs     =      4,186
                                                F(1, 1)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.0800
                                                Root MSE          =     16.119

                                 (Std. Err. adjusted for 2 clusters in period)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     compuls |   .1365676   .2853285     0.48   0.716    -3.488874     3.76201
        wage |  -.0003182   .0000804    -3.96   0.158    -.0013402    .0007037
      unempl |   -.230494   .2679899    -0.86   0.548    -3.635628     3.17464
          fp |  -.1511964   .3506337    -0.43   0.741     -4.60642    4.304028
        smok |  -.1398034   .1981418    -0.71   0.609    -2.657433    2.377826
        empl |   .2689837   .0386515     6.96   0.091    -.2221296     .760097
       avage |  -.5293836   .5587406    -0.95   0.517    -7.628856    6.570089
      hhsize |   7.353506   3.554856     2.07   0.287    -37.81522    52.52223
      commbl |   .0267488   .0110217     2.43   0.249    -.1132952    .1667927
         pop |   1.075149     .22653     4.75   0.132    -1.803187    3.953485
       _cons |   64.91979   21.10673     3.08   0.200    -203.2666    333.1062
------------------------------------------------------------------------------
I would like to do something like this to check how the effect of the variable compuls differs between period 1 and period 2:
Code:
regress y c.compuls##i.period wage unempl fp smok empl avage hhsize commbl pop, vce(cluster period)

Linear regression                               Number of obs     =      4,186
                                                F(0, 1)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.4739
                                                Root MSE          =     12.192

                                     (Std. Err. adjusted for 2 clusters in period)
----------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
         compuls |   .2313342   .1904941     1.21   0.439    -2.189122    2.651791
        2.period |   23.80873   7.70e-12  3.1e+12   0.000     23.80873    23.80873
                 |
period#c.compuls |
               2 |  -.1895332   4.84e-13 -3.9e+11   0.000    -.1895332   -.1895332
                 |
            wage |  -.0003182   .0000804    -3.96   0.158    -.0013404     .000704
          unempl |   -.230494   .2680541    -0.86   0.548    -3.636444    3.175456
              fp |  -.1511964   .3507178    -0.43   0.741    -4.607488    4.305095
            smok |  -.1398034   .1981892    -0.71   0.609    -2.658037     2.37843
            empl |   .2689837   .0386607     6.96   0.091    -.2222472    .7602147
           avage |  -.5293836   .5588745    -0.95   0.517    -7.630557     6.57179
          hhsize |   7.353506   3.555707     2.07   0.287    -37.82604    52.53305
          commbl |   .0267488   .0110243     2.43   0.249    -.1133288    .1668263
             pop |   1.075149   .2265842     4.75   0.132    -1.803877    3.954175
           _cons |   53.01542   33.03325     1.60   0.355    -366.7118    472.7427
----------------------------------------------------------------------------------
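To read the period-specific effect of compuls off a model like this, one option is to combine the main effect and the interaction coefficient after estimation. The following is a minimal sketch using lincom and margins; it assumes the variables from the post are in memory and simply reuses the same regress command.

```stata
* Sketch only: assumes the variables named in the post are in memory.
regress y c.compuls##i.period wage unempl fp smok empl avage hhsize commbl pop, vce(cluster period)

* Slope of compuls in period 1 (the base level of period)
lincom compuls

* Slope of compuls in period 2 = main effect + interaction coefficient
lincom compuls + 2.period#c.compuls

* Both period-specific slopes in a single table
margins, dydx(compuls) at(period=(1 2))
```

With the coefficients reported in the interaction output, the period-2 slope would be .2313342 - .1895332 = .041801. The standard errors for these combinations come from the same clustered variance estimate as the regression, so with only 2 clusters in period they should probably be read with caution.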
Would that be the right way to go? Or is there something I'm missing that means you can't actually do this? (Also, the R-squared increases substantially from model 1 to model 2, which makes me a little suspicious about whether I have chosen the right approach.)
Here is also an example of my data:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double y byte period double(pop compuls wage unempl smok fp avage empl commbl hhsize)
1238.39009287926 1  62.806552076139 11.18 38118 2.06 11.0663983903421 10.8938547486034 41.25 40.13 196.27 2.65
11764.7058823529 2  62.806552076139 11.18 38118 2.06 11.0663983903421 10.8938547486034 41.25 40.13 196.27 2.65
3939.26959376282 1 140.930096470981 10.82 44328 1.89 17.0396475770925 13.9903709997168 43.34 21.81  67.58 2.23
3829.84543838052 2 140.930096470981 10.82 44328 1.89 17.0396475770925 13.9903709997168 43.34 21.81  67.58 2.23
3348.83720930233 2 134.404286495789 14.72 42737 3.73 12.7123977344242 15.2527075812274 41.69 16.05  46.53 2.24
end
I would be really grateful if someone could help me out and give me some advice on this!
I use Stata 16.1.
Thanks!
Anja