Dear All!

I have panel data (time: 2 periods; N = 2,093). Because the Breusch-Pagan Lagrange multiplier test (xttest0) suggested that a random-effects panel regression is not appropriate for my data, I ran a pooled regression with vce(cluster).
(A fixed-effects panel regression is also not possible, because some important independent variables are time-invariant.)
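For reference, here is a sketch of the textbook Breusch-Pagan LM statistic for a balanced panel with N units observed over T periods. It is built from the pooled OLS residuals e_it and tests H0: sigma_u^2 = 0, i.e. no unit-specific variance component. A small statistic (large p-value) is the usual argument for staying with pooled OLS instead of random effects. The exact variant that xttest0 computes may differ, for example for unbalanced panels, so treat this as an illustration rather than Stata's implementation. With T = 2 the leading factor NT/(2(T-1)) reduces to N:

```latex
\[
LM = \frac{NT}{2(T-1)}\left[\frac{\sum_{i=1}^{N}\Big(\sum_{t=1}^{T}\hat e_{it}\Big)^{2}}{\sum_{i=1}^{N}\sum_{t=1}^{T}\hat e_{it}^{2}} - 1\right]^{2},
\qquad H_0:\ \sigma_u^{2} = 0
\]
\[
T = 2:\qquad
LM = N\left[\frac{\sum_{i=1}^{N}\big(\hat e_{i1} + \hat e_{i2}\big)^{2}}{\sum_{i=1}^{N}\big(\hat e_{i1}^{2} + \hat e_{i2}^{2}\big)} - 1\right]^{2}
\]
```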

However, I would like to know how the effect of one of my independent variables on the dependent variable changes between the two periods. Is it okay to include an interaction term with time in the pooled regression?
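Here is a minimal sketch of the model such an interaction implies. It assumes that period is coded 1 and 2 and introduces a hypothetical dummy d_t equal to 1 in period 2 (0 otherwise). The vector x_it collects the remaining controls, whose coefficients gamma are restricted to be the same in both periods:

```latex
\[
y_{it} = \beta_0 + \beta_1\,\mathrm{compuls}_{it} + \beta_2\, d_t
       + \beta_3\,(\mathrm{compuls}_{it} \times d_t) + x_{it}'\gamma + u_{it}
\]
\[
\frac{\partial\, E[y_{it} \mid \cdot\,]}{\partial\, \mathrm{compuls}_{it}} =
\begin{cases}
\beta_1 & \text{if } t = 1,\\
\beta_1 + \beta_3 & \text{if } t = 2.
\end{cases}
\]
```

Under this parameterization beta_3 is the change in the compuls effect from period 1 to period 2, so testing beta_3 = 0 tests whether the effect differs between the periods. The factor-variable notation c.compuls##i.period adds exactly the compuls, 2.period and period#c.compuls terms, with period 1 as the base level.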

For example, if this were the output of the pooled regression without any interaction terms:

Code:
. regress y compuls wage unempl fp smok empl avage hhsize commbl pop, vce(cluster period)
 
Linear regression                               Number of obs     =      4,186
                                                F(1, 1)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.0800
                                                Root MSE          =     16.119
 
                                  (Std. Err. adjusted for 2 clusters in period)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     compuls |   .1365676   .2853285     0.48   0.716    -3.488874     3.76201
        wage |  -.0003182   .0000804    -3.96   0.158    -.0013402    .0007037
      unempl |   -.230494   .2679899    -0.86   0.548    -3.635628     3.17464
          fp |  -.1511964   .3506337    -0.43   0.741     -4.60642    4.304028
        smok |  -.1398034   .1981418    -0.71   0.609    -2.657433    2.377826
        empl |   .2689837   .0386515     6.96   0.091    -.2221296     .760097
       avage |  -.5293836   .5587406    -0.95   0.517    -7.628856    6.570089
      hhsize |   7.353506   3.554856     2.07   0.287    -37.81522    52.52223
      commbl |   .0267488   .0110217     2.43   0.249    -.1132952    .1667927
         pop |   1.075149     .22653     4.75   0.132    -1.803187    3.953485
       _cons |   64.91979   21.10673     3.08   0.200    -203.2666    333.1062
------------------------------------------------------------------------------

I would like to do something like this to check how the effect of the variable compuls differs between period 1 and period 2:

Code:
regress y c.compuls##i.period wage unempl fp smok empl avage hhsize commbl pop, vce(cluster period)
 
Linear regression                               Number of obs     =      4,186
                                                F(0, 1)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.4739
                                                Root MSE          =     12.192
 
                                     (Std. Err. adjusted for 2 clusters in period)
----------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
         compuls |   .2313342   .1904941     1.21   0.439    -2.189122    2.651791
        2.period |   23.80873   7.70e-12  3.1e+12   0.000     23.80873    23.80873
                 |
period#c.compuls |
              2  |  -.1895332   4.84e-13 -3.9e+11   0.000    -.1895332   -.1895332
                 |
            wage |  -.0003182   .0000804    -3.96   0.158    -.0013404     .000704
          unempl |   -.230494   .2680541    -0.86   0.548    -3.636444    3.175456
              fp |  -.1511964   .3507178    -0.43   0.741    -4.607488    4.305095
            smok |  -.1398034   .1981892    -0.71   0.609    -2.658037     2.37843
            empl |   .2689837   .0386607     6.96   0.091    -.2222472    .7602147
           avage |  -.5293836   .5588745    -0.95   0.517    -7.630557     6.57179
          hhsize |   7.353506   3.555707     2.07   0.287    -37.82604    52.53305
          commbl |   .0267488   .0110243     2.43   0.249    -.1133288    .1668263
             pop |   1.075149   .2265842     4.75   0.132    -1.803877    3.954175
           _cons |   53.01542   33.03325     1.60   0.355    -366.7118    472.7427
----------------------------------------------------------------------------------
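Period 1 is the base level, so the compuls coefficient in the second output is the estimated period-1 effect. The period#c.compuls coefficient is the estimated change in that effect in period 2. Adding the two point estimates gives the implied period-2 effect. Standard errors are left aside here; in Stata, lincom compuls + 2.period#c.compuls would report this sum together with a standard error.

```latex
\[
\text{period 1:}\quad \hat\beta_1 = 0.2313342
\]
\[
\text{period 2:}\quad \hat\beta_1 + \hat\beta_3 = 0.2313342 + (-0.1895332) = 0.0418010
\]
```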

Would that be the right way to go, or am I missing something that makes this approach invalid? (Also, R-squared rises considerably, from 0.0800 in model 1 to 0.4739 in model 2, which makes me a little suspicious about whether I have chosen the right approach.)
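Here is a rough consistency check on the R-squared jump that uses only the reported numbers. Both models use the same 4,186 observations and the same y, so the total sum of squares (SST) is identical. OLS R-squared is 1 - SSR/SST, and Root MSE squared is SSR divided by the residual degrees of freedom. If we ignore the small difference in degrees of freedom between the two models, the ratio of residual sums of squares can be computed in two ways:

```latex
\[
\frac{SSR_2}{SSR_1} = \frac{1 - R_2^2}{1 - R_1^2} = \frac{1 - 0.4739}{1 - 0.0800} \approx 0.572
\]
\[
\frac{SSR_2}{SSR_1} \approx \left(\frac{\mathrm{RMSE}_2}{\mathrm{RMSE}_1}\right)^{2}
= \left(\frac{12.192}{16.119}\right)^{2} \approx 0.572
\]
```

Both routes imply that the residual sum of squares falls by roughly 43%. Model 2 nests model 1, so its SSR cannot be larger, and R-squared can only rise when terms are added. Model 1 also contains no period term at all. Much of the drop therefore presumably comes from 2.period absorbing a difference in the mean of y between the periods, rather than from the interaction itself.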


Here is an example of my data:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double y byte period double(pop compuls wage unempl smok fp avage empl commbl hhsize)
1238.39009287926 1  62.806552076139 11.18 38118 2.06 11.0663983903421 10.8938547486034 41.25 40.13 196.27 2.65
11764.7058823529 2  62.806552076139 11.18 38118 2.06 11.0663983903421 10.8938547486034 41.25 40.13 196.27 2.65
3939.26959376282 1 140.930096470981 10.82 44328 1.89 17.0396475770925 13.9903709997168 43.34 21.81  67.58 2.23
3829.84543838052 2 140.930096470981 10.82 44328 1.89 17.0396475770925 13.9903709997168 43.34 21.81  67.58 2.23
3348.83720930233 2 134.404286495789 14.72 42737 3.73 12.7123977344242 15.2527075812274 41.69 16.05  46.53 2.24
end

I would be really grateful if someone could help me out and give me some advice on this!

I use Stata 16.1.

Thanks!

Anja