Dear Statalisters,

I am using interrupted time series methods on household panel data. I have monthly data on about 2000 households over a three-year period (18 months before the event, 18 months after). The panel is unbalanced, with a mean follow-up of 29 months. I construct my counterfactual by extrapolating the pre-event trend into the post period, using a linear FE model that includes a period indicator (before/after the event), a linear time trend, the interaction of the two (to allow the trend to change after the event), month dummies, and a set of additional covariates.
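To make the counterfactual construction concrete, here is a minimal sketch of the full model and the fitted-minus-counterfactual difference. The names month, x1 and x2 are hypothetical placeholders for the month dummies and the additional covariates, and the factor-variable interaction stands in for a hand-built post_t variable; this is an illustration under those assumptions, not the exact specification.

```stata
* Sketch only. Hypothetical names: month (calendar month, 1-12), x1 x2 (covariates).
* y, id, post and t are the variables used in the output further down.
xtset id
xtreg y i.post c.t i.post#c.t i.month x1 x2, fe vce(cluster id)

* Fitted values from the observed data (linear prediction Xb only;
* the household effects cancel in the difference taken below).
predict double yhat_fit, xb

* Counterfactual: switch post off, which also zeroes the post-by-trend term.
generate byte post_obs = post
replace post = 0
predict double yhat_cf, xb
replace post = post_obs
drop post_obs

* Adjusted mean difference over the post period, within the estimation sample.
generate double diff = yhat_fit - yhat_cf if e(sample) & post == 1
summarize diff
```

Using factor-variable notation means that setting post to 0 removes both the level shift and the trend change in one step, so the counterfactual cannot accidentally keep one of them.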

I did a model-building exercise to better understand the difference between the purely descriptive pre vs. post mean difference and the adjusted mean difference (between fitted and counterfactual values in the post period).

Here is the issue. When I fit a random-effects model with xtreg, re vce(cluster panelvar), controlling only for the time period and a linear time trend, Hansen’s J test of overidentifying restrictions (via xtoverid), whose null is no correlation between the panel effects and the regressors, fails to reject (Chi2(2) = 0.150, p = 0.9276), as expected. However, once I add the interaction between the time period and the trend, the test strongly rejects (Chi2(3) = 14.543, p = 0.0023). The RE coefficient estimates differ slightly from the FE estimates, but xtreg, fe vce(robust) yields corr(u_i, Xb) = 0.0053, essentially the same as when the interaction is excluded (0.0054). The estimation sample is the same throughout.

How can the test reject when (1) the additional variable (the interaction) is conceptually unrelated to the panel effects, and (2) corr(u_i, Xb) is basically the same across the two FE models (with and without the interaction)?
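One way to see which regressor drives the rejection is to run the regression-based version of the test by hand. As I understand the xtoverid help file, the statistic is a Wald test on deviation-from-mean versions of the regressors added to the RE equation. The sketch below follows that idea; the m_ and d_ variable names are made up for illustration, it assumes y and the regressors have no missing values (so the household means are taken over the estimation sample), and I would not expect it to reproduce the xtoverid statistic exactly.

```stata
* Sketch of a regression-based FE-vs-RE check; m_* and d_* are hypothetical names.
* Assumes no missing values in y, post, t, post_t.
foreach v of varlist post t post_t {
    bysort id: egen double m_`v' = mean(`v')   // household mean of the regressor
    generate double d_`v' = `v' - m_`v'        // deviation from the household mean
}

* RE model augmented with the within-household deviations.
xtreg y post t post_t d_post d_t d_post_t, re vce(cluster id)

* Joint cluster-robust Wald test, then the interaction term on its own.
test d_post d_t d_post_t
test d_post_t
```

Testing the added terms one at a time shows whether the rejection comes from the interaction itself or from a shift in the post and t terms once the interaction enters.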

Since my panel is unbalanced, I wondered if it could be due to attrition and replenishment of the sample. When I restrict the estimation sample to households with complete follow-up (about 60% of households), Hansen’s J test rejects even when the period indicator is the only regressor (Chi2(1) = 109.351, p = 0.0000), although xtreg, fe vce(robust) yields virtually the same coefficient estimate and corr(u_i, Xb) = 0 (as expected).
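The restriction to complete follow-up uses a variable T_i that is not defined in the text. A plausible construction, assumed here rather than taken from the actual do-file, is the number of monthly observations per household with a nonmissing outcome:

```stata
* Assumption: T_i counts monthly observations per household with nonmissing y.
bysort id: egen T_i = count(y)

* Distribution of follow-up length, and the size of the complete-follow-up subsample.
tabulate T_i
count if T_i == 36
```

Households with T_i == 36 are observed in every month of the three-year window, so that subsample is balanced.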

Am I missing something obvious? Or is this telling me something about the data?

I am doing this exercise because household composition has a sizable impact on the fitted vs. counterfactual mean difference when it is included in the model alongside the post * t interaction, but not when the interaction is omitted, and I am not sure why.

Any ideas about why I am getting these results, or about what I might do next to find out, would be greatly appreciated.

Maxime
Code:
. xtreg y post t, re vce(cluster id)
 
Random-effects GLS regression                   Number of obs     =     69,494
Group variable: id                              Number of groups  =      2,383
 
R-sq:                                           Obs per group:
     within  = 0.0078                                         min =          1
     between = 0.0006                                         avg =       29.2
     overall = 0.0046                                         max =         36
 
                                                Wald chi2(2)      =     174.10
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
 
                                 (Std. Err. adjusted for 2,383 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        post |  -32.61146   3.672993    -8.88   0.000    -39.81039   -25.41252
           t |  -.4647523   .2322854    -2.00   0.045    -.9200233   -.0094813
       _cons |   480.6522   6.420681    74.86   0.000     468.0679    493.2365
-------------+----------------------------------------------------------------
     sigma_u |  231.44268
     sigma_e |  228.77659
         rho |  .50579289   (fraction of variance due to u_i)
------------------------------------------------------------------------------
 
. xtoverid
 
Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(id)
Sargan-Hansen statistic   0.150  Chi-sq(2)    P-value = 0.9276
 
. xtreg y post t, fe vce(cluster id)
 
Fixed-effects (within) regression               Number of obs     =     69,494
Group variable: id                              Number of groups  =      2,383
 
R-sq:                                           Obs per group:
     within  = 0.0078                                         min =          1
     between = 0.0006                                         avg =       29.2
     overall = 0.0046                                         max =         36
 
                                                F(2,2382)         =      86.54
corr(u_i, Xb)  = 0.0054                         Prob > F          =     0.0000
 
                                 (Std. Err. adjusted for 2,383 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        post |  -32.56479   3.672894    -8.87   0.000    -39.76719   -25.36239
           t |  -.4696601   .2338195    -2.01   0.045    -.9281709   -.0111493
       _cons |    489.016   3.468408   140.99   0.000     482.2146    495.8174
-------------+----------------------------------------------------------------
     sigma_u |  239.63657
     sigma_e |  228.77659
         rho |  .52317216   (fraction of variance due to u_i)
------------------------------------------------------------------------------
 
. xtreg y post t post_t, re vce(cluster id)
 
Random-effects GLS regression                   Number of obs     =     69,494
Group variable: id                              Number of groups  =      2,383
 
R-sq:                                           Obs per group:
     within  = 0.0078                                         min =          1
     between = 0.0004                                         avg =       29.2
     overall = 0.0046                                         max =         36
 
                                                Wald chi2(3)      =     174.64
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
 
                                 (Std. Err. adjusted for 2,383 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        post |  -44.64032   8.956723    -4.98   0.000    -62.19518   -27.08547
           t |  -.7876857   .3423413    -2.30   0.021    -1.458662    -.116709
      post_t |   .6493272   .4381844     1.48   0.138    -.2094984    1.508153
       _cons |   483.5995   6.881025    70.28   0.000     470.1129     497.086
-------------+----------------------------------------------------------------
     sigma_u |  230.67539
     sigma_e |  228.77046
         rho |  .50414608   (fraction of variance due to u_i)
------------------------------------------------------------------------------
 
. xtoverid
 
Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(id)
Sargan-Hansen statistic  14.543  Chi-sq(3)    P-value = 0.0023
 
. xtreg y post t post_t, fe vce(cluster id)
 
Fixed-effects (within) regression               Number of obs     =     69,494
Group variable: id                              Number of groups  =      2,383
 
R-sq:                                           Obs per group:
     within  = 0.0078                                         min =          1
     between = 0.0003                                         avg =       29.2
     overall = 0.0046                                         max =         36
 
                                                F(3,2382)         =      57.80
corr(u_i, Xb)  = 0.0053                         Prob > F          =     0.0000
 
                                 (Std. Err. adjusted for 2,383 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        post |   -46.1116   8.987428    -5.13   0.000    -63.73559   -28.48761
           t |   -.832746   .3437206    -2.42   0.015    -1.506768   -.1587235
      post_t |   .7309716   .4394493     1.66   0.096     -.130771    1.592714
       _cons |   492.4736   4.386992   112.26   0.000     483.8709    501.0763
-------------+----------------------------------------------------------------
     sigma_u |  239.70379
     sigma_e |  228.77046
         rho |  .52332547   (fraction of variance due to u_i)
------------------------------------------------------------------------------
 
. xtreg y post if T_i == 36, re vce(cluster id)
 
Random-effects GLS regression                   Number of obs     =     49,572
Group variable: id                              Number of groups  =      1,377
 
R-sq:                                           Obs per group:
     within  = 0.0000                                         min =         36
     between = 0.0000                                         avg =       36.0
     overall = 0.0041                                         max =         36
 
                                                Wald chi2(1)      =     109.35
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
 
                                 (Std. Err. adjusted for 1,377 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        post |  -40.97813   3.918698   -10.46   0.000    -48.65864   -33.29763
       _cons |   498.4042   6.921887    72.00   0.000     484.8376    511.9709
-------------+----------------------------------------------------------------
     sigma_u |  225.27162
     sigma_e |  223.84015
         rho |  .50318729   (fraction of variance due to u_i)
------------------------------------------------------------------------------
 
. xtoverid
 
Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(id)
Sargan-Hansen statistic 109.351  Chi-sq(1)    P-value = 0.0000