Hi everyone,

I am using xtabond2 to run difference and system GMM regressions. I'm working in Stata 15.1 and, as far as I can tell, using the most recent version of xtabond2 (version 3.6.3, 30 September 2015).

I was hoping someone could help me understand why the results change when I run a system GMM regression and drop the first two periods of the data. In the example below, I use twice-lagged levels as instruments for the differenced equations and once-lagged differences as instruments for the levels equations. Since constructing these instruments requires observations from t-2, I assumed that excluding the first two periods would not affect the results. However, the example below, based on Arellano and Bond's data, suggests otherwise. The dataset is an annual panel of firms covering 1976-1984. As you can see, excluding 1976 and 1977 changes the results.
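For reference, this is how I read the specification in the commands below. The notation is my own: D_i and L_i are hypothetical labels for the sets of years in which firm i's differenced and levels equations are in the estimation sample. The model is n_it = ρ n_{i,t-1} + β_w w_it + β_k k_it + c + u_it. With collapsed instruments and missing instrument values set to zero, I understand xtabond2 to use 3 + 3 + 1 = 7 sample moment conditions. That count matches "Number of instruments = 7" in the output.

```latex
\begin{aligned}
\sum_i \sum_{t \in D_i} x_{i,t-2}\,\Delta u_{it} &= 0, \quad x \in \{n, w, k\} && \text{(differenced eq., L2. collapsed)}\\
\sum_i \sum_{t \in L_i} \Delta x_{i,t-1}\,u_{it} &= 0, \quad x \in \{n, w, k\} && \text{(levels eq., DL. collapsed)}\\
\sum_i \sum_{t \in L_i} u_{it} &= 0 && \text{(levels eq., \_cons)}
\end{aligned}
```

A term whose instrument is missing contributes zero to the first two sets of sums. The last sum, however, runs over every year in L_i.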

Code:
. clear all
. webuse abdata
. xtabond2 n L.n w k, gmm(n w k, laglimits(2 2) collapse) twostep robust small svmat
Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       891
Time variable : year                            Number of groups   =       140
Number of instruments = 7                       Obs per group: min =         6
F(3, 139)     =    196.28                                      avg =      6.36
Prob > F      =     0.000                                      max =         8
------------------------------------------------------------------------------
             |              Corrected
           n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |   .6422413   .2478509     2.59   0.011     .1521961    1.132286
             |
           w |  -1.031833    .402151    -2.57   0.011    -1.826957   -.2367086
           k |   .2615404   .0976378     2.68   0.008     .0684931    .4545878
       _cons |   3.701805   1.545091     2.40   0.018     .6468852    6.756725
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L2.(n w k) collapsed
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.(n w k) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -2.66  Pr > z =  0.008
Arellano-Bond test for AR(2) in first differences: z =  -1.35  Pr > z =  0.176
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(3)    =   6.86  Prob > chi2 =  0.077
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(3)    =   2.40  Prob > chi2 =  0.494
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(0)    =   0.00  Prob > chi2 =      .
    Difference (null H = exogenous): chi2(3)    =   2.40  Prob > chi2 =  0.494

* MODEL WITH FIRST TWO PERIODS EXCLUDED
. xtabond2 n L.n w k if year > 1977, gmm(n w k, laglimits(2 2) collapse) twostep robust small svmat
Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       811
Time variable : year                            Number of groups   =       140
Number of instruments = 7                       Obs per group: min =         5
F(3, 139)     =    214.70                                      avg =      5.79
Prob > F      =     0.000                                      max =         7
------------------------------------------------------------------------------
             |              Corrected
           n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |   .6765348   .2384935     2.84   0.005     .2049907    1.148079
             |
           w |  -.9759851   .3861351    -2.53   0.013    -1.739443   -.2125274
           k |   .2468775   .0970336     2.54   0.012     .0550248    .4387303
       _cons |   3.476404   1.482807     2.34   0.020     .5446311    6.408178
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L2.(n w k) collapsed
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.(n w k) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -2.73  Pr > z =  0.006
Arellano-Bond test for AR(2) in first differences: z =  -1.40  Pr > z =  0.161
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(3)    =   8.12  Prob > chi2 =  0.044
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(3)    =   2.95  Prob > chi2 =  0.399
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(0)    =   0.00  Prob > chi2 =      .
    Difference (null H = exogenous): chi2(3)    =   2.95  Prob > chi2 =  0.399

I looked at the instrument matrix e(Z) in both cases. When the first two years are kept, the constant appears as an instrument in the levels equation for 1977 (_cons = 1), while all the other instruments are zeroed out. When the first two years are dropped, _cons is 0 in the levels equation for 1977, as are the other instruments. Below is an extract of both instrument matrices for the firm with id = 5, chosen because its data go back to 1976. In each extract, the first nine rows are the differenced equations and the last nine rows are the levels equations. The discrepancy is in the levels-equation row for 1977 (the second "5, 1977" row).

First estimation:
Code:
                          Diff eq:    Diff eq:    Diff eq:  Levels eq:  Levels eq:  Levels eq:
                               L2.         L2.         L2.         LD.         LD.         LD.
                _cons           n           w           k           n           w           k
  5, 1976           0           0           0           0           0           0           0
  5, 1977           0           0           0           0           0           0           0
  5, 1978           0   4.4621887   3.0268579   3.1081855           0           0           0
  5, 1979           0   4.4670568    2.905709   3.1032538           0           0           0
  5, 1980           0   4.4659081   2.8979485   3.2255337           0           0           0
  5, 1981           0   4.5042443   2.9008501   3.2328379           0           0           0
  5, 1982           0    4.490881   2.9595506   3.3407183           0           0           0
  5, 1983           0           0           0           0           0           0           0
  5, 1984           0           0           0           0           0           0           0
  5, 1976           0           0           0           0           0           0           0
  5, 1977           1           0           0           0           0           0           0
  5, 1978           1           0           0           0   .00486803  -.12114882  -.00493169
  5, 1979           1           0           0           0   -.0011487  -.00776052   .12227988
  5, 1980           1           0           0           0   .03833628   .00290155   .00730419
  5, 1981           1           0           0           0  -.01336336   .05870056   .10788035
  5, 1982           1           0           0           0  -.07566118   .02735281  -.09050274
  5, 1983           0           0           0           0           0           0           0
  5, 1984           0           0           0           0           0           0           0

Now with the first two periods excluded (second estimation):
Code:
                          Diff eq:    Diff eq:    Diff eq:  Levels eq:  Levels eq:  Levels eq:
                               L2.         L2.         L2.         LD.         LD.         LD.
                _cons           n           w           k           n           w           k
  5, 1976           0           0           0           0           0           0           0
  5, 1977           0           0           0           0           0           0           0
  5, 1978           0   4.4621887   3.0268579   3.1081855           0           0           0
  5, 1979           0   4.4670568    2.905709   3.1032538           0           0           0
  5, 1980           0   4.4659081   2.8979485   3.2255337           0           0           0
  5, 1981           0   4.5042443   2.9008501   3.2328379           0           0           0
  5, 1982           0    4.490881   2.9595506   3.3407183           0           0           0
  5, 1983           0           0           0           0           0           0           0
  5, 1984           0           0           0           0           0           0           0
  5, 1976           0           0           0           0           0           0           0
  5, 1977           0           0           0           0           0           0           0
  5, 1978           1           0           0           0   .00486803  -.12114882  -.00493169
  5, 1979           1           0           0           0   -.0011487  -.00776052   .12227988
  5, 1980           1           0           0           0   .03833628   .00290155   .00730419
  5, 1981           1           0           0           0  -.01336336   .05870056   .10788035
  5, 1982           1           0           0           0  -.07566118   .02735281  -.09050274
  5, 1983           0           0           0           0           0           0           0
  5, 1984           0           0           0           0           0           0           0
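One way to spot such rows systematically is to scan the levels-equation block of e(Z) for rows in which _cons is the only nonzero instrument. The sketch below is hypothetical. It assumes the levels rows have already been exported from Stata (how to export them is not shown), and it simply hard-codes the firm id = 5 values from the two extracts above.

```python
# Hypothetical diagnostic: find levels-equation rows of an instrument matrix
# in which _cons is the only nonzero instrument. Values are the firm id = 5
# levels-equation rows copied from the two e(Z) extracts (columns: _cons,
# LD.n, LD.w, LD.k; the differenced-equation columns are all 0 in these rows).

first = {
    1976: (0, 0.0, 0.0, 0.0),
    1977: (1, 0.0, 0.0, 0.0),
    1978: (1, 0.00486803, -0.12114882, -0.00493169),
    1979: (1, -0.0011487, -0.00776052, 0.12227988),
    1980: (1, 0.03833628, 0.00290155, 0.00730419),
    1981: (1, -0.01336336, 0.05870056, 0.10788035),
    1982: (1, -0.07566118, 0.02735281, -0.09050274),
    1983: (0, 0.0, 0.0, 0.0),
    1984: (0, 0.0, 0.0, 0.0),
}

# The second extract differs only in the 1977 row, where _cons is 0.
second = dict(first)
second[1977] = (0, 0.0, 0.0, 0.0)


def constant_only_years(levels_rows):
    """Return the years whose row has _cons == 1 and every other instrument 0."""
    return [
        year
        for year, (cons, *others) in sorted(levels_rows.items())
        if cons == 1 and all(v == 0 for v in others)
    ]


print(constant_only_years(first))   # [1977]
print(constant_only_years(second))  # []
```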

How should I interpret what is happening? I realise that xtabond2 replaces missing instrument values with zeros ("missing=0" in the output), but I thought this left the moment conditions, and hence the results, unaffected. Or are the results only unchanged asymptotically? I did not expect the zeroing out to bring an extra year of data into play. It looks as if the levels equation for 1977 is used in estimation, with only the constant as an instrument. Is that right? Here the effect on the results is small, but I have other examples where the changes are more substantial (e.g. a static version of the above model).
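A toy calculation may help frame the question. The sketch below uses made-up numbers, not the abdata values, and it does not use xtabond2's code. It also holds the residuals fixed, whereas a real GMM estimator would re-solve for the coefficients. The point is that a row in which every instrument is zero adds nothing to Z'u, so the GMM-type moment is the same with or without that row. A row in which _cons is 1, by contrast, adds its residual to the constant's moment.

```python
# Toy illustration (made-up numbers, not abdata, not xtabond2's code):
# how the rows of a stacked instrument matrix Z enter the sample moments Z'u.


def sample_moments(Z, u):
    """Return Z'u: for each instrument column, the sum of instrument * residual."""
    n_cols = len(Z[0])
    return [sum(row[j] * e for row, e in zip(Z, u)) for j in range(n_cols)]


# Columns: _cons and one collapsed GMM-type instrument (hypothetical values).
# Each row is one levels-equation year; a missing instrument is stored as 0.
Z_full = [
    [1, 0.0],   # 1977: GMM-type instrument missing -> 0, but _cons is still 1
    [1, 2.0],   # 1978
    [1, -1.0],  # 1979
]
u_full = [0.5, -0.25, 1.0]  # hypothetical residuals, held fixed for the comparison

# Dropping 1977 (as "if year > 1977" does) removes the first row entirely.
Z_drop, u_drop = Z_full[1:], u_full[1:]

m_full = sample_moments(Z_full, u_full)
m_drop = sample_moments(Z_drop, u_drop)
print(m_full)  # [1.25, -1.5]
print(m_drop)  # [0.75, -1.5]
```

In this toy case, only the _cons moment differs between the two samples. If xtabond2 behaves the same way, keeping the 1977 levels row would alter only the constant's moment condition (and, through it, the two-step weighting matrix). That would be consistent with the small coefficient changes seen above.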

For difference GMM, excluding the first two years makes no difference to the results. Of course, when I use xtabond2 to estimate only the levels equation, the results again change when I drop the first two periods.

Perhaps this is also relevant: estimating a static model by system GMM with xtabond2 on only the first two years of the dataset still returns estimates:
Code:
. xtabond2 n w k if year < 1978, gmm(w k, laglimits(2 2) collapse) twostep robust small svmat
Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       218
Time variable : year                            Number of groups   =       138
Number of instruments = 1                       Obs per group: min =         1
F(2, 137)     =     47.72                                      avg =      1.58
Prob > F      =     0.000                                      max =         2
------------------------------------------------------------------------------
             |              Corrected
           n |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |     .37529   .0384159     9.77   0.000     .2993251    .4512549
           k |          0  (omitted)
       _cons |          0  (omitted)
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L2.(w k) collapsed
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.(w k) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =      .  Pr > z =      .
Arellano-Bond test for AR(2) in first differences: z =      .  Pr > z =      .
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(-2)   =   0.00  Prob > chi2 =      .
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(-2)   =   0.00  Prob > chi2 =      .
  (Robust, but weakened by many instruments.)

Here, the only available instrument is the constant in the levels equations for 1976 and 1977, as this extract from the instrument matrix e(Z) shows (again for the firm with id = 5):
Code:
                          Diff eq:    Diff eq:  Levels eq:  Levels eq:
                               L2.         L2.         LD.         LD.
                _cons           w           k           w           k
  5, 1976           0           0           0           0           0
  5, 1977           0           0           0           0           0
  5, 1978           0           0           0           0           0
  5, 1979           0           0           0           0           0
  5, 1980           0           0           0           0           0
  5, 1981           0           0           0           0           0
  5, 1982           0           0           0           0           0
  5, 1983           0           0           0           0           0
  5, 1984           0           0           0           0           0
  5, 1976           1           0           0           0           0
  5, 1977           1           0           0           0           0
  5, 1978           0           0           0           0           0
  5, 1979           0           0           0           0           0
  5, 1980           0           0           0           0           0
  5, 1981           0           0           0           0           0
  5, 1982           0           0           0           0           0
  5, 1983           0           0           0           0           0
  5, 1984           0           0           0           0           0

I would be grateful for any comments or answers that help me understand this issue better.

Best,

Nicolas