Hello everyone,

I am trying to work out how to run a difference-in-differences (DiD) analysis with difference GMM and system GMM on panel data. Because my T is small and my data have a cluster structure, I am using xtabond2 for the estimation.

My data set contains 8,232 students in panel format with T=5. For each student, I have the test score (depvar) and a set of observed variables over the period (indepvar). Moreover, I include time, school and class fixed effects.

Within the sample period (2003-2008), a policy change was implemented in state schools in 2007. Students from state schools are therefore my treatment group, and students from municipal schools are the control group. My DiD variable equals 1 if the student is enrolled in a state school (treated) and the observation falls in the post-treatment period (time), and 0 otherwise.
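
In Stata terms, the DiD variable amounts to the interaction of the two dummies. A minimal sketch, assuming treated and time already exist as 0/1 indicators (the names are the ones used in my commands; how time is coded from wave is not shown here):

Code:
* Sketch only: assumes treated (1 = state school) and time (1 = post-policy) are 0/1 dummies
generate byte DiD = treated * time
* Sanity check: the mean of DiD should be 1 only in the treated/post cell and 0 elsewhere
tabulate treated time, summarize(DiD) means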

In the model I treat L1.profic_mat as endogenous, the control variables as predetermined, and the fixed effects and DiD as exogenous. For the system GMM I estimate the following model (PS: coefficients for control variables and fixed effects are not shown to save space):
Code:
xi: xtabond2 L(0/1).profic_mat DiD time treated $controlvar i.wave i.IDescola i.IDturma, ///
    gmm(L1.profic_mat, lag(1 1)) ///
    gmmstyle($controlvar) ///
    iv(DiD time treated i.wave i.IDescola i.IDturma, equation(level)) ///
    cluster(IDescola) twostep small orthogonal
i.wave            _Iwave_1-5          (naturally coded; _Iwave_1 omitted)
i.IDescola        _IIDescola_35018348-35924957(naturally coded; _IIDescola_35018348 omitted)
i.IDturma         _IIDturma_269-3809  (naturally coded; _IIDturma_269 omitted)
Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: IDaluno                         Number of obs      =      4056
Time variable : wave                            Number of groups   =      1713
Number of instruments = 755                     Obs per group: min =         1
F(644, 31)    =  49235.39                                      avg =      2.37
Prob > F      =     0.000                                      max =         4
                                      (Std. Err. adjusted for clustering on IDescola)
-------------------------------------------------------------------------------------
                    |              Corrected
         profic_mat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
         profic_mat |
                L1. |   .3678468   .0810637     4.54   0.000     .2025164    .5331773
                    |
                DiD |   160.1348   57.99518     2.76   0.010     41.85286    278.4168
               time |          0  (omitted)
            treated |  -82.91613    87.8292    -0.94   0.352     -262.045     96.2127
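
Written as an equation, the specification above corresponds roughly to the following. This is only my reading of the command; the symbols are notation introduced here for illustration (y is profic_mat, x collects $controlvar, and treated is assumed not to vary within a student):

\[
y_{it} = \alpha\, y_{i,t-1} + \delta\, \mathrm{DiD}_{it} + \gamma_1\, \mathrm{time}_{t} + \gamma_2\, \mathrm{treated}_{i} + x_{it}'\beta + \lambda_t + \mu_{s} + \kappa_{c} + \eta_i + \varepsilon_{it},
\qquad \mathrm{DiD}_{it} = \mathrm{treated}_{i} \times \mathrm{time}_{t}
\]

Here \lambda_t, \mu_s and \kappa_c stand for the wave, school (IDescola) and class (IDturma) dummies, \eta_i is the student effect, and \delta is the DiD coefficient my question is about.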

For the difference GMM, I have:

Code:
xi: xtabond2 L(0/1).profic_mat DiD time treated $controlvar i.wave i.IDescola i.IDturma, ///
    gmm(L1.profic_mat, lag(1 1)) ///
    gmmstyle($controlvar) ///
    iv(DiD time treated i.wave i.IDescola i.IDturma, equation(level)) ///
    cluster(IDescola) twostep small orthogonal noleveleq

However, I am unsure whether this specification is right, because the DiD coefficients from system and difference GMM differ substantially from each other. When I estimate the model with FE (without the lagged dependent variable), the result is also very different.
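
For reference, an FE comparison of this kind could be sketched as follows. This is an illustration under stated assumptions, not the exact command behind the result mentioned above: it assumes the same regressors without the lag, the panel declared on IDaluno and wave as in the xtabond2 output, and clustering on IDescola.

Code:
* Sketch only: declare the panel as in the GMM output (student id, wave)
xtset IDaluno wave
* Within (FE) estimator without the lagged score.
* time is collinear with the wave dummies, and treated drops out if it never varies within a student.
* nonest is only needed if some students change schools, so that panels are not nested in clusters.
xtreg profic_mat DiD time treated $controlvar i.wave, fe vce(cluster IDescola) nonest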

Hence my question: is my specification of the DiD in this GMM setup right?
I am not sure whether the DiD works in exactly the same way here as in a linear model. I need help with the implementation of the DiD in this GMM and with the interpretation of its coefficient.

Thank you for any help and information.