Hi everyone,

I have a panel dataset with N = 48 and T = 40. The data contain participants' subjective ratings of their daily social interactions (measured separately as how positive, negative, and meaningful the interactions were) and their daily depression scores. Each of the 48 participants took part in the study for 40 days.

I want to understand how the positivity, negativity, and meaningfulness of people's social interactions affect their depression level. Since depression is fairly stable from day to day, I want to include the lagged depression score as an explanatory variable (i.e., a lagged dependent variable). From my reading, a dynamic panel-data model estimated by GMM seems most appropriate for this, so I ran the regression with xtabond2.
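Written out, the specification implied by the command below is roughly the following. The output reports "two-step system GMM", which stacks the levels equation with its first difference; the person-specific effect $\mu_i$ drops out of the differenced equation. The symbols ($\alpha$, $\rho$, $\beta_k$, $\mu_i$, $\varepsilon_{it}$) are notation assumed here, not something xtabond2 prints.

```latex
\begin{aligned}
\text{depression}_{it} &= \alpha + \rho\,\text{depression}_{i,t-1} + \beta_1\,\text{pos\_si}_{it} + \beta_2\,\text{neg\_si}_{it} + \beta_3\,\text{meaningful}_{it} + \mu_i + \varepsilon_{it} \\
\Delta\text{depression}_{it} &= \rho\,\Delta\text{depression}_{i,t-1} + \beta_1\,\Delta\text{pos\_si}_{it} + \beta_2\,\Delta\text{neg\_si}_{it} + \beta_3\,\Delta\text{meaningful}_{it} + \Delta\varepsilon_{it}
\end{aligned}
```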

The command I ran was:
Code:
xtabond2 depression_final L.depression_final pos_si neg_si meaningful, gmm(depression_final, lag(2 3) collapse) gmm(pos_si neg_si meaningful, lag(1 1) collapse) twostep
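The instrument list in the output below reports "L.(pos_si neg_si meaningful) collapsed" under the first-differences equation, i.e., the gmm() options supply instruments rather than adding extra regressors. The sketch below is a simplified illustration (an assumption about the layout, not xtabond2's source code) of what collapsed GMM-style columns look like for one person: one column per lag in lag(a b), with unavailable values set to 0 ("missing=0" in the output). It ignores the rows lost to differencing and the separate levels-equation instruments.

```python
"""Sketch: collapsed GMM-style instrument columns for a single person.

Simplified illustration of gmm(x, lag(a b) collapse): one column per lag,
values that are missing or reach before the first period are set to 0.
The function name is hypothetical; this is not xtabond2's implementation.
"""


def collapsed_gmm_columns(x, first_lag, last_lag):
    """Return one row per period t; the column for lag L holds x[t - L] or 0."""
    rows = []
    for t in range(len(x)):
        row = []
        for lag in range(first_lag, last_lag + 1):
            src = t - lag
            # Short-circuit keeps negative indices from wrapping around.
            row.append(x[src] if src >= 0 and x[src] is not None else 0)
        rows.append(row)
    return rows


pos_si = [3, 5, 4, 6]  # hypothetical daily scores for one person
print(collapsed_gmm_columns(pos_si, 1, 1))  # [[0], [3], [5], [4]]
```

Because the columns are collapsed, the number of instruments grows with the width of the lag range, not with the number of periods.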
The results are below:

Code:
. xtabond2 depression_final L.depression_final pos_si neg_si meaningful, gmm(depression_final, lag(2 3) collapse) gmm(pos_si neg_si meaningful, lag(1 1) collapse) twostep
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: pid_cat                         Number of obs      =      1161
Time variable : dateval                         Number of groups   =        48
Number of instruments = 10                      Obs per group: min =         3
Wald chi2(4)  =     39.39                                      avg =     24.19
Prob > chi2   =     0.000                                      max =        41
----------------------------------------------------------------------------------
depression_final |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
depression_final |
             L1. |   .1415712   .0343344     4.12   0.000      .074277    .2088654
                 |
          pos_si |   -.122131   .0306348    -3.99   0.000    -.1821742   -.0620879
          neg_si |  -.0937629    .032896    -2.85   0.004    -.1582378   -.0292879
      meaningful |   .1380227   .0533276     2.59   0.010     .0335025    .2425428
           _cons |    2.86786   .3796965     7.55   0.000     2.123668    3.612051
----------------------------------------------------------------------------------
Warning: Uncorrected two-step standard errors are unreliable.

Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/3).depression_final collapsed
    L.(pos_si neg_si meaningful) collapsed
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.depression_final collapsed
    D.(pos_si neg_si meaningful) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.51  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -1.47  Pr > z =  0.142
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(5)    =  10.11  Prob > chi2 =  0.072
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(5)    =  10.81  Prob > chi2 =  0.055
  (Robust, but can be weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(1)    =   0.97  Prob > chi2 =  0.326
    Difference (null H = exogenous): chi2(4)    =   9.84  Prob > chi2 =  0.043
  gmm(depression_final, collapse lag(2 3))
    Hansen test excluding group:     chi2(2)    =   3.69  Prob > chi2 =  0.158
    Difference (null H = exogenous): chi2(3)    =   7.12  Prob > chi2 =  0.068
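The reported "Number of instruments = 10" and the Hansen degrees of freedom, chi2(5), can be reproduced by hand from the instrument list above. The sketch assumes a simplified reading of that list (not xtabond2's source): each collapsed gmm() group contributes one instrument per variable per lag to the differenced equation and one per variable to the levels equation, plus one for _cons. The function names are hypothetical.

```python
"""Sketch: count system-GMM instruments when every gmm() group is collapsed.

Assumed counting rules, read off the instrument list in the output:
- differenced equation: one instrument per variable per lag in lag(a b);
- levels equation: one instrument per variable (the lag a-1 difference);
- plus one "standard" instrument for _cons.
"""


def count_collapsed_instruments(groups, constant=True):
    """groups: list of (number_of_variables, first_lag, last_lag) tuples."""
    total = 1 if constant else 0
    for n_vars, first_lag, last_lag in groups:
        total += n_vars * (last_lag - first_lag + 1)  # differenced equation
        total += n_vars                               # levels equation
    return total


def hansen_df(n_instruments, n_params):
    """Overidentifying restrictions = instruments minus estimated parameters."""
    return n_instruments - n_params


# gmm(depression_final, lag(2 3) collapse) -> (1, 2, 3)
# gmm(pos_si neg_si meaningful, lag(1 1) collapse) -> (3, 1, 1)
groups = [(1, 2, 3), (3, 1, 1)]
n_iv = count_collapsed_instruments(groups)
# 5 parameters: L1.depression_final, three regressors, _cons
print(n_iv, hansen_df(n_iv, n_params=5))  # 10 5
```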
Given the analysis and the results, my questions are:
1) Do I need the term gmm(pos_si neg_si meaningful, lag(1 1) collapse), which uses the first lags of the independent variables as instruments? Including the first lags of the independent variables seems like a good start (referring to Sebastian's response here), but I don't quite understand why this term is needed.

2) For the gmm(pos_si neg_si meaningful, lag(1 1) collapse) part, I didn't use iv() because the help file says iv() is for exogenous variables. Since I hypothesize a priori that these three variables influence the depression score, I don't think iv() is appropriate here. Is that correct?

3) Is lag(2 3) the appropriate specification for the lagged dependent variable? I used lags 2 to 3 because my N (48) is not much larger than my T (40), so I can't afford to use all available lags (i.e., gmm(depression_final, lag(2 .))). However, the coefficients change if I use all lags of the dependent variable (see the results below). In particular, the L1.depression_final coefficient is larger, which is more in line with the existing literature. So should I use all lags of the dependent variable even though the number of instruments (49) then exceeds my N (48)?

4) Why is the L1.depression_final coefficient so low even though depression and its first lag are highly correlated (about 0.80 with pwcorr)? I understand that a Pearson correlation and a system GMM coefficient measure different things, but I can't see, at a high level, why the strong correlation with the lagged dependent variable does not show up in the xtabond2 results.
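One way to see how a pooled correlation and a lagged-dependent-variable coefficient can diverge: pwcorr pools variation between people and within people, so if people differ a lot in their average depression, today's and yesterday's scores are correlated largely because both reflect the same stable person-level component, which a model with individual effects ($\mu_i$) absorbs. The simulation below is a hypothetical data-generating process; every number in it (rho = 0.15, the person-effect SD of 1.5, the seed) is an assumption chosen for illustration, not an estimate from the data in this post. The settings make the population pooled correlation roughly 0.8, while the within-person slope stays near rho (slightly below it, since the within estimator is biased downward with finite T).

```python
"""Sketch: high pooled lag correlation, small within-person AR coefficient.

Hypothetical process (all values are illustrative assumptions):
    y_it = mu_i + rho * y_i,t-1 + e_it,  mu_i ~ N(0, 1.5^2), e_it ~ N(0, 1)
with N = 48 people and T = 40 days.
"""
import random


def simulate_panel(n=48, t=40, rho=0.15, sd_mu=1.5, burn_in=50, seed=1):
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        mu = rng.gauss(0.0, sd_mu)  # stable person-level component
        y, series = 0.0, []
        for step in range(burn_in + t):
            y = mu + rho * y + rng.gauss(0.0, 1.0)
            if step >= burn_in:  # discard burn-in so y starts near its steady state
                series.append(y)
        panel.append(series)
    return panel


def pooled_lag_correlation(panel):
    """Pearson correlation of y_t with y_{t-1}, pooling everyone (like pwcorr)."""
    pairs = [(s[k], s[k - 1]) for s in panel for k in range(1, len(s))]
    ys, xs = zip(*pairs)
    my, mx = sum(ys) / len(ys), sum(xs) / len(xs)
    cov = sum((a - my) * (b - mx) for a, b in pairs)
    vy = sum((a - my) ** 2 for a in ys)
    vx = sum((b - mx) ** 2 for b in xs)
    return cov / (vy * vx) ** 0.5


def within_lag_slope(panel):
    """AR(1) slope after removing each person's own mean (within estimator)."""
    num = den = 0.0
    for s in panel:
        cur, lag = s[1:], s[:-1]
        mc, ml = sum(cur) / len(cur), sum(lag) / len(lag)
        num += sum((c - mc) * (lg - ml) for c, lg in zip(cur, lag))
        den += sum((lg - ml) ** 2 for lg in lag)
    return num / den


panel = simulate_panel()
print(round(pooled_lag_correlation(panel), 2), round(within_lag_slope(panel), 2))
```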


I’ve read the following resources on xtabond2:
I sincerely apologize for the long post. Thank you very much for your time and for your help!

Siyan


Results using all available lags of the dependent variable as instruments:

Code:
. xtabond2 depression_final L.depression_final pos_si neg_si meaningful, gmm(L.depression_final, collapse) gmm(pos_si neg_si meaningful, lag(1 1) collapse) twostep
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Number of instruments may be large relative to number of observations.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: pid_cat                         Number of obs      =      1161
Time variable : dateval                         Number of groups   =        48
Number of instruments = 49                      Obs per group: min =         3
Wald chi2(4)  =  37100.72                                      avg =     24.19
Prob > chi2   =     0.000                                      max =        41
----------------------------------------------------------------------------------
depression_final |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
depression_final |
             L1. |   .1674352    .002132    78.53   0.000     .1632565    .1716139
                 |
          pos_si |  -.0777748   .0031215   -24.92   0.000    -.0838929   -.0716567
          neg_si |  -.0721509   .0006684  -107.94   0.000     -.073461   -.0708407
      meaningful |    .072521   .0045241    16.03   0.000     .0636539    .0813882
           _cons |   2.530163   .0422862    59.83   0.000     2.447284    2.613043
----------------------------------------------------------------------------------
Warning: Uncorrected two-step standard errors are unreliable.

Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/.).L.depression_final collapsed
    L.(pos_si neg_si meaningful) collapsed
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    D.L.depression_final collapsed
    D.(pos_si neg_si meaningful) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.45  Pr > z =  0.001
Arellano-Bond test for AR(2) in first differences: z =  -1.06  Pr > z =  0.290
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(44)   =  74.52  Prob > chi2 =  0.003
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(44)   =  43.79  Prob > chi2 =  0.480
  (Robust, but can be weakened by many instruments.)
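The 49 instruments in this second run can be decomposed using the instrument list in its output. The number of collapsed lag columns for the dependent variable, $k$, is not printed directly; it is inferred here by subtraction, assuming the other groups contribute exactly as listed. The Hansen degrees of freedom then follow from the 5 estimated parameters.

```latex
\begin{aligned}
49 &= \underbrace{k}_{\text{L(1/.).L.depression\_final}}
    + \underbrace{3}_{\text{L.(pos\_si, neg\_si, meaningful)}}
    + \underbrace{1}_{\text{D.L.depression\_final}}
    + \underbrace{3}_{\text{D.(pos\_si, neg\_si, meaningful)}}
    + \underbrace{1}_{\text{\_cons}}
    \;\Rightarrow\; k = 41 \\
\text{Hansen df} &= 49 - 5 = 44 \\
49 \text{ instruments} &> 48 \text{ groups}
\end{aligned}
```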