I have panel data with N = 48 and T = 40. The data contain participants’ subjective ratings of their daily social interactions (measured separately by how positive, negative, and meaningful the interactions were) and their daily depression scores. Each participant completed 40 days of the study, and there were 48 participants in total.
I want to understand how the positivity, negativity, and meaningfulness of people’s social interactions affect their depression level. Since depression is fairly stable from day to day, I want to include the lagged depression score as an independent variable (i.e., a lagged dependent variable on the right-hand side). From my reading, a dynamic panel model (a regression with a lagged dependent variable) seems most appropriate, so I estimated it with xtabond2.
The command I ran was:
Code:
xtabond2 depression_final L.depression_final pos_si neg_si meaningful, gmm(depression_final, lag(2 3) collapse) gmm(pos_si neg_si meaningful, lag(1 1) collapse) twostep
Code:
. xtabond2 depression_final L.depression_final pos_si neg_si meaningful, gmm(depression_final, lag(2 3) colla
> pse) gmm(pos_si neg_si meaningful, lag(1 1) collapse) twostep
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: pid_cat                         Number of obs      =      1161
Time variable : dateval                         Number of groups   =        48
Number of instruments = 10                      Obs per group: min =         3
Wald chi2(4)  =     39.39                                      avg =     24.19
Prob > chi2   =     0.000                                      max =        41
----------------------------------------------------------------------------------
depression_final |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
depression_final |
             L1. |   .1415712   .0343344     4.12   0.000      .074277    .2088654
                 |
          pos_si |   -.122131   .0306348    -3.99   0.000    -.1821742   -.0620879
          neg_si |  -.0937629    .032896    -2.85   0.004    -.1582378   -.0292879
      meaningful |   .1380227   .0533276     2.59   0.010     .0335025    .2425428
           _cons |    2.86786   .3796965     7.55   0.000     2.123668    3.612051
----------------------------------------------------------------------------------
Warning: Uncorrected two-step standard errors are unreliable.

Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/3).depression_final collapsed
    L.(pos_si neg_si meaningful) collapsed
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.depression_final collapsed
    D.(pos_si neg_si meaningful) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.51  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -1.47  Pr > z =  0.142
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(5)   =  10.11  Prob > chi2 =  0.072
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(5)   =  10.81  Prob > chi2 =  0.055
  (Robust, but can be weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(1)   =   0.97  Prob > chi2 =  0.326
    Difference (null H = exogenous): chi2(4)   =   9.84  Prob > chi2 =  0.043
  gmm(depression_final, collapse lag(2 3))
    Hansen test excluding group:     chi2(2)   =   3.69  Prob > chi2 =  0.158
    Difference (null H = exogenous): chi2(3)   =   7.12  Prob > chi2 =  0.068
1) Do I need the lagged-independent-variable instruments, gmm(pos_si neg_si meaningful, lag(1 1) collapse)? Including the first lags of the independent variables seems like a good starting point (referring to Sebastian’s response here), but I don’t quite understand why this term is needed.
Also, regarding the gmm(pos_si neg_si meaningful, lag(1 1) collapse) part: I didn't use iv() for these variables because the help file says iv() is for exogenous variables. Since I have an a priori hypothesis that these three variables influence the depression score, I don't think it's appropriate to use iv(). Is that correct?
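For reference, here is a sketch of the moment conditions that distinguish these instrument choices, assuming the standard difference/system GMM setup described in Roodman's article: $x_{it}$ stands for pos_si, neg_si and meaningful, $\mu_i$ is a person effect, and $\varepsilon_{it}$ is assumed serially uncorrelated. The mapping of each case to an xtabond2 option follows Roodman's description (iv() for strictly exogenous regressors, lag(1 .) for predetermined ones, lag(2 .) for endogenous ones) and should be checked against the help file; lag(1 1) as used here keeps only the first lag of the predetermined set.

```latex
\begin{aligned}
y_{it} &= \rho\, y_{i,t-1} + \beta' x_{it} + \mu_i + \varepsilon_{it} \\
\Delta y_{it} &= \rho\, \Delta y_{i,t-1} + \beta' \Delta x_{it} + \Delta\varepsilon_{it} \\[6pt]
&\text{Lagged dependent variable, differences equation (lag(2 .)):} \\
&\qquad E[\,y_{i,t-s}\,\Delta\varepsilon_{it}\,] = 0, \quad s \ge 2 \\
&\text{Predetermined } x:\ E[x_{is}\varepsilon_{it}] = 0 \text{ for } s \le t \text{ (lag(1 .)):} \\
&\qquad E[\,x_{i,t-s}\,\Delta\varepsilon_{it}\,] = 0, \quad s \ge 1 \\
&\text{Endogenous } x:\ E[x_{is}\varepsilon_{it}] = 0 \text{ only for } s < t \text{ (lag(2 .)):} \\
&\qquad E[\,x_{i,t-s}\,\Delta\varepsilon_{it}\,] = 0, \quad s \ge 2 \\
&\text{Strictly exogenous } x:\ E[x_{is}\varepsilon_{it}] = 0 \text{ for all } s,t \text{ (iv()):} \\
&\qquad E[\,\Delta x_{it}\,\Delta\varepsilon_{it}\,] = 0 \\[6pt]
&\text{System GMM adds levels-equation conditions (requires the differences to be uncorrelated with } \mu_i\text{):} \\
&\qquad E[\,\Delta y_{i,t-1}\,(\mu_i + \varepsilon_{it})\,] = 0, \qquad E[\,\Delta x_{it}\,(\mu_i + \varepsilon_{it})\,] = 0
\end{aligned}
```

Under these assumptions the instrument choice is about whether $x_{it}$ is correlated with the current or past error, not about whether $x$ is hypothesized to affect $y$; the validity of $y_{i,t-2}$ as an instrument is also why the AR(2) test in first differences matters.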
2) Is lag(2 3) the appropriate specification for the lagged dependent variable? I used lags 2–3 because N (48) is not much larger than T (40), so I can’t afford to use all available lags (i.e., gmm(depression_final, lag(2 .))). However, the coefficients change if I use all lags of the dependent variable (see the second result below). In particular, the L1.depression_final coefficient is larger (0.167 vs. 0.142) and more in line with the existing literature. So should I use all available lags of the dependent variable as instruments, even though the number of instruments (49) then exceeds my N (48)?
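The instrument counts behind this question can be checked by hand. Below is a minimal Python sketch, assuming xtabond2 counts collapsed instruments the way the "Instruments for ..." listings in the outputs suggest: one column per lag and variable in the differences equation, one column per variable (the shortest lagged difference) in the levels equation, plus the constant. The function name `collapsed_gmm_count` is hypothetical, and the 49 for the all-lags model is copied from the second output because it depends on how many periods the data span. The comparison with the number of groups follows the commonly cited rule of thumb that instruments should not outnumber groups.

```python
def collapsed_gmm_count(lag_from, lag_to, n_vars=1, system=True):
    """Instrument columns from one collapsed gmm() group (assumed counting rule).

    Differences equation: one column per lag in lag_from..lag_to, per variable
    (assumes every lag in the range is available in the sample).
    Levels equation (system GMM only): one column per variable, namely the
    difference lagged (lag_from - 1) periods, as in the output listings.
    """
    n_diff = (lag_to - lag_from + 1) * n_vars
    n_levels = n_vars if system else 0
    return n_diff + n_levels


N_GROUPS = 48  # participants (pid_cat)
N_PARAMS = 5   # L1.depression_final, pos_si, neg_si, meaningful, _cons

# First model: gmm(depression_final, lag(2 3) collapse)
#            + gmm(pos_si neg_si meaningful, lag(1 1) collapse)
#            + _cons as a standard instrument in the levels equation
spec_lag23 = collapsed_gmm_count(2, 3) + collapsed_gmm_count(1, 1, n_vars=3) + 1

# Second model uses all available lags; its count depends on the number of
# periods in the data, so it is copied from the output rather than computed.
spec_all_lags = 49

for label, n_inst in [("lag(2 3)", spec_lag23), ("all lags", spec_all_lags)]:
    hansen_df = n_inst - N_PARAMS  # number of overidentifying restrictions
    print(f"{label}: {n_inst} instruments, Hansen df = {hansen_df}, "
          f"instruments > groups: {n_inst > N_GROUPS}")
```

Both computed Hansen degrees of freedom (5 and 44) agree with the chi2(5) and chi2(44) statistics in the two outputs, which is a quick sanity check that the counting rule assumed here matches what xtabond2 did.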
3) Why is the L1.depression_final coefficient so low even though depression and lagged depression are highly correlated (about 0.80 with pwcorr)? I understand that a Pearson correlation and a system GMM coefficient are very different quantities, but I don’t understand, at a high level, why the strong correlation with the lagged dependent variable doesn’t show up in the xtabond2 results.
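One possible contributor, offered as an assumption rather than a diagnosis of this data: pwcorr pools between-person and within-person variation, so stable differences in average depression across participants push the correlation toward 1, whereas the GMM coefficient on L1.depression_final is identified from within-person dynamics once the person effect is differenced out (and is conditional on the other regressors). The Python sketch below simulates a hypothetical panel with the same N and T, a small within-person autoregressive coefficient (`rho = 0.15`) and large person-level differences (`sd_person = 2`); both values are made up for illustration. It contrasts the pooled correlation of y_t with y_{t-1} against the correlation after removing each person's mean. The within-person correlation is only a rough proxy for what system GMM estimates (demeaning in short panels introduces its own bias), so treat this as an illustration of the mechanism.

```python
import numpy as np


def simulate_panel(n=48, T=40, rho=0.15, sd_person=2.0, burn=50, seed=0):
    """Hypothetical DGP: y_it = mu_i + u_it, with u_it = rho * u_i,t-1 + e_it.

    Equivalently y_it = (1 - rho) * mu_i + rho * y_i,t-1 + e_it, so rho is the
    within-person coefficient on the lagged outcome.
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, sd_person, size=n)  # stable person-level depression
    u = np.zeros(n)
    out = np.empty((n, T))
    for t in range(burn + T):
        u = rho * u + rng.normal(0.0, 1.0, size=n)
        if t >= burn:  # discard burn-in periods so u starts near stationarity
            out[:, t - burn] = mu + u
    return out


def lag_correlations(y):
    """Return (pooled, within-person) correlation of y_t with y_{t-1}."""
    cur, lag = y[:, 1:], y[:, :-1]
    pooled = np.corrcoef(cur.ravel(), lag.ravel())[0, 1]
    # Remove each person's mean (over the same periods) before correlating.
    cur_w = cur - cur.mean(axis=1, keepdims=True)
    lag_w = lag - lag.mean(axis=1, keepdims=True)
    within = np.corrcoef(cur_w.ravel(), lag_w.ravel())[0, 1]
    return pooled, within


y = simulate_panel()
pooled, within = lag_correlations(y)
print(f"pooled r = {pooled:.2f}, within-person r = {within:.2f}")
```

With these made-up settings the pooled correlation comes out high while the within-person correlation stays near `rho`. Because demeaning removes `mu` exactly, changing `sd_person` moves the pooled number substantially but leaves the within-person number unchanged.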
I’ve read the following resources on xtabond2:
- Stata help on xtabond2
- Roodman's Stata Journal article on how to do xtabond2
- Estimating dynamic panel data
I sincerely apologize for the long post. Thank you very much for your time and for your help!
Siyan
Code:
. xtabond2 depression_final L.depression_final pos_si neg_si meaningful, gmm(L.depression_final, collapse) g
> mm(pos_si neg_si meaningful, lag(1 1) collapse) twostep
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Number of instruments may be large relative to number of observations.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: pid_cat                         Number of obs      =      1161
Time variable : dateval                         Number of groups   =        48
Number of instruments = 49                      Obs per group: min =         3
Wald chi2(4)  =  37100.72                                      avg =     24.19
Prob > chi2   =     0.000                                      max =        41
----------------------------------------------------------------------------------
depression_final |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
depression_final |
             L1. |   .1674352    .002132    78.53   0.000     .1632565    .1716139
                 |
          pos_si |  -.0777748   .0031215   -24.92   0.000    -.0838929   -.0716567
          neg_si |  -.0721509   .0006684  -107.94   0.000     -.073461   -.0708407
      meaningful |    .072521   .0045241    16.03   0.000     .0636539    .0813882
           _cons |   2.530163   .0422862    59.83   0.000     2.447284    2.613043
----------------------------------------------------------------------------------
Warning: Uncorrected two-step standard errors are unreliable.

Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/.).L.depression_final collapsed
    L.(pos_si neg_si meaningful) collapsed
Instruments for levels equation
  Standard
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    D.L.depression_final collapsed
    D.(pos_si neg_si meaningful) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.45  Pr > z =  0.001
Arellano-Bond test for AR(2) in first differences: z =  -1.06  Pr > z =  0.290
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(44)  =  74.52  Prob > chi2 =  0.003
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(44)  =  43.79  Prob > chi2 =  0.480
  (Robust, but can be weakened by many instruments.)