Hi all,

I have a question about the correct usage of xtabond2 for System GMM estimation in Stata. My data are a panel like the following:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float idatc3 str4 atc3no float Year
 1 "A10D" 2004
 1 "A10D" 2005
 1 "A10D" 2006
 1 "A10D" 2007
 1 "A10D" 2008
 1 "A10D" 2009
 1 "A10D" 2010
 1 "A10D" 2011
 1 "A10D" 2012
 1 "A10D" 2013
 2 "A10E" 2004
 2 "A10E" 2005
 2 "A10E" 2006
 2 "A10E" 2007
 2 "A10E" 2008
 2 "A10E" 2009
 2 "A10E" 2010
 2 "A10E" 2011
 2 "A10E" 2012
 2 "A10E" 2013
 3 "A10H" 2004
 3 "A10H" 2005
 3 "A10H" 2006
 3 "A10H" 2007
 3 "A10H" 2008
 3 "A10H" 2009
 3 "A10H" 2010
 3 "A10H" 2011
 3 "A10H" 2012
 3 "A10H" 2013
 4 "A10X" 2004
 4 "A10X" 2005
 4 "A10X" 2006
 4 "A10X" 2007
 4 "A10X" 2008
 4 "A10X" 2009
 4 "A10X" 2010
 4 "A10X" 2011
 4 "A10X" 2012
 4 "A10X" 2013
 5 "A11A" 2004
 5 "A11A" 2005
 5 "A11A" 2006
 5 "A11A" 2007
 5 "A11A" 2008
 5 "A11A" 2009
 5 "A11A" 2010
 5 "A11A" 2011
 5 "A11A" 2012
 5 "A11A" 2013
 6 "A11B" 2004
 6 "A11B" 2005
 6 "A11B" 2006
 6 "A11B" 2007
 6 "A11B" 2008
 6 "A11B" 2009
 6 "A11B" 2010
 6 "A11B" 2011
 6 "A11B" 2012
 6 "A11B" 2013
 7 "A11E" 2004
 7 "A11E" 2005
 7 "A11E" 2006
 7 "A11E" 2007
 7 "A11E" 2008
 7 "A11E" 2009
 7 "A11E" 2010
 7 "A11E" 2011
 7 "A11E" 2012
 7 "A11E" 2013
 8 "A11F" 2004
 8 "A11F" 2005
 8 "A11F" 2006
 8 "A11F" 2007
 8 "A11F" 2008
 8 "A11F" 2009
 8 "A11F" 2010
 8 "A11F" 2011
 8 "A11F" 2012
 8 "A11F" 2013
 9 "A11G" 2004
 9 "A11G" 2005
 9 "A11G" 2006
 9 "A11G" 2007
 9 "A11G" 2008
 9 "A11G" 2009
 9 "A11G" 2010
 9 "A11G" 2011
 9 "A11G" 2012
 9 "A11G" 2013
10 "A11X" 2004
10 "A11X" 2005
10 "A11X" 2006
10 "A11X" 2007
10 "A11X" 2008
10 "A11X" 2009
10 "A11X" 2010
10 "A11X" 2011
10 "A11X" 2012
10 "A11X" 2013
end
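
Before any panel command is run, the data need to be declared as a panel. A minimal sketch, assuming idatc3 is the panel identifier and Year is the time variable, as in the example above:

```stata
* Declare the panel structure; xtabond2 requires the data to be xtset (or tsset).
xtset idatc3 Year
* Summarize the participation pattern of the panel, e.g. to spot gaps in the series.
xtdescribe
```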

I would like to add the lag of the dependent variable y to the following equation, making it a dynamic panel:

Code:
 xtreg y recalls_normalized lag_recalls_norm nprod_squared outflow_rate nprod numero_imprese mean_agefirm_byatc mean_agefirm_squared hhi share_generics_1 i.Year average_age_prodbyatc3 avg_ageprod_sq, fe vce(cluster idatc3)
Now, I imagine that I should use the year dummies as instruments only in the level equation. My questions are the following:

(i) How can I specify that the time dummies should be used as instruments only in the level equation, while the other variables (I think) should be used as instruments in both the level and the difference equations?
(ii) I am guessing that the difference equation does the same job as fixed effects here, right? Hence no fe option is included.
(iii) Supposing that all regressors except the lagged dependent variable (L.y) are exogenous, should I run something like this:

Code:
xtabond2 y L.y recalls_normalized lag_recalls_norm nprod_squared outflow_rate nprod numero_imprese mean_agefirm_byatc mean_agefirm_squared hhi share_generics_1 i.Year average_age_prodbyatc3 avg_ageprod_sq, gmm(L.(y)) iv(recalls_normalized lag_recalls_norm nprod_squared outflow_rate nprod numero_imprese mean_agefirm_byatc mean_agefirm_squared hhi share_generics_1 average_age_prodbyatc3 avg_ageprod_sq) iv(i.Year, equation(level)) robust small
(iv) What if some overidentifying restrictions are detected? Should I remove some lags? I mean, is it just trial and error, or are the overidentifying restrictions removed automatically by the xtabond2 command?
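
To make (iv) concrete, here is a sketch of what a specification with fewer instruments might look like. It is only an illustration under assumptions: the year dummies yr* are generated by hand (these names are hypothetical and not in my dataset), and the lag range in laglimits() is an arbitrary choice, not a recommendation.

```stata
* Sketch only. Assumes the panel is already declared with -xtset idatc3 Year-.
* The yr* dummy names are illustrative; tabulate creates one dummy per year.
quietly tabulate Year, generate(yr)

* gmm(L.y, laglimits(1 2) collapse):
*   laglimits(1 2) restricts the instruments for the differenced equation to
*   the first and second lags of L.y (y dated t-2 and t-3);
*   collapse builds one instrument per lag instead of one per lag and period.
* iv(yr*, equation(level)) uses the year dummies as instruments in the
*   level equation only.
* Collinear dummies are expected to be dropped by xtabond2.
* The /// continuation works in a do-file, not at the interactive prompt.
xtabond2 y L.y recalls_normalized lag_recalls_norm nprod_squared outflow_rate ///
    nprod numero_imprese mean_agefirm_byatc mean_agefirm_squared hhi ///
    share_generics_1 average_age_prodbyatc3 avg_ageprod_sq yr*, ///
    gmm(L.y, laglimits(1 2) collapse) ///
    iv(recalls_normalized lag_recalls_norm nprod_squared outflow_rate ///
       nprod numero_imprese mean_agefirm_byatc mean_agefirm_squared hhi ///
       share_generics_1 average_age_prodbyatc3 avg_ageprod_sq) ///
    iv(yr*, equation(level)) ///
    robust small
```

Since xtabond2 reports the number of instruments and the Sargan and Hansen tests in its output, this variant and the one in (iii) could be compared on those.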

Thank you,

Federico