Hello,

I have a research project using country-level panel data in which I want to account for uncertainty in both the outcome variable and the independent variable of interest, each of which is a latent variable. These latent variables (constructed by other researchers) are composite values derived from multiple other variables, not all of which were available for every country and/or year. For each country-year observation, an adjacent column gives the standard deviation of the posterior distribution of the latent variable.

I wish to follow the technique for modeling uncertainty in latent variables suggested by Charles D. Crabtree and Christopher J. Fariss in a 2015 article in Research and Politics (July-September, pp. 1-9) titled “Uncovering Patterns Among Latent Variables: Human Rights and De Facto Judicial Independence.” As described by the authors, they: (1) duplicate their dataset 1,000 times; (2) assign a random draw from the posterior distribution of each latent variable to each country-year observation; (3) use these draws as the new country-year values of the latent outcome variable and the independent variable of interest, and estimate a set of 1,000 regression models; and then (4) combine the results across the multiple sets of data to create one set of coefficient and standard error estimates. Crabtree and Fariss used R; I want to use Stata.
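
To make step (4) concrete, here is how I understand the combination. This is a sketch under an assumption of mine: that the results are pooled with the standard combining rules for multiply imputed data (Rubin's rules). The notation is my own and is not quoted from the article: M is the number of datasets, and \hat{\beta}_m and \widehat{SE}_m are the coefficient and standard error from the regression on dataset m.

```latex
\begin{align*}
\bar{\beta} &= \frac{1}{M}\sum_{m=1}^{M}\hat{\beta}_m
  && \text{(combined coefficient)} \\
W &= \frac{1}{M}\sum_{m=1}^{M}\widehat{\mathrm{SE}}_m^{2}
  && \text{(average within-dataset variance)} \\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\beta}_m-\bar{\beta}\right)^{2}
  && \text{(between-dataset variance)} \\
\mathrm{SE}(\bar{\beta}) &= \sqrt{W+\left(1+\frac{1}{M}\right)B}
  && \text{(combined standard error)}
\end{align*}
```

Under that assumption, the posted file of coefficients and standard errors (one row per dataset) contains everything needed for this last step.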

Last year (May 9, 2019) I sought help from the Stata Forum on how to do this and received a solution from Joseph Coveney that works fine with xtreg. I am now trying to do the same thing with xtscc (Driscoll-Kraay) and xtabond2 (GMM Arellano-Bond), but the Do-file that works so well for xtreg does not work correctly for either command. When I run it with xtscc, I get an error message telling me that I need to add years to my tsset (xtset). But when I add years, I get an error message telling me that there are repeated years in my data. With xtabond2 I get much the same result, except that it does not tell me to add years; it simply fails to complete and returns error r(459). When I add years, I again get the error message about repeated years in my data. However, I can get both xtscc and xtabond2 to work if I add years AND limit the number of regression runs to one.

I have included below my test data and three Do-file versions. The first is the xtreg Do-file, which works for 100 regressions. My goal is 1,000, but I limited it here for simplicity. The second is the xtscc Do-file, configured so that it works for one regression only, as I can't get it to work for 100 regressions. The third is the xtabond2 Do-file, also configured so that it works for one regression only.

I would appreciate any advice on how to modify my xtscc and xtabond2 Do-files so that they work for 100 regressions.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(i_code year) float(dep_var dep_var_post_sd ind_var ind_var_post_sd control_var1 control_var2)
1 2005  1.33 .27 .9403 .0377  .03395293 10.098463
1 2006  1.09 .27 .9403 .0369  .02516864 10.103597
1 2007  1.25 .26  .943 .0366  .01446522  10.09891
1 2008  1.19 .25 .9433 .0371 -.02323937 10.057076
1 2009   .78 .19 .9417 .0386 -.04175259  9.996817
2 2005 -1.69 .22 .1994 .0522  .01805011  6.534809
2 2006 -1.47 .18 .2093 .0571  .02249222  6.541407
2 2007  -1.4 .17 .2155 .0612  .03343279  6.558741
2 2008 -1.32 .17 .2183 .0636  .00843944   6.55176
2 2009 -1.35 .16 .2199 .0673  .03083248   6.56701
3 2005  -.68 .16 .4752 .0442   .0926275  8.366809
3 2006  -.58 .15 .4738 .0426  .10671155  8.453825
3 2007  -.62 .15 .4716 .0432  .08710414   8.52325
3 2008  -.67 .15 .4692 .0441  .03209504  8.541031
3 2009  -.76 .14 .4736 .0468  .00946155  8.536921
4 2005   .68 .32 .9646 .0285  .06083228  9.514385
4 2006     1 .34 .9617 .0318    .133764  9.629064
4 2007   .98 .33 .9589 .0339  .09498945  9.708728
4 2008   .97 .32 .9552 .0365  .00071111   9.69821
4 2009     1 .34 .9507 .0403 -.12036015  9.558898
end

* Begin here
*
quietly expand 100
bysort i_code year: generate int dataset = _n

// Create new "draw" variables here:
generate double new_dep_var = rnormal(dep_var, dep_var_post_sd)
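// Lag within each dataset and country, ordered by year, so that [_n-1] is the
// previous year's draw and not another replicate of the same country-year:
bysort dataset i_code (year): generate double new_dep_var_l1 = new_dep_var[_n-1]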
generate double new_ind_var = rnormal(ind_var, ind_var_post_sd)

// And now the hundred regressions, one for each of the hundred datasets:
tempname file_handle
tempfile hundred_regressions
postfile `file_handle' int dataset double intercept double l_dvslope double l_dvse double ivslope double ivSE double cv1slope double cv1SE double cv2slope double cv2SE    using `hundred_regressions'

xtset i_code 
forvalues dataset = 1/100 {
  quietly   xtreg  new_dep_var new_dep_var_l1  new_ind_var control_var1 control_var2 if year>2005 & dataset == `dataset', fe vce(cluster i_code) 
    
    post `file_handle' (`dataset') (_b[_cons]) (_b[new_dep_var_l1]) (_se[new_dep_var_l1]) (_b[new_ind_var]) (_se[new_ind_var]) (_b[control_var1]) (_se[control_var1]) (_b[control_var2]) (_se[control_var2])
     
}
postclose `file_handle'
use `hundred_regressions', clear



exit



*Alternative Driscoll-Kraay Fixed Effects
*Begin here

quietly expand 1
bysort i_code year: generate int dataset = _n

// Create new "draw" variables here:
generate double new_dep_var = rnormal(dep_var, dep_var_post_sd)
// Lag within each dataset and country, ordered by year:
bysort dataset i_code (year): generate double new_dep_var_l1 = new_dep_var[_n-1]
generate double new_ind_var = rnormal(ind_var, ind_var_post_sd)

// One regression only for now (the goal is one for each of the hundred datasets):
tempname file_handle
tempfile hundred_regressions
postfile `file_handle' int dataset double intercept double l_dvslope double l_dvse double ivslope double ivSE double cv1slope double cv1SE double cv2slope double cv2SE    using `hundred_regressions'

xtset i_code year 
forvalues dataset = 1/1 {
  quietly   xtscc  new_dep_var new_dep_var_l1  new_ind_var control_var1 control_var2 if year>2005 & dataset == `dataset', fe 
    
    post `file_handle' (`dataset') (_b[_cons]) (_b[new_dep_var_l1]) (_se[new_dep_var_l1]) (_b[new_ind_var]) (_se[new_ind_var]) (_b[control_var1]) (_se[control_var1]) (_b[control_var2]) (_se[control_var2])
     
}
postclose `file_handle'
use `hundred_regressions', clear

exit


* Alternative GMM Arellano-Bond
* Begin here
*
quietly expand 1
bysort i_code year: generate int dataset = _n

// Create new "draw" variables here:
generate double new_dep_var = rnormal(dep_var, dep_var_post_sd)
// Lag within each dataset and country, ordered by year:
bysort dataset i_code (year): generate double new_dep_var_l1 = new_dep_var[_n-1]
generate double new_ind_var = rnormal(ind_var, ind_var_post_sd)

// One regression only for now (the goal is one for each of the hundred datasets):
tempname file_handle
tempfile hundred_regressions
postfile `file_handle' int dataset double intercept double l_dvslope double l_dvse double ivslope double ivSE double cv1slope double cv1SE double cv2slope double cv2SE    using `hundred_regressions'

xtset i_code year
forvalues dataset = 1/1 {
   quietly   xtabond2  new_dep_var new_dep_var_l1  new_ind_var control_var1 control_var2 if year>2005 & dataset == `dataset', two or  gmm((new_dep_var_l1), collapse lag(2 .)) gmm((new_ind_var), collapse lag(2 .)) 
    
    post `file_handle' (`dataset') (_b[_cons]) (_b[new_dep_var_l1]) (_se[new_dep_var_l1]) (_b[new_ind_var]) (_se[new_ind_var]) (_b[control_var1]) (_se[control_var1]) (_b[control_var2]) (_se[control_var2])
     
}
postclose `file_handle'
use `hundred_regressions', clear



exit