I have a research project involving country-level panel data in which I want to account for uncertainty in both the outcome variable and the independent variable of interest, each of which is a latent variable. These latent variables (constructed by other researchers) are composite values derived from multiple other variables, not all of which were available for every country and year. Each country-year observation of a latent variable is accompanied, in an adjacent column, by the standard deviation of its posterior distribution.
I wish to follow the technique for modeling uncertainty in latent variables suggested by Charles D. Crabtree and Christopher J. Fariss in a 2015 article in Research and Politics (July-September, pp. 1-9) titled “Uncovering Patterns Among Latent Variables: Human Rights and De Facto Judicial Independence.” As described by the authors, they: (1) duplicate their dataset 1,000 times; (2) assign a random draw from the posterior distribution of each latent variable to each country-year observation; (3) use the values thus drawn as the new country-year values of the latent outcome variable and the independent variable of interest, and estimate a set of 1,000 regression models; and then (4) combine the results across the multiple datasets to create one set of coefficient and standard error estimates. Crabtree and Fariss used R; I want to use Stata.
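The post does not say which formulas are used for the combination in step (4). A common choice for this kind of multiple-draw design is Rubin's rules for multiply imputed data, and the block below writes them out under that assumption. Here m is the number of datasets (1,000 in the article, 100 in my test), and the symbols \hat{\beta}_j and \widehat{SE}_j are notation introduced only for this illustration: the coefficient and standard error from the regression on dataset j.

```latex
\begin{align*}
\bar{\beta} &= \frac{1}{m}\sum_{j=1}^{m}\hat{\beta}_j
  && \text{(combined coefficient)} \\
W &= \frac{1}{m}\sum_{j=1}^{m}\widehat{SE}_j^{\,2}
  && \text{(within-dataset variance)} \\
B &= \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{\beta}_j-\bar{\beta}\bigr)^{2}
  && \text{(between-dataset variance)} \\
T &= W + \Bigl(1+\frac{1}{m}\Bigr)B,
  \qquad \widehat{SE}_{\text{combined}} = \sqrt{T}
\end{align*}
```

Under this assumption, the posted file of per-dataset slopes and standard errors is all that is needed: the slopes give \bar{\beta} and B, and the squared standard errors give W.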
Last year (May 9, 2019) I sought help from the Stata Forum on how to do this and received a solution from Joseph Coveney that works fine with xtreg. I am now trying to do the same thing using xtscc (Driscoll-Kraay) and xtabond2 (GMM Arellano-Bond), but the Do-file that works so well for xtreg does not work correctly for either command. When I run it with xtscc, I get an error message telling me that I need to add years to my tsset (xtset) declaration. But when I add years, I get an error message telling me that there are repeated years in my data. For xtabond2, I get much the same result, except that it doesn't tell me to add years; it simply stops with error r(459). And when I add years, I get the same error message telling me that there are repeated years in my data. However, I can get both xtscc and xtabond2 to work if I add years AND limit the number of regression runs to one.
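One possible reading of the repeated-years message, offered as an assumption rather than a confirmed diagnosis: after expand 100, every i_code-year pair occurs 100 times, so i_code and year no longer identify a unique row. The sketch below declares one panel per dataset-country pair instead. The name panel_id is hypothetical and introduced only for this illustration; the other variable names are the ones used in my Do-files.

```stata
* Sketch only, assuming the test data (i_code, year, dep_var, ...) are in memory.
quietly expand 100
bysort i_code year: generate int dataset = _n

* Draw new values from each latent variable's posterior, as in the xtreg version
generate double new_dep_var = rnormal(dep_var, dep_var_post_sd)
generate double new_ind_var = rnormal(ind_var, ind_var_post_sd)

* panel_id (hypothetical name) is unique per dataset-country pair,
* so each year occurs only once within a panel
egen long panel_id = group(dataset i_code)
xtset panel_id year

* Lag by year within panel using the time-series operator
generate double new_dep_var_l1 = L.new_dep_var
```

With the data declared this way, the lag is taken across years within the same dataset and country, whereas a row subscript such as [_n-1] simply picks up the preceding row in the current sort order. Whether xtscc and xtabond2 then run cleanly inside the forvalues loop with the dataset == `dataset' restriction is something I have not verified.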
I have included below my test data and three Do-file versions. The first is the xtreg Do-file, which works for 100 regressions. My goal is 1,000, but I limited it here for simplicity. The second is the xtscc Do-file, configured so that it runs one regression only, as I can't get it to work for 100 regressions. The third is the xtabond2 Do-file, also configured so that it runs one regression only.
I would appreciate any advice on how to modify my xtscc and xtabond2 Do-files so that they work for 100 regressions.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(i_code year) float(dep_var dep_var_post_sd ind_var ind_var_post_sd control_var1 control_var2)
1 2005 1.33 .27 .9403 .0377 .03395293 10.098463
1 2006 1.09 .27 .9403 .0369 .02516864 10.103597
1 2007 1.25 .26 .943 .0366 .01446522 10.09891
1 2008 1.19 .25 .9433 .0371 -.02323937 10.057076
1 2009 .78 .19 .9417 .0386 -.04175259 9.996817
2 2005 -1.69 .22 .1994 .0522 .01805011 6.534809
2 2006 -1.47 .18 .2093 .0571 .02249222 6.541407
2 2007 -1.4 .17 .2155 .0612 .03343279 6.558741
2 2008 -1.32 .17 .2183 .0636 .00843944 6.55176
2 2009 -1.35 .16 .2199 .0673 .03083248 6.56701
3 2005 -.68 .16 .4752 .0442 .0926275 8.366809
3 2006 -.58 .15 .4738 .0426 .10671155 8.453825
3 2007 -.62 .15 .4716 .0432 .08710414 8.52325
3 2008 -.67 .15 .4692 .0441 .03209504 8.541031
3 2009 -.76 .14 .4736 .0468 .00946155 8.536921
4 2005 .68 .32 .9646 .0285 .06083228 9.514385
4 2006 1 .34 .9617 .0318 .133764 9.629064
4 2007 .98 .33 .9589 .0339 .09498945 9.708728
4 2008 .97 .32 .9552 .0365 .00071111 9.69821
4 2009 1 .34 .9507 .0403 -.12036015 9.558898
end

* Begin here *
quietly expand 100
bysort i_code year: generate int dataset = _n

// Create new "draw" variables here:
generate double new_dep_var = rnormal(dep_var, dep_var_post_sd)
generate double new_dep_var_l1= new_dep_var[_n-1]
generate double new_ind_var = rnormal(ind_var, ind_var_post_sd)

// And now the hundred regressions, one for each of the hundred datasets:
tempname file_handle
tempfile hundred_regressions
postfile `file_handle' int dataset double intercept double l_dvslope double l_dvse ///
    double ivslope double ivSE double cv1slope double cv1SE double cv2slope double cv2SE ///
    using `hundred_regressions'
xtset i_code
forvalues dataset = 1/100 {
    quietly xtreg new_dep_var new_dep_var_l1 new_ind_var control_var1 control_var2 ///
        if year>2005 & dataset == `dataset', fe vce(cluster i_code)
    post `file_handle' (`dataset') (_b[_cons]) (_b[new_dep_var_l1]) (_se[new_dep_var_l1]) ///
        (_b[new_ind_var]) (_se[new_ind_var]) (_b[control_var1]) (_se[control_var1]) ///
        (_b[control_var2]) (_se[control_var2])
}
postclose `file_handle'
use `hundred_regressions', clear
exit

*Alternative Driscoll-Kraay Fixed Effects
*Begin here
quietly expand 1
bysort i_code year: generate int dataset = _n

// Create new "draw" variables here:
generate double new_dep_var = rnormal(dep_var, dep_var_post_sd)
generate double new_dep_var_l1= new_dep_var[_n-1]
generate double new_ind_var = rnormal(ind_var, ind_var_post_sd)

// And now the hundred regressions, one for each of the hundred datasets:
tempname file_handle
tempfile hundred_regressions
postfile `file_handle' int dataset double intercept double l_dvslope double l_dvse ///
    double ivslope double ivSE double cv1slope double cv1SE double cv2slope double cv2SE ///
    using `hundred_regressions'
xtset i_code year
forvalues dataset = 1/1 {
    quietly xtscc new_dep_var new_dep_var_l1 new_ind_var control_var1 control_var2 ///
        if year>2005 & dataset == `dataset', fe
    post `file_handle' (`dataset') (_b[_cons]) (_b[new_dep_var_l1]) (_se[new_dep_var_l1]) ///
        (_b[new_ind_var]) (_se[new_ind_var]) (_b[control_var1]) (_se[control_var1]) ///
        (_b[control_var2]) (_se[control_var2])
}
postclose `file_handle'
use `hundred_regressions', clear
exit

* Alternative GMM Arellano-Bond
* Begin here *
quietly expand 1
bysort i_code year: generate int dataset = _n

// Create new "draw" variables here:
generate double new_dep_var = rnormal(dep_var, dep_var_post_sd)
generate double new_dep_var_l1= new_dep_var[_n-1]
generate double new_ind_var = rnormal(ind_var, ind_var_post_sd)

// And now the hundred regressions, one for each of the hundred datasets:
tempname file_handle
tempfile hundred_regressions
postfile `file_handle' int dataset double intercept double l_dvslope double l_dvse ///
    double ivslope double ivSE double cv1slope double cv1SE double cv2slope double cv2SE ///
    using `hundred_regressions'
xtset i_code
forvalues dataset = 1/1 {
    quietly xtabond2 new_dep_var new_dep_var_l1 new_ind_var control_var1 control_var2 ///
        if year>2005 & dataset == `dataset', two or ///
        gmm((new_dep_var_l1), collapse lag(2 .)) gmm((new_ind_var), collapse lag(2 .))
    post `file_handle' (`dataset') (_b[_cons]) (_b[new_dep_var_l1]) (_se[new_dep_var_l1]) ///
        (_b[new_ind_var]) (_se[new_ind_var]) (_b[control_var1]) (_se[control_var1]) ///
        (_b[control_var2]) (_se[control_var2])
}
postclose `file_handle'
use `hundred_regressions', clear
exit