Sample 1 contains a variable of interest that is not available in sample 2. My solution is to impute the missing variable in sample 2 using estimates obtained from sample 1 on the variables common to both samples. For example, suppose sample 1 contains variables X, Y, and Z, and sample 2 contains W, Y, and Z. My proposed procedure is as follows:
1. Using sample 1, regress X on Y and Z by OLS to obtain the marginal effects of Y and Z on X.
2. Use the marginal effects from step 1 to generate predicted values X_hat in sample 2.
3. Using sample 2, regress W on X_hat with fixed effects.
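Written out as equations, the three steps amount to roughly the following. The notation is mine, introduced only for illustration: j indexes sample-1 observations, i and t index panel units and periods in sample 2, and the beta, alpha, and gamma symbols are the coefficients.

$$
\begin{aligned}
\text{Step 1 (sample 1, OLS):}\quad & X_j = \beta_0 + \beta_1 Y_j + \beta_2 Z_j + e_j \\
\text{Step 2 (sample 2):}\quad & \hat{X}_{it} = \hat\beta_0 + \hat\beta_1 Y_{it} + \hat\beta_2 Z_{it} \\
\text{Step 3 (sample 2, FE):}\quad & W_{it} = \alpha_i + \gamma \hat{X}_{it} + u_{it}
\end{aligned}
$$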
Since X_hat is a generated regressor, the conventional standard errors from step 3 ignore the sampling error in the step-1 estimates, so I am attempting to bootstrap the standard errors of the estimates obtained in step 3.
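For reference, the quantity such a bootstrap targets can be written as below. The notation is mine: gamma-hat is the step-3 fixed-effects coefficient on X_hat, B is the number of replications, and gamma-hat^(b) is its estimate in replication b. Assuming the two samples are independent draws, each replication would redraw observations from sample 1 and whole panel units from sample 2, then repeat steps 1 to 3:

$$
\widehat{\mathrm{se}}(\hat\gamma) = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\gamma^{(b)} - \bar{\gamma}^{*}\right)^{2}},
\qquad
\bar{\gamma}^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat\gamma^{(b)}
$$

Resampling only sample 2 would hold the step-1 estimates fixed across replications and so leave out exactly the first-stage uncertainty that makes X_hat a generated regressor.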
To illustrate, I have constructed the following two samples from Stata's auto dataset.
Sample 1: X = weight, Y = mpg, Z = headroom
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int(weight mpg) float headroom
4330 14 4
3900 14 3.5
4290 21 3
2110 29 2.5
3690 16 4
3180 22 3.5
3220 22 2
2750 24 2
3430 19 3.5
2120 30 2
3600 18 4
3600 16 4
3740 17 4.5
1800 28 1.5
2650 21 2
4840 12 3.5
4720 12 2.5
3830 14 3.5
2580 22 3
4060 14 3.5
3720 15 3.5
3370 18 3
4130 14 3
2830 20 3.5
4060 21 4
3310 19 2
3300 19 4.5
3690 18 4
3370 19 4.5
2730 24 2
4030 16 3.5
3260 28 2
1800 34 2.5
2200 25 4
2520 26 1.5
3330 18 5
3700 18 4
3470 18 1.5
3210 19 2
3200 19 3.5
3420 19 3.5
2690 24 2
2830 17 3
2070 23 2.5
2650 25 2.5
2370 23 1.5
2020 35 2
2280 24 2.5
2750 21 2.5
2130 21 2.5
2240 25 3
1760 28 2.5
1980 30 3.5
3420 14 3.5
1830 26 3
2050 35 2.5
2410 18 2.5
2200 31 3
2670 18 2
2160 23 2.5
2040 41 3
1930 25 3
1990 25 2
3170 17 2.5
end
Sample 2: W = price, Y = mpg, Z = headroom, with panel identifier id and time variable t
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(id t price mpg headroom)
1 1 4099 22 2.5
1 2 4194.3037 26.966696 3.341133
2 1 4749 17 3
2 2 4754.272 16.204895 3.812516
3 1 3799 22 3
3 2 3813.818 24.62638 3.474947
4 1 4816 20 4.5
4 2 4755.9487 17.92939 4.2269
5 1 7827 15 4
5 2 7899.692 18.484156 4.0665107
6 1 5788 18 4
6 2 5788.18 14.68257 3.4874845
7 1 4453 26 3
7 2 4367.883 26.290855 2.0571663
8 1 5189 20 2
8 2 5236.714 17.27138 2.454724
9 1 10372 16 3.5
9 2 10386.5 16.689259 2.510926
10 1 4082 19 3.5
10 2 4024.6116 14.151664 4.370308
end
Code:
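use sample1.dta, clear
capture program drop example
program define example, eclass
    * Step 1: OLS of weight on mpg and headroom in sample 1
    qui reg weight mpg headroom
    * Step 2: load sample 2 and generate fitted values from the step-1 estimates
    use sample2.dta, clear
    capture drop weight_hat
    predict weight_hat
    * Step 3: fixed-effects regression on the generated regressor
    xtset newid t
    xtreg price weight_hat, fe
    exit
end
xtset, clear
bootstrap, reps(10) seed(1) cluster(id) idcluster(newid): example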
Running this produces the following error:
variable id not found
(error in option cluster())
(error in option cluster())
I am not sure how to resolve this issue, or whether there is a better way of doing this. Any help would be greatly appreciated.
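The error arises because -bootstrap- resamples whatever data are in memory when it is called. At that point the data in memory are sample 1, which has no id variable, so cluster(id) cannot be found. Even with sample 2 in memory, the program reloads both files from disk in every replication, which would discard the resampled data and never resample the step-1 regression. One possible alternative is to write the two-sample bootstrap loop by hand with -bsample- and -postfile-. This is a minimal sketch, not a tested solution: it assumes sample1.dta and sample2.dta exist in the working directory with the variables shown above, and the replication count (200), the results file bootres.dta, and the names sims and gamma are hypothetical choices.

```stata
* Hypothetical sketch: manual two-sample bootstrap for the step-3 coefficient.
* Assumes sample1.dta (weight mpg headroom) and sample2.dta (id t price mpg headroom).
set seed 1
local B = 200                             // number of replications (assumption)

tempname sims
postfile `sims' double gamma using bootres, replace

forvalues b = 1/`B' {
    * Step 1: OLS on a resample (with replacement) of sample-1 observations
    use sample1.dta, clear
    bsample
    quietly regress weight mpg headroom

    * Step 2: resample whole panel units of sample 2; newid labels each draw
    use sample2.dta, clear
    bsample, cluster(id) idcluster(newid)
    predict double weight_hat, xb         // fitted values from the step-1 fit

    * Step 3: fixed effects on the resampled panel, store the coefficient
    quietly xtset newid t
    quietly xtreg price weight_hat, fe
    post `sims' (_b[weight_hat])
}
postclose `sims'

* Bootstrap SE = standard deviation of the replicate estimates
use bootres, clear
summarize gamma
display "bootstrap SE of weight_hat coefficient: " r(sd)
```

Resampling the two samples independently assumes they are independent draws; the cluster() option keeps each panel unit's observations together, mirroring cluster(id) in the original call. The point estimate itself would still come from a single run of steps 1 to 3 on the full samples. With only 10 panel units in the example, the bootstrap standard errors are likely to be noisy.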