I have two samples from different populations that I am using in a two-stage estimation procedure. The samples share two variables but differ in structure: sample 1 is cross-sectional, while sample 2 is a panel on which I conduct my main analysis.

Sample 1 contains a variable of interest that is not available in sample 2. My solution is to impute the missing variable in sample 2 using coefficient estimates obtained from sample 1 for the variables common to both samples. For example, suppose sample 1 contains variables X, Y and Z, and sample 2 contains W, Y and Z. My proposed procedure is thus:

1. From sample 1 regress X on Y and Z using OLS to obtain the marginal effects of Y and Z on X.
2. Use the marginal effects obtained from step 1 to generate predicted values X_hat in sample 2.
3. From sample 2 regress W on X_hat using fixed effects.
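
As a rough sketch of steps 1 to 3 (point estimates only), assuming the two samples are saved as sample1.dta and sample2.dta, that the variables are literally named X, Y, Z and W, and that the panel in sample 2 is identified by id and t (all of these names are placeholders):

```stata
* Step 1: OLS of X on Y and Z in the cross-sectional sample
use sample1.dta, clear
regress X Y Z

* Step 2: the estimates stay in memory after loading sample 2,
* so -predict- applies the sample 1 coefficients to sample 2's Y and Z
use sample2.dta, clear
predict double X_hat, xb

* Step 3: fixed-effects regression of W on the imputed X_hat
* (id and t are assumed names for the panel and time identifiers)
xtset id t
xtreg W X_hat, fe
```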

Since X_hat is a generated regressor, I am attempting to bootstrap the standard errors of the estimates obtained in step 3.

To illustrate, I have constructed the following two samples from Stata's 'auto' dataset.

Sample 1: X = weight, Y = mpg, Z = headroom

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int(weight mpg) float headroom
4330 14   4
3900 14 3.5
4290 21   3
2110 29 2.5
3690 16   4
3180 22 3.5
3220 22   2
2750 24   2
3430 19 3.5
2120 30   2
3600 18   4
3600 16   4
3740 17 4.5
1800 28 1.5
2650 21   2
4840 12 3.5
4720 12 2.5
3830 14 3.5
2580 22   3
4060 14 3.5
3720 15 3.5
3370 18   3
4130 14   3
2830 20 3.5
4060 21   4
3310 19   2
3300 19 4.5
3690 18   4
3370 19 4.5
2730 24   2
4030 16 3.5
3260 28   2
1800 34 2.5
2200 25   4
2520 26 1.5
3330 18   5
3700 18   4
3470 18 1.5
3210 19   2
3200 19 3.5
3420 19 3.5
2690 24   2
2830 17   3
2070 23 2.5
2650 25 2.5
2370 23 1.5
2020 35   2
2280 24 2.5
2750 21 2.5
2130 21 2.5
2240 25   3
1760 28 2.5
1980 30 3.5
3420 14 3.5
1830 26   3
2050 35 2.5
2410 18 2.5
2200 31   3
2670 18   2
2160 23 2.5
2040 41   3
1930 25   3
1990 25   2
3170 17 2.5
end
Sample 2: W = price, Y = mpg, Z = headroom

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(id t price mpg headroom)
 1 1      4099        22       2.5
 1 2 4194.3037 26.966696  3.341133
 2 1      4749        17         3
 2 2  4754.272 16.204895  3.812516
 3 1      3799        22         3
 3 2  3813.818  24.62638  3.474947
 4 1      4816        20       4.5
 4 2 4755.9487  17.92939    4.2269
 5 1      7827        15         4
 5 2  7899.692 18.484156 4.0665107
 6 1      5788        18         4
 6 2   5788.18  14.68257 3.4874845
 7 1      4453        26         3
 7 2  4367.883 26.290855 2.0571663
 8 1      5189        20         2
 8 2  5236.714  17.27138  2.454724
 9 1     10372        16       3.5
 9 2   10386.5 16.689259  2.510926
10 1      4082        19       3.5
10 2 4024.6116 14.151664  4.370308
end
My proposed bootstrapping procedure is then:

Code:
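use sample1.dta, clear
capture program drop example
program define example, eclass

    * Stage 1: OLS on sample 1 (the data in memory when the program is called)
    qui reg weight mpg headroom
    
    * Stage 2: load sample 2 and predict weight from the stage 1 coefficients
    use sample2.dta, clear
    capture drop weight_hat
    predict weight_hat
    
    * Stage 3: fixed-effects regression; newid is expected from idcluster()
    xtset newid t
    xtreg price weight_hat, fe
    
    exit
end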
xtset, clear
bootstrap, reps(10) seed(1) cluster(id) idcluster(newid): example
When I run this I get the error:

variable id not found
(error in option cluster())
This is presumably because sample 1, which is the dataset in memory when bootstrap is called, does not contain the variable 'id'.
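
One possible direction, sketched here under assumptions and not verified against the real data, is to do the resampling inside the program with bsample and to drive the replications with simulate rather than the bootstrap prefix, so that no single dataset in memory has to contain 'id'. The program name twostage_bs, the returned scalar b, and the choice of 200 replications are illustrative. Resampling sample 1 as well as sample 2 is an assumption about what the bootstrap should mimic.

```stata
capture program drop twostage_bs
program define twostage_bs, rclass
    * Resample the cross-section (sample 1) with replacement
    use sample1.dta, clear
    bsample
    quietly regress weight mpg headroom

    * Resample whole panel units from sample 2; newid uniquely
    * identifies each drawn cluster, including repeated draws
    use sample2.dta, clear
    bsample, cluster(id) idcluster(newid)

    * Apply the stage 1 coefficients to the resampled panel
    predict double weight_hat, xb

    * Fixed-effects regression on the resampled panel
    xtset newid t
    quietly xtreg price weight_hat, fe
    return scalar b = _b[weight_hat]
end

set seed 1
simulate b = r(b), reps(200): twostage_bs
summarize b
```

The standard deviation of b reported by summarize would then serve as the bootstrap standard error for the coefficient on weight_hat.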

I am not sure how best to resolve this issue, or whether there is a better way of doing it altogether. Any help would be greatly appreciated.