Dear Statalisters,

I have a panel data set that suffers from sample selection bias. I am following the approach of Wooldridge (1995) and Semykina and Wooldridge (2010). The procedure for correcting the selection bias is:
1. Estimate T separate probits of the selection equation (one per time period) in order to obtain T inverse Mills ratios

select=x1 z1

I do this using a loop:
HTML Code:
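capture gen double IMR = .   // make sure IMR exists before it is filled in
forvalues i = 2005(1)2016 {
    disp `i'
    * year-specific selection probit
    probit select `x1' `z1' if yy==`i', vce(robust)
    predict xb`i' if yy==`i', xb
    * inverse Mills ratio for the selected observations in that year
    qui replace IMR=normalden(xb`i')/normal(xb`i') if `y2'==1 & yy==`i'
}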
2. For the selected observations (i.e., those where select=1), estimate the main equation by pooled OLS, including the inverse Mills ratio as an additional regressor:

reg food2 `x1' IMR i.yy if select==1, vce(cluster mc)

3. Estimate the asymptotic variance

Now, I am struggling with step 3. In their 2010 paper, Semykina and Wooldridge write (p. 378): "Instead of using analytical formulae for the asymptotic variance, one can apply 'panel bootstrap'. This involves resampling cross-sectional units (and all time periods for each unit sampled) and using the bootstrap sample to approximate the distribution of the parameter vector".

It is my impression that with a two-step estimator, each bootstrap replication should re-estimate both stages: in this case the probit(s) and the second-stage main regression. However, I have not found any guidance on Statalist or elsewhere on how to bootstrap the standard errors across both stages when the first stage consists of T separate probits. I assume it would be similar in style to the "program" approach in https://www.statalist.org/forums/for...ction-on-stata, but taking into account the T different probits in the first stage, i.e.:

HTML Code:
* Bootstrap SE
program heck2, rclass

forvalues i = 2005(1)2016  {
disp `i'
probit select `x1' `z1' if yy==`i', vce(robust)
predict xb`i' if yy==`i', xb 
qui replace IMR=normalden(xb`i')/normal(xb`i') if `y2'==1 & yy==`i'
}

xtset mc yy
qui reg food2 x1 i.yy, vce(cluster mc)

return scalar beta = _b[L1_goal] 

end

bootstrap r(beta), reps(100) seed(1234) nodrop:heck2
estat bootstrap
This produces the following error message:

HTML Code:
. bootstrap r(beta), reps(100) seed(1234) nodrop:heck2
(running heck2 on estimation sample)
varlist required
an error occurred when bootstrap executed heck2

Has anyone used the "panel bootstrap" method that Semykina and Wooldridge (2010) refer to in this context? If so, I would be grateful to receive any suggestions on how it could be implemented in Stata.
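
For concreteness, here is a minimal sketch of how I understand the panel bootstrap would have to look. It runs on simulated toy data, not on my own data, and is meant to be run from a do-file. Several things in it are my own assumptions rather than anything taken from the papers: the option names xvars(), zvars() and coef() are hypothetical, the toy data-generating process is made up, and I am assuming that cluster(mc) on the bootstrap prefix (resampling whole units with all their years) is what "panel bootstrap" amounts to. Since local macros defined in a do-file are not visible inside a program, the variable lists are passed in as options, and the IMR is rebuilt from scratch in every replication. If the commands inside the program relied on the panel identifier (xtset, lags, or vce(cluster)), I believe idcluster() would also be needed, but I have not tested that.

```stata
* Toy panel so the sketch runs on its own (simulated, NOT my data)
clear
set seed 1234
set obs 200
gen mc = _n
gen a  = rnormal()                   // unit effect, constant within mc
expand 3
bysort mc: gen yy = 2004 + _n        // three years: 2005-2007
gen x1 = rnormal()
gen z1 = rnormal()                   // exclusion restriction
gen u  = rnormal()                   // selection error
gen select = (0.5 + 0.5*x1 + z1 + u > 0)
gen food2  = 1 + 2*x1 + a + 0.5*u + rnormal() if select == 1

capture program drop heck2
program define heck2, rclass
    * Hypothetical options: do-file locals are not visible in here
    syntax , xvars(varlist) zvars(varlist) coef(string)

    tempvar xb imr
    quietly gen double `imr' = .

    * Step 1: one probit per year, IMR for the selected observations
    quietly levelsof yy, local(years)
    foreach t of local years {
        quietly probit select `xvars' `zvars' if yy == `t'
        quietly predict double `xb' if yy == `t', xb
        quietly replace `imr' = normalden(`xb') / normal(`xb') ///
            if select == 1 & yy == `t'
        drop `xb'
    }

    * Step 2: pooled OLS on the selected sample, IMR as extra regressor
    quietly regress food2 `xvars' `imr' i.yy if select == 1
    return scalar beta = _b[`coef']
end

* Resample whole units (all years of each mc) and redo both steps
bootstrap beta = r(beta), reps(100) seed(1234) cluster(mc) nodrop: ///
    heck2, xvars(x1) zvars(z1) coef(x1)
estat bootstrap
```

With my own data I would pass the actual covariate lists in xvars() and zvars() and the name of the coefficient of interest in coef(). I am not certain that this is a correct implementation of what the authors describe, so corrections are very welcome.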

I am aware that my question does not pertain to a particular Stata command, so I completely understand if this is the wrong forum for it.

/Hanna

References:
Wooldridge, Jeffrey M. "Selection corrections for panel data models under conditional mean independence assumptions." Journal of Econometrics 68.1 (1995): 115-132.
Semykina, Anastasia, and Jeffrey M. Wooldridge. "Estimating panel data models in the presence of endogeneity and selection." Journal of Econometrics 157.2 (2010): 375-380.