Hello everyone!
I am planning to use the Semykina and Wooldridge (2010) estimator for my thesis because my data involve unobserved heterogeneity, endogeneity and sample selection. However, I am unsure about the code to use in Stata.
I tried to implement the model as follows:
Stage I: a probit to account for participation in the activity, in my case
probit y x xbar, vce(cluster id)
where xbar holds the within-id means of the stage-one explanatory variables, generated by the command: bysort id: egen xbar = mean(x)
I then calculate the inverse Mills ratio (imr) from stage one for each year.
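To make the inverse Mills ratio step concrete, here is a minimal sketch of what I have in mind. It assumes a time variable called year (my placeholder name) and that y is the 0/1 participation indicator. It runs the probit separately for each year, which is how I read "for each year", so I have left out the clustering option there. xb_t is just a temporary helper variable.

```stata
* Sketch only: "year" and "xb_t" are placeholder names, not from my dataset
generate double imr = .
levelsof year, local(years)
foreach t of local years {
    * Year-by-year probit of participation on x and its within-id mean
    probit y x xbar if year == `t'
    * Linear index from that year's probit
    predict double xb_t if year == `t', xb
    * Inverse Mills ratio: standard normal density over CDF at the index
    replace imr = normalden(xb_t) / normal(xb_t) if year == `t'
    drop xb_t
}
```

The ratio normalden(xb)/normal(xb) is the term that applies to the selected (y == 1) observations, which are the ones used in Stage II.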
Stage II
z is the outcome variable, and I want to estimate the impact on it of hours (h) spent in the activity. In my case, z affects h and h affects z, which indicates endogeneity (reverse causality).
What I understood is that we have to use:
ivregress 2sls z x' (h= x'bar imr) if z>0
where x' is a subset of x.
I would be grateful if anyone could confirm whether this code is correct.