I'm trying to estimate the union wage premium. There are two potential issues I need to tackle:

1. selection bias
2. endogeneity (between wage and union)

Here is what I wrote:

Code:
 
global seleq "density educ age agesq sex marital i.ind i.occ i.region i.time"
global wageeq "educ age agesq sex marital i.ind i.occ i.region i.time"

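* First stage: probit for union membership
probit union $seleq
predict xb, xb
* Inverse Mills ratio computed from the probit index
gen double imr = normalden(xb)/normal(xb)

* Wage equation with the union dummy and the correction term
xi: reg lwage union imr $wageeq, r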

Question 1: Does the coefficient on union in this regression (after including the inverse Mills ratio as a correction term) give me an unbiased estimate of the union wage premium? Or should I do the following instead?

Code:
 
ivreg2 lwage $wageeq imr (union=$seleq), first robust
-------------------------------------------------

Question 2: Why does the union variable get dropped when I run the following?

Code:
 
heckman lwage union $wageeq, select(union=$seleq) twostep first mills(imr)

Thank you very much!!