Dear Statalists,

I am using Stata 15 with the built-in eregress command and the user-written cmp package. Here is the problem I am facing.

Let's say:

Y: Wage, continuous
T: Whether you are treated, binary
D: IV for T
X: control
Z: Whether you work

My data set looks like the following:
#   Y         T         Z
1   15        1         1
2   14        1         0
3   16        1         missing
4   17        0         1
5   18        0         0
6   19        0         missing
7   88        missing   1
8   5         missing   0
9   missing   1         0
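
For anyone who wants to reproduce the layout, the listing above can be typed into Stata roughly as follows. This is only a sketch: the variable name id stands in for the # column and is my own label, missing values are entered as Stata's system missing value (.), and D and X are left out because they are not shown in the listing.

```stata
* Enter the nine example rows shown above; "." is Stata's missing value
clear
input id Y T Z
1 15 1 1
2 14 1 0
3 16 1 .
4 17 0 1
5 18 0 0
6 19 0 .
7 88 . 1
8  5 . 0
9  . 1 0
end

* Show the data as entered
list, noobs
```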

My naive regression is: regress Y on T and X.

The problem is that wages are only observed when the individual is in the labor market, that is, when Z takes the value 1. So I want to estimate a Heckman selection model like the following:

outcome equation: regress Y on T and X
selection equation: regress Z on T and X

A further problem is that T is endogenous, so I want to instrument T with D (the IV) in the outcome equation, but not in the selection equation. At this stage, I see two ways to go.

Way one - cmp package

code:
cmp (wage = T X) (selectvar = T_endo X) (T = IV), ind(selectvar*$cmp_cont $cmp_probit $cmp_cont)
where selectvar is generated by the following command: gen selectvar = wage<. (this follows the logic of the example in the cmp manual). T_endo is a variable I created to stand in for T in the selection equation, so that T there is not instrumented by the IV in the third equation.
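
To make the setup concrete, here is a minimal sketch of the preparation steps run in a do-file on three rows taken from the example listing (rows 1, 4 and 9). It assumes that cmp is installed from SSC and that T_endo is simply a copy of T; both are assumptions for illustration, not part of the code above.

```stata
* Three rows from the example listing, using the names wage and T
clear
input wage T
15 1
17 0
 . 1
end

* cmp is user-written; if it is not installed yet: ssc install cmp
cmp setup                       // defines the globals $cmp_cont, $cmp_probit, etc.

gen byte selectvar = wage < .   // 1 if wage is observed, 0 if it is missing
gen byte T_endo = T             // assumption: T_endo is a plain copy of T

list, noobs
```

Because cmp setup only defines global macros, it needs to be run once per Stata session before the $cmp_* names are used in ind().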

My question is:
(1) Does my code make sense given what I want and what I have? Note that this approach does not use my Z variable.

Way two - eregress

code:
eregress Y, entreat(T = IV) select(selectvar = T_endo X)
where selectvar is generated by the following command: gen selectvar = wage<.

My questions are:
(1) The code gives me the error message: can't find initial value. Can anyone help me solve this error?
(2) Does my code make sense given what I want and what I have?

What should I try next? Thank you all for your time and effort in reading my post. I appreciate it.

Best
Xu