Dear Statalists,
I am confused about how to correct for selection on one independent variable.

I want to estimate Y_ft = beta*Certified_ft + Z_ft on a sample covering 2000-2013, where f and t index firm and year, respectively. Z are exogenous variables. Certified_ft indicates whether firm f gets a certification in year t. However, Certified_ft is only observed for firms that survived to 2018. (I combined two datasets: one reports Y for 2000-2013; the other reports when firms got certified, but only for firms that survived to 2018.)

So I face two selection issues: 1) the endogeneity of Certified_ft: some factors affect Certified and Y at the same time. I developed an instrument z1 for it; 2) the survivor bias: I only observe Certified_ft for firms that survived to 2018. I wonder how to correct for these biases. I thought of two possibilities:

1) Semykina & Wooldridge (2010) correct for endogeneity and selection, in a way similar to the Heckman two-stage method. However, their approach applies to selection on the dependent variable, rather than on independent variables.
2) The control function approach in Imbens & Wooldridge (2007) (page 4). First estimate a probit model of Prob(Certified_ft) on instruments z2 (hoping to correct for the survivor bias), obtain its predicted probabilities p2, then estimate Y on Certified, Z and p2, probably using 2SLS (with z1 as the instrument for Certified).
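
To make the steps in method 2 concrete, here is a rough sketch on simulated data. All variable names (y, certified, Z, z1, z2, firm) and the data-generating process are placeholders made up for illustration, not my actual data. The sketch only transcribes the two steps as described; it does not show that they remove the survivor bias, and the second-stage standard errors ignore that p2 is itself estimated.

```stata
* Hypothetical sketch: simulated data, placeholder variable names.
clear
set seed 12345
set obs 2000
gen firm = ceil(_n/10)          // 200 made-up firms, 10 rows each
gen z1 = rnormal()              // assumed instrument for certified
gen z2 = rnormal()              // assumed instrument for the probit step
gen Z  = rnormal()              // exogenous control
gen u  = rnormal()              // unobservable driving both certified and y
gen certified = (0.5*z1 + 0.5*z2 + 0.5*u + rnormal() > 0)
gen y = 1*certified + 0.5*Z + u + rnormal()

* Step 1: probit of certified on z2 and the exogenous controls,
* then store the predicted probabilities as p2
probit certified z2 Z
predict p2, pr

* Step 2: 2SLS of y on certified, Z and p2, instrumenting certified with z1
ivregress 2sls y Z p2 (certified = z1), vce(cluster firm)
```

If something like this were used in practice, both steps would probably need to be bootstrapped together (resampling firms) to get valid standard errors.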

It is a little complicated because I face two layers of selection. Do you think method 2 can help me address this problem? Or what other approach would you recommend?
Any comments would be appreciated! Thank you.
Best,
K