Hi everyone,

I'm studying "the impact of credit access on the performance of small and medium enterprises" and I have an unbalanced panel with T=3 (2011, 2013, 2015). I see two issues in my model: the endogeneity of credit access and selection bias. My main concern is selection bias.

Structural equation: Y = a1CreditAccess + a2X + u1 (where X is a set of control variables)
Reduced form equation for endogenous variable: CreditAccess = b1Z + u2

Some papers suggest correcting for selection bias as follows:
1. Estimate the selection equation: si = 1[Za + u3 >= 0]
where si = 1 if Y is observed and 0 otherwise. We assume that Z is always observed.
2. Then calculate the inverse Mills ratio (IMR).
3. Estimate:
Y = a1CreditAccess + a2X + a3IMR + e1
by 2SLS, with instruments (Z, IMR).
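
To make step 2 concrete, here is a minimal sketch of the IMR calculation, assuming the selection equation is a probit, so that for an observation with si = 1 the IMR is phi(Za) / Phi(Za), where phi and Phi are the standard normal density and distribution function. It is written in Python rather than Stata purely for illustration. The function names are hypothetical, and `index` stands for the fitted value Za from step 1.

```python
import math

def normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal distribution function Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inverse_mills_ratio(index):
    """IMR for a selected observation (si = 1), given the probit index Za.

    Assumes a probit selection equation; `index` is a hypothetical name
    for the fitted linear index from step 1.
    """
    return normal_pdf(index) / normal_cdf(index)

# Hypothetical fitted indices for three selected observations.
for index in (-1.0, 0.0, 1.0):
    print(index, inverse_mills_ratio(index))
```

The IMR falls as the index rises, so observations that are very likely to be selected get only a small correction term in step 3.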

I have the following questions:
1. I read an example on estimating the wage offer equation. Unlike in the wage example, we cannot observe firm characteristics Z after firms leave the market or before they enter it. Can we still apply this method?
2. Some documents suggest that I need one instrument for the endogenous variable and at least one exogenous variable that determines selection. Given (1), what would be a suitable variable for determining selection?
3. How can I create the dummy variable si for unbalanced panel data in Stata?
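
To illustrate what question 3 is asking for, here is a rough sketch of the logic, again in Python rather than Stata: expand the unbalanced panel to the full firm-by-year grid and set si = 1 where Y is observed and 0 otherwise. The firm IDs and the `observed` set are made-up toy data, not my actual sample.

```python
from itertools import product

# Hypothetical toy data: (firm_id, year) pairs for which Y is observed.
observed = {
    (1, 2011), (1, 2013), (1, 2015),  # firm 1 appears in all three waves
    (2, 2011),                        # firm 2 exits after 2011
    (3, 2013), (3, 2015),             # firm 3 enters in 2013
}

years = [2011, 2013, 2015]
firms = sorted({firm for firm, _ in observed})

# Build the full firm-by-year grid and flag which cells are observed.
selection = {
    (firm, year): int((firm, year) in observed)
    for firm, year in product(firms, years)
}

for (firm, year), s in sorted(selection.items()):
    print(firm, year, s)
```

The filled-in rows (si = 0) carry no values of Z, which is exactly the problem raised in question 1.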




I hope that someone facing a similar problem can give me advice. Thanks in advance!