Every year, data on one of the explanatory variables in my model (y2), which is endogenous, are available only for the top 100 firms by revenue (rev).
The top 100 firms by rev are largely the same every year, except that some firms fall below the threshold in a given year, are dropped for that year, and re-enter once they are back in the top 100.
In total, 120 unique firms appear in the top 100 in at least one year of the study period. Because y2 is available only for these firms, they form the sample; firms that never appear in the top 100 in any year are excluded.
Data on all the other variables (y1 and the exogenous explanatory variables) are available for every year, whether or not the firm is in the top 100 that year. However, y2 is available only for the years in which the firm is in the top 100.
The dependent variable y1 is continuous and, according to the descriptive statistics, normally distributed.
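To make the selection rule explicit, here is a minimal sketch that builds a selection indicator s = 1 when a firm is in the top K by rev in a given year, plus the set of firms that are ever selected (the 120-firm sample in the post). It assumes, hypothetically, that revenues are known for every candidate firm, which the post notes is not the case outside the 120 firms; the dict layout, the helper names `selection_indicator` and `ever_selected`, and the toy data with K = 2 are all illustrative.

```python
from typing import Dict, Set, Tuple


def selection_indicator(rev_by_year: Dict[int, Dict[str, float]], k: int) -> Dict[Tuple[str, int], int]:
    """Return s[(firm, year)] = 1 if the firm is among the top-k by revenue that year, else 0."""
    s: Dict[Tuple[str, int], int] = {}
    for year, revs in rev_by_year.items():
        # Rank firms by revenue, largest first; ties broken by firm name for determinism.
        ranked = sorted(revs, key=lambda f: (-revs[f], f))
        top = set(ranked[:k])
        for firm in revs:
            s[(firm, year)] = int(firm in top)
    return s


def ever_selected(s: Dict[Tuple[str, int], int]) -> Set[str]:
    """Firms that reach the top-k in at least one year (the estimation sample)."""
    return {firm for (firm, _), sel in s.items() if sel == 1}


# Toy data (hypothetical): 4 firms, 3 years, top-2 cutoff instead of top-100.
rev = {
    2010: {"A": 50.0, "B": 40.0, "C": 30.0, "D": 10.0},
    2011: {"A": 55.0, "B": 20.0, "C": 35.0, "D": 12.0},
    2012: {"A": 60.0, "B": 45.0, "C": 25.0, "D": 11.0},
}
s = selection_indicator(rev, k=2)
sample = ever_selected(s)
print(sorted(sample))   # ['A', 'B', 'C']  (D never makes the cutoff and is excluded)
print(s[("B", 2011)])   # 0: B drops out in 2011 and re-enters in 2012
```

In this toy panel, firm B plays the role of a firm that temporarily falls below the threshold, and firm D plays the role of a firm that never enters the sample at all.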
Example (a single firm; "." denotes a missing value):
Year | Y1 | Y2 | X1 |
2010 | 231 | 12312 | 3123 |
2011 | 1231 | . | 1323 |
2012 | 3213 | . | 1312 |
2013 | 33213 | 13123 | . |
In this example, Y2 is missing in 2011 and 2012 because the firm was not in the top 100 in those years, while the missing X1 in 2013 shows that other variables also have occasional missing values.
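As a small illustration, the sketch below encodes the example firm's rows with `None` for "." and derives a selection indicator y3 (1 when Y2 is observed) together with two usability flags. Treating the 2013 row as unusable because X1 is missing is a complete-case assumption, not something the post specifies; the field names are illustrative.

```python
from typing import Dict, List, Optional

# The example firm from the table above; None stands for a missing value (".").
rows: List[Dict[str, Optional[int]]] = [
    {"year": 2010, "y1": 231, "y2": 12312, "x1": 3123},
    {"year": 2011, "y1": 1231, "y2": None, "x1": 1323},
    {"year": 2012, "y1": 3213, "y2": None, "x1": 1312},
    {"year": 2013, "y1": 33213, "y2": 13123, "x1": None},
]

for r in rows:
    # Selection indicator: 1 when y2 is observed, i.e. the firm is in that year's top 100.
    r["y3"] = int(r["y2"] is not None)
    # A selection (first-step) equation needs the exogenous variables (here only x1).
    r["usable_first_step"] = int(r["x1"] is not None)
    # The outcome (second-step) equation additionally needs y3 == 1 and an observed y1.
    r["usable_second_step"] = int(
        bool(r["usable_first_step"]) and r["y3"] == 1 and r["y1"] is not None
    )

for r in rows:
    print(r["year"], r["y3"], r["usable_first_step"], r["usable_second_step"])
# 2010 1 1 1
# 2011 0 1 0
# 2012 0 1 0
# 2013 1 0 0
```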
Given this selection setting, is it appropriate to apply the Heckman-type two-step procedure for the 2SLS estimator described in Wooldridge (2010), given that I do not observe any firms beyond the 120 firms selected as described above?
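For reference, the three-equation model that the labels below refer to can be written in the form commonly used for this procedure in Wooldridge (2010). The notation is an assumption about how the post's variables map onto that setup: z is the vector of all exogenous variables (observed for every sampled unit), z1 is the subset that appears in the structural equation, and (y1, y2) are observed only when y3 = 1.

```latex
\begin{aligned}
y_1 &= \mathbf{z}_1\boldsymbol{\delta}_1 + \alpha_1 y_2 + u_1 && \text{(structural equation)}\\
y_2 &= \mathbf{z}\boldsymbol{\pi}_2 + v_2 && \text{(linear projection)}\\
y_3 &= 1\left[\mathbf{z}\boldsymbol{\delta}_3 + v_3 > 0\right] && \text{(selection indicator)}\\[4pt]
\hat{\lambda} &= \lambda(\mathbf{z}\hat{\boldsymbol{\delta}}_3)
  = \frac{\phi(\mathbf{z}\hat{\boldsymbol{\delta}}_3)}{\Phi(\mathbf{z}\hat{\boldsymbol{\delta}}_3)}
  && \text{(step 1: probit of } y_3 \text{ on } \mathbf{z}\text{)}\\
y_1 &= \mathbf{z}_1\boldsymbol{\delta}_1 + \alpha_1 y_2 + \gamma_1\hat{\lambda} + e_1
  && \text{(step 2: 2SLS on } y_3 = 1 \text{, instruments } (\mathbf{z}, \hat{\lambda})\text{)}
\end{aligned}
```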
Y1: structural equation
Y2: linear projection
Y3: selection indicator ......
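A minimal simulation sketch of the two steps, under several assumptions that differ from the post's data: a simulated cross-section (the panel dimension is ignored), selection generated by a probit latent index rather than by the deterministic revenue ranking, and a first step that uses every simulated unit, which is exactly the information the post lacks beyond the 120 firms. The probit is a numpy-only Fisher-scoring implementation, and `probit` and `tsls` are hypothetical helpers, not a library API. Only coefficients are computed; the usual 2SLS standard errors would ignore that the ratio is itself estimated.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF (numpy-only, via math.erf)."""
    return 0.5 * (1.0 + _erf(np.asarray(x) / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return np.exp(-0.5 * np.asarray(x) ** 2) / math.sqrt(2.0 * math.pi)


def probit(y, W, max_iter=100, tol=1e-10):
    """Probit MLE by Fisher scoring; W must already contain a constant column."""
    b = np.zeros(W.shape[1])
    for _ in range(max_iter):
        idx = W @ b
        p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
        phi = norm_pdf(idx)
        score = W.T @ (phi * (y - p) / (p * (1 - p)))             # gradient of log-likelihood
        info = (W * (phi ** 2 / (p * (1 - p)))[:, None]).T @ W    # expected information
        step = np.linalg.solve(info, score)
        b += step
        if np.max(np.abs(step)) < tol:
            break
    return b


def tsls(y, X, Z):
    """2SLS of y on X with instrument matrix Z (Z needs at least as many columns as X)."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]   # second-stage OLS on fitted values


# --- Simulated data (all numbers are illustrative assumptions) ---
rng = np.random.default_rng(0)
n = 5000
z1, z2, z3 = rng.normal(size=(3, n))     # z1 in the structural equation; z2, z3 excluded
v2, v3 = rng.normal(size=(2, n))
u1 = 0.5 * v2 + 0.5 * v3 + rng.normal(size=n)    # y2 endogenous, selection non-ignorable
y2 = 1.0 + 0.5 * z1 + 1.0 * z2 + 0.5 * z3 + v2   # linear projection of y2 on z
y3 = (0.2 + 0.5 * z1 + 0.3 * z2 + 1.0 * z3 + v3 > 0).astype(float)  # selection indicator
y1 = 1.0 + 0.5 * z1 + 2.0 * y2 + u1              # structural equation, true alpha1 = 2.0

# Step 1: probit of y3 on all exogenous variables, using every observation.
W = np.column_stack([np.ones(n), z1, z2, z3])
d3 = probit(y3, W)
lam = norm_pdf(W @ d3) / norm_cdf(W @ d3)        # inverse Mills ratio

# Step 2: 2SLS on the selected subsample (y3 == 1) with lam as an extra regressor
# and (z, lam) as instruments; y1 and y2 are only used where y3 == 1.
sel = y3 == 1
X = np.column_stack([np.ones(n), z1, y2, lam])[sel]
Z = np.column_stack([np.ones(n), z1, z2, z3, lam])[sel]
coef = tsls(y1[sel], X, Z)
alpha1_hat, gamma1_hat = coef[2], coef[3]
print(f"alpha1_hat = {alpha1_hat:.3f} (true 2.0), gamma1_hat = {gamma1_hat:.3f} (true 0.5)")
```

Because u1 depends on v3 with coefficient 0.5 and v2 is independent of v3 here, the coefficient on the ratio has true value 0.5 in this simulation. Note that the simulation uses two excluded exogenous variables (z2, z3), one moving y2 and one moving selection, so that identification does not rest only on the nonlinearity of the ratio.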