I am trying to find a way to build a two-stage model where the first stage of the model is a binary dependent variable. I have chosen the probit model to estimate this.
My setup:
Xit = Φ( Wit + eit ), where Φ is the Normal cumulative distribution function
Yit = αZit + βXit + uit, where Z may also contain elements of W
I realize this subject has been discussed ad nauseam in this forum, but it is hard to distill a single recommendation from the threads. First, let me link to the relevant posts; my questions follow.
1. 2SLS with Binary Endogenous Variable and linear second stage:
https://www.statalist.org/forums/for...enous-variable
- Recommended solution is to use either 2SLS in both stages (which ignores the fact that X is binary) or ..
- Use the solution from Wooldridge (2002, 2010), which is a three-step process: estimate a probit, then run 2SLS using the predicted probabilities from the probit as the instrument for X
- This process is also discussed here https://www.statalist.org/forums/for...-in-panel-data
- There is a bit more detail here recommending the use of xtivreg
- I assume either version of 2SLS is appropriate, depending on data type (ivreg for cross-sections or xtivreg for panels)
- 2SLS is consistent in both cases, though you lose some precision in the first case as it ignores the binary nature of X
- Recommended solution is to use either 2SLS (again), which is what Angrist and Pischke recommend in "Mostly Harmless", or ...
- Use biprobit to jointly estimate both models by maximum likelihood
- Wooldridge notes in that post: "A method that plugs in fitted values into nonlinear second stages should be assumed inconsistent unless you prove otherwise."
- We cannot use the probit model as its own first stage because "neither the conditional expectation nor the linear projection operator passes through nonlinear functions", as discussed in Wooldridge (2010, p. 267).
- Despite this fact, we also have access to the etregress command in Stata. This was first mentioned in the link in #1 above.
- Also mentioned here in a Statalist archive: https://www.stata.com/statalist/arch.../msg00339.html
- etregress gives the option to use MLE, two-step estimation, or a control function approach (as of Stata 14, I think)
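For concreteness, here is a minimal Stata sketch of the approaches listed above, as I understand them. All variable names are hypothetical placeholders (y = outcome, x = binary endogenous regressor, z = exogenous controls, w = excluded instrument, id and year = panel identifiers), and the pooled probit in the panel case is my assumption, not something the linked threads confirm.

```stata
* Hypothetical variables: y, x (binary, endogenous), z, w, id, year.

* Probit-then-2SLS procedure attributed to Wooldridge in the threads above.
* Step 1: probit of x on all exogenous variables, then fitted probabilities.
probit x w z
predict double xhat, pr

* Steps 2-3: ordinary 2SLS, with xhat as the instrument for x.
* xhat is NOT plugged in as a regressor; x itself stays in the equation.
ivregress 2sls y z (x = xhat), vce(robust)

* Panel analogue (assumes a pooled probit first step is acceptable).
xtset id year
xtivreg y z (x = xhat), fe

* The three etregress estimators mentioned above.
etregress y z, treat(x = w z)             // maximum likelihood (default)
etregress y z, treat(x = w z) twostep     // two-step estimator
etregress y z, treat(x = w z) cfunction   // control-function estimator
```

The point of the first block is that the probit fitted values enter only as an instrument, so the second stage stays linear in x.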
OK, with this information laid out (and countless other posts that I read through), here are my questions:
- People commonly refer to the procedure in Wooldridge (2010), but I cannot find an explicit page number reference to this procedure in the 2010 version. In Section 9.5.2 on page 268, there is a similar discussion regarding a squared first-stage covariate, but not a binary first-stage covariate... but perhaps this is what everyone is referring to? I have combed over the book and cannot seem to find it in the right place. Can someone provide me the exact reference to this procedure so I can correctly cite?
- Similarly, is there a parallel discussion for this procedure for panel data in the book? The context on p268 is cross-sectional.
- Given Wooldridge's comments about the difficulties surrounding a CF approach with a non-linear model, how do I trust the outputs of etregress if I select the CF option?
Best regards,
RJ