Greetings everyone,
I specify two models in my study with two different dependent variables; one of them is a 0-1 dummy (y1), and the other is a count variable (y2). For each model, the independent variable of interest is a count variable (w), which is potentially endogenous. Thus, in my situation, I encounter two cases: 1) a logit model with a count endogenous explanatory variable and 2) a negative binomial model with a count endogenous explanatory variable.
To address this possible endogeneity problem, I first considered a 2SLS-type approach, in which the count endogenous explanatory variable is replaced with its fitted values from a negative binomial first-stage regression. However, I read on Statalist that simply mimicking standard 2SLS in non-linear models may not be an appropriate way to correct for endogeneity. I therefore decided to use the control function approach, that is, two-stage residual inclusion (2SRI), proposed by Terza et al. (2008) (Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling), as adjusted by Wooldridge (2014) (Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory variables).
Specifically, I address the endogeneity problem in my case as follows with Stata commands:
1) In the first stage of 2SRI, a negative binomial regression is used in which the count endogenous variable (w) is regressed on two instruments (z1 and z2) and a set of controls (x1...xn):
nbreg w z1 z2 x1...xn, vce(cluster Firm)
2) Compute the generalized residuals (gr), as suggested by Wooldridge (2014):
predict gr, score
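For reference, here is a sketch of what this residual looks like, assuming nbreg's default mean-dispersion (NB2) parameterization with conditional mean \mu_i = \exp(z_i\delta), where z_i stacks z1, z2, and the controls, and \alpha is the overdispersion parameter. The symbols r_i, \mu_i, \delta, and \alpha are notation introduced here for illustration, not taken from the cited papers. The score of the log likelihood with respect to the linear index is then:

```latex
\[
r_i \;=\; \frac{\partial \ln L_i}{\partial (z_i\delta)}
    \;=\; \frac{w_i - \mu_i}{1 + \alpha\,\mu_i},
\qquad \mu_i = \exp(z_i\delta)
\]
```

Because nbreg also estimates lnalpha as a second equation, predict with the scores option is, as far as I can tell from the nbreg postestimation documentation, meant to be given a stub or two new variable names; the first score variable is the one that corresponds to r_i.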
3) In the second stage of 2SRI, the generalized residuals, along with the count endogenous variable, are added to my two outcome models. Recall that y1 is a dummy and y2 is a count:
logit y1 w gr x1...xn, vce(cluster Firm)
nbreg y2 w gr x1...xn, vce(cluster Firm)
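Because the residual is itself estimated, the second-stage standard errors reported by logit and nbreg do not, in general, account for first-stage sampling error. One common workaround is to bootstrap both stages together. Below is a minimal sketch, not a definitive implementation. It assumes two stand-in controls x1 and x2 in place of x1...xn, a hypothetical wrapper program named twosri_logit, hypothetical variable names mu_hat and gr_hat, and a residual built by hand from the NB2 score formula rather than via predict, score:

```stata
* Hypothetical wrapper: reruns both stages on whatever sample is in memory
capture program drop twosri_logit
program define twosri_logit
    capture drop mu_hat
    capture drop gr_hat
    * Stage 1: negative binomial for the endogenous count w
    nbreg w z1 z2 x1 x2
    predict double mu_hat, n
    * Score w.r.t. the linear index under NB2: (w - mu)/(1 + alpha*mu)
    generate double gr_hat = (w - mu_hat) / (1 + e(alpha) * mu_hat)
    * Stage 2: logit outcome with the residual included
    logit y1 w gr_hat x1 x2
end

* Resample whole firms so the clustering is preserved
bootstrap _b, reps(500) seed(12345) cluster(Firm): twosri_logit
```

Replacing the final line of the program with nbreg y2 w gr_hat x1 x2 would give the analogous sketch for the count outcome.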
Given this setup, I have two questions:
Q1: Are the procedures and Stata commands described above correct?
Q2: How can I evaluate the relevance and exogeneity of my two instruments, z1 and z2? Can I use a partial chi-square test of the instruments in the first stage to test for relevance? Also, can I apply the standard overidentification test in this non-linear context by regressing the second-stage residuals on z1, z2, and the other controls (x1...xn) and multiplying the resulting R-squared by 2 (the number of instruments) to get the test statistic?
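For the relevance part of Q2, a joint Wald test of the instruments after the first stage is one straightforward check. Here is a sketch, with x1 and x2 standing in for the full control set. Whether the usual rule-of-thumb thresholds from linear IV carry over to a negative binomial first stage is not something this sketch settles:

```stata
* First stage with cluster-robust standard errors
nbreg w z1 z2 x1 x2, vce(cluster Firm)
* Joint Wald chi-squared test that both instrument coefficients are zero
test z1 z2
```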
I apologize for this long post.
Kindly help me answer my two questions. I am looking forward to your helpful insights.