In my research project, the dependent variable is a (very dispersed) count variable, I have many covariates, and the key explanatory variable is binary and endogenous. I was hoping that someone with experience with this type of model could help me out.
I have researched potential solutions, in particular Wooldridge 2014 ("Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory variables") and Wooldridge 2015 ("Control Function Methods in Applied Econometrics"). I also read the very helpful thread
Overall, there appears to be no "silver bullet" solution. At the end of the day, all models are incorrect, but I am trying to do the best that I can and find the ones that appear more sensible.
What I have done so far is:
- Winsorize all count variables (to reduce the dispersion) and simply run linear IV (2SLS).
- Run the user-written Stata command "ivpois", which assumes an exponential conditional mean. However, since my standard errors are clustered, I have to bootstrap, which is taking an awfully long time. The fact that the endogenous variable is binary is not an issue here, correct?
- Control function approach: estimate the first stage by OLS and include its residuals as an additional regressor in a second stage that is either Poisson or Negative Binomial. Again, I bootstrap clustered standard errors for inference. If I would like to present results from the Negative Binomial model, is this the best that I can do? Since my endogenous variable is binary, this approach might be, strictly speaking, wrong (but then again, all models have issues).
- I have started to think about making restrictive assumptions on the structural errors of the outcome equation and of the equation for the endogenous variable, in order to arrive at a log-likelihood function that I can maximize. However, the Poisson assumption on the count variable (maybe I should use a different assumption here?) makes it difficult to arrive at an analytic expression for the likelihood. Do you have recommendations on where to look? In the worst case, I may have to simulate or numerically integrate the probabilities. What packages would you recommend for this?
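To make the last point concrete, here is a minimal sketch of the kind of likelihood contribution I have in mind, written in Python purely for illustration (it is not Stata code). Everything in it is an assumption rather than a settled specification: the count is Poisson with mean exp(xb + delta*d + sigma*e), the binary variable is d = 1[zg + u > 0], and (e, u) are jointly standard normal with correlation rho. Conditional on e, both pieces have closed forms, so the contribution of one observation is a one-dimensional integral over e. The function and argument names are hypothetical, and a plain trapezoid rule stands in for a proper quadrature or simulation routine.

```python
import math


def norm_pdf(e):
    """Standard normal density."""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def poisson_pmf(y, log_mu):
    """Poisson probability of count y given the log of the mean."""
    return math.exp(y * log_mu - math.exp(log_mu) - math.lgamma(y + 1))


def lik_contribution(y, d, xb, zg, delta, sigma, rho, n_grid=2001, width=8.0):
    """Likelihood contribution of one observation (y, d).

    Assumed model (illustrative only):
      y | d, e ~ Poisson(exp(xb + delta*d + sigma*e))
      d = 1[zg + u > 0],  (e, u) standard bivariate normal, corr(e, u) = rho
    The latent e is integrated out with a trapezoid rule on [-width, width].
    Requires |rho| < 1.
    """
    step = 2.0 * width / (n_grid - 1)
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for k in range(n_grid):
        e = -width + k * step
        log_mu = xb + delta * d + sigma * e
        # P(d = 1 | e) under joint normality of (e, u)
        p1 = norm_cdf((zg + rho * e) / scale)
        p_d = p1 if d == 1 else 1.0 - p1
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * poisson_pmf(y, log_mu) * p_d * norm_pdf(e)
    return total * step
```

The sample log-likelihood would then be the sum of the logs of these contributions over observations, to be handed to a numerical optimizer. With rho = 0 and sigma = 0 the expression collapses to an ordinary Poisson probability times a probit probability, which is a convenient sanity check.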
Thanks for your help!