Hello,

In my research project the dependent variable is a heavily overdispersed count, I have many covariates, and the key explanatory variable is binary and endogenous. I am hoping that someone with experience with this type of model can help me out.

I have looked into potential solutions, in particular Wooldridge 2015 ("Control Function Methods in Applied Econometrics") and Wooldridge 2014 ("Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory variables"). I also read this very helpful thread:
https://www.statalist.org/forums/for...ative-binomial

Overall, there appears to be no "silver bullet" solution. All models are wrong to some degree, but I am trying to do the best I can and settle on the specifications that are most defensible.

What I have done so far is:

- Winsorize all count variables (to limit the influence of the extreme right tail) and run linear IV (2SLS).

- Run the user-written Stata command "ivpois", which assumes an exponential conditional mean. However, since I need cluster-robust standard errors, I have to bootstrap, which is taking an awfully long time. The fact that the endogenous variable is binary is not an issue here, correct?

- Control function approach: estimate the first stage by OLS and include its residuals as an additional regressor in a second stage that is either Poisson or negative binomial. Again, inference is based on cluster-bootstrapped standard errors. If I want to present results from the negative binomial model, is this the best that I can do? Since my endogenous variable is binary, a linear first stage might be, strictly speaking, wrong (but then again, all models have issues).

- I have started to think about imposing restrictive assumptions on the structural errors of the outcome equation and of the equation for the endogenous variable, so that I arrive at a joint log-likelihood function that I can maximize. However, the Poisson assumption on the count variable (maybe I should use a different assumption here?) makes it difficult to obtain an analytic expression for the likelihood. Do you have recommendations on where to look? In the worst case, I may have to simulate or numerically integrate the probabilities. What packages would you recommend for this?
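
For concreteness, the control-function step in the third bullet can be sketched roughly as follows. This is only an illustration, not my exact code: the variable names (y for the count outcome, d for the binary endogenous regressor, x1 and x2 for exogenous covariates, z for the excluded instrument, cid for the cluster identifier) and the program name cf_pois are all hypothetical. Both stages are wrapped in one program so that the bootstrap re-estimates the first stage in every replication.

```stata
* Hypothetical variable names: y, d, x1, x2, z, cid (see text above)
capture program drop cf_pois
program define cf_pois, eclass
    // First stage: linear probability model for the binary regressor d
    regress d z x1 x2
    capture drop vhat
    predict double vhat, residuals
    // Second stage: Poisson with the first-stage residual as a control
    poisson y d x1 x2 vhat
end

* Resample whole clusters; both stages are rerun in each replication
bootstrap _b, reps(500) cluster(cid) seed(12345): cf_pois
```

Replacing poisson with nbreg in the second stage gives the negative binomial variant. The number of replications and the seed are arbitrary choices for the sketch.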

Thanks for your help!