I am performing a nonlinear regression analysis, and despite consulting several books on microeconometrics and count data, my econometric knowledge has reached its limits.
My topic: I want to estimate the causal effect of increased alcohol availability on suicides. I have suicide counts for all counties in one U.S. state over several years. My explanatory variable for alcohol availability is the number of on-premise licenses per 1,000 population in each county and year. The further control variables for each county and year are the divorce rate, real income per capita, the percentage of the population that is male, the percentage that is white, the percentages belonging to the age groups 15-24 and 35-44, population density, and a categorical variable for the county's urbanicity level. I also include dummies for county FE, year FE and county-specific FE (county × year), and I use clustered standard errors.
On-premise licenses are an endogenous proxy for alcohol availability, so for my first stage I plan to use an instrument from a natural experiment that causes exogenous increases in the number of licenses. I adapted the instrument and parts of my model specification from "Wet Laws, Drinking Establishments and Violent Crime" by Anderson et al. (2016).
Before I include the instrumented on-premise variable, I want to discuss the eligible count models for the suicide outcome, in order to justify my choice of count model for the second stage. After that, I want to work out how to combine the IV approach with the count model I choose.
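One official Stata route for combining an instrument with an exponential-mean count model is `ivpoisson`, which offers a GMM estimator (with additive or multiplicative errors) and a control-function estimator. The sketch below runs both on simulated data. The instrument `z_wet`, the confounder `u` and all parameter values are hypothetical stand-ins, not the Anderson et al. instrument, and the controls, dummies and population weights of the real specification are left out for brevity.

```stata
* Minimal sketch on SIMULATED data; z_wet is a hypothetical binary instrument
clear
set seed 2016
set obs 500
generate z_wet = runiform() < 0.3      // hypothetical instrument (e.g. a policy switch)
generate u = rnormal()                 // unobserved confounder
* Licenses respond to the instrument and to the confounder (endogeneity)
generate on_premise1_per1000 = 1 + 0.8*z_wet + 0.5*u + rnormal(0, 0.3)
* Count outcome with exponential mean; true coefficient on licenses = 0.3
generate suicides = rpoisson(exp(0.5 + 0.3*on_premise1_per1000 + 0.4*u))

* Naive Poisson QMLE, which ignores the endogeneity
poisson suicides on_premise1_per1000, vce(robust)

* GMM estimator with multiplicative errors
ivpoisson gmm suicides (on_premise1_per1000 = z_wet), multiplicative vce(robust)
estimates store iv_gmm

* Control-function estimator
ivpoisson cfunction suicides (on_premise1_per1000 = z_wet), vce(robust)
estimates store iv_cf
```

With a real panel, exogenous controls and dummies would go before the parenthesized endogenous part and `vce(cluster county_fips)` would replace `vce(robust)`; whether weights and a large dummy set behave well in `ivpoisson` is something to check in its documentation rather than assume.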
My starting point is a regular Poisson-MLE:
poisson suicides on_premise1_per1000 unemployment realinc_1000 divorcerate per_male per_white per_age1 per_age2 urb2013 pop_density i.year i.county_fips county_FE* [pweight = totalpop], cluster(county_fips)
“totalpop” is a variable for the county population in each year.
I then ran the Cameron and Trivedi overdispersion test, which indicates overdispersion. The overdispersion was already evident from descriptive statistics and histograms.
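The auxiliary regression behind the Cameron and Trivedi test can be sketched as follows, using the textbook version for the NB2 alternative Var(y|x) = μ + αμ²: regress ((y − μ̂)² − y)/μ̂ on μ̂ without a constant and test whether the slope, an estimate of α, is zero. The data are simulated as gamma-mixed Poisson counts (true α = 0.5), and the sketch is unweighted, so it does not reproduce the `pweight` setup of the command above.

```stata
* Minimal, unweighted sketch of the Cameron-Trivedi overdispersion test
clear
set seed 101
set obs 1000
generate on_premise1_per1000 = 1 + runiform()
* Gamma mixing with mean 1 and variance 0.5 gives NB2 overdispersion (alpha = 0.5)
generate suicides = rpoisson(exp(0.5 + 0.3*on_premise1_per1000) * rgamma(2, 0.5))

poisson suicides on_premise1_per1000
predict muhat, n                         // fitted Poisson mean

* Auxiliary variable; its conditional mean is alpha*muhat under NB2
generate ct_aux = ((suicides - muhat)^2 - suicides) / muhat
regress ct_aux muhat, noconstant         // slope estimates alpha; t-test of H0: alpha = 0
```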
Therefore, I estimate a Poisson quasi-MLE with robust standard errors:
poisson suicides on_premise1_per1000 pop_density unemployment realinc_1000 divorcerate per_male per_white per_age1 per_age2 urb2013 i.year i.county_fips county_FE* [pweight = totalpop], vce(cluster county_fips)
First question: The Poisson MLE and the Poisson quasi-MLE regressions give exactly the same output (the same coefficients, standard errors and p-values). How is this possible when one regression uses robust standard errors (vce) and the other does not? Or are my commands wrong?
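A quick way to see what each command actually estimated is to inspect the stored results: `e(vce)` and `e(vcetype)` record the variance estimator used, and `mreldif()` gives the maximum relative difference between two matrices. The sketch fits the same Poisson model on a small simulated county panel three ways: with `cluster()` as in the first command, with `vce(cluster)` as in the second, and with neither weights nor clustering. Variable names mirror the post, but the data, the 40 counties and the parameter values are made up, and the controls and `county_FE*` dummies are omitted.

```stata
* Small SIMULATED county-year panel (hypothetical values)
clear
set seed 7
set obs 40
generate county_fips = _n
generate a_i = rnormal(0, 0.5)              // county effect, constant within county
expand 8
bysort county_fips: generate year = 2000 + _n
generate totalpop = 10000 + runiformint(0, 90000)
generate on_premise1_per1000 = 1 + runiform()
generate suicides = rpoisson(exp(-8 + 0.3*on_premise1_per1000 + a_i) * totalpop)

* (1) Older cluster() option with pweights
poisson suicides on_premise1_per1000 i.year i.county_fips [pweight = totalpop], cluster(county_fips)
display "vce: `e(vce)'  vcetype: `e(vcetype)'"
matrix V1 = e(V)

* (2) vce(cluster) option with pweights
poisson suicides on_premise1_per1000 i.year i.county_fips [pweight = totalpop], vce(cluster county_fips)
display "vce: `e(vce)'  vcetype: `e(vcetype)'"
matrix V2 = e(V)

* (3) No weights, no clustering: conventional (OIM) standard errors
poisson suicides on_premise1_per1000 i.year i.county_fips
display "vce: `e(vce)'  vcetype: `e(vcetype)'"
matrix V3 = e(V)

* Maximum relative difference between the covariance matrices
display "(1) vs (2): " mreldif(V1, V2)
display "(1) vs (3): " mreldif(V1, V3)
```

If (1) and (2) report the same `e(vce)` and a relative difference of zero, the two commands fitted the same estimator; only a specification like (3) reports conventional, non-robust standard errors.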
Because of the overdispersion, I also tried a NegBin MLE and a NegBin quasi-MLE with robust standard errors, to compare their results and model fit with the Poisson quasi-MLE:
nbreg suicides on_premise1_per1000 pop_density unemployment realinc_1000 divorcerate per_male per_white per_age1 per_age2 urb2013 i.year i.county_fips county_FE* [pweight = totalpop], cluster(county_fips)
nbreg suicides on_premise1_per1000 pop_density unemployment realinc_1000 divorcerate per_male per_white per_age1 per_age2 urb2013 i.year i.county_fips county_FE* [pweight = totalpop], vce(cluster county_fips)
I had to interrupt both commands because the iterations did not stop. I have already figured out that this is caused by including the three sets of FE dummies.
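When a maximum-likelihood fit keeps iterating, Stata's standard maximization options can at least cap the run and report its status instead of requiring a manual break. The sketch caps `nbreg` at 100 iterations with `iterate(100)`, adds the `difficult` option (intended for likelihoods with nonconcave regions), and then displays `e(converged)`, the iteration count `e(ic)` and `e(alpha)`. The simulated overdispersed panel and all values are hypothetical, and capping iterations does not fix whatever causes the non-convergence.

```stata
* SIMULATED overdispersed county-year panel (hypothetical values)
clear
set seed 11
set obs 40
generate county_fips = _n
generate a_i = rnormal(0, 0.5)
expand 8
bysort county_fips: generate year = 2000 + _n
generate totalpop = 10000 + runiformint(0, 90000)
generate on_premise1_per1000 = 1 + runiform()
* Gamma-mixed Poisson draws give negative binomial (NB2) counts
generate suicides = rpoisson(exp(-8 + 0.3*on_premise1_per1000 + a_i) * totalpop * rgamma(2, 0.5))

* Cap the number of iterations so the command stops on its own
nbreg suicides on_premise1_per1000 i.year i.county_fips [pweight = totalpop], vce(cluster county_fips) iterate(100) difficult
display "converged: " e(converged) "   iterations: " e(ic) "   alpha: " e(alpha)
```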
Second question: From other threads I understood, in part, that nbreg with FE dummies can suffer from the incidental parameters problem and yield inconsistent estimates, while xtnbreg does not handle fixed effects properly. One recommendation was to stick with the Poisson quasi-MLE and forgo these two NegBin models. Is this right, and does it justify not running nbreg or xtnbreg, apart from the fact that the regressions do not work anyway?
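A known property of the Poisson model is that, for regressors that vary within county, a pooled Poisson regression with a full set of county dummies gives the same coefficients as the conditional fixed-effects Poisson estimator (`xtpoisson, fe`). This is the usual reason the dummy-variable Poisson is said to avoid the incidental parameters problem that is a concern for many other nonlinear models. The sketch checks the equivalence on a simulated, unweighted panel; names beyond those in the post are hypothetical, and because Stata's xt commands generally require weights to be constant within panel, the time-varying `totalpop` weight is left out.

```stata
* SIMULATED balanced panel, no weights (hypothetical values)
clear
set seed 42
set obs 40
generate county_fips = _n
generate a_i = rnormal(0, 0.5)                             // county fixed effect
expand 8
bysort county_fips: generate year = 2000 + _n
generate on_premise1_per1000 = 1 + 0.5*a_i + runiform()   // correlated with the county effect
generate suicides = rpoisson(exp(1.5 + 0.3*on_premise1_per1000 + a_i))

* (1) Pooled Poisson with county and year dummies
poisson suicides on_premise1_per1000 i.year i.county_fips, vce(cluster county_fips)
scalar b_dummies = _b[on_premise1_per1000]

* (2) Conditional fixed-effects Poisson
xtset county_fips year
xtpoisson suicides on_premise1_per1000 i.year, fe vce(robust)
scalar b_condfe = _b[on_premise1_per1000]

display "dummies: " b_dummies "   conditional FE: " b_condfe
```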
Third question: So far I have used pooled regressions. Does it make sense to consider panel estimators such as xtpoisson in general? I have the same question about zero-inflated and hurdle models: does it make sense to apply them in my context?
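Whether zero-inflated or hurdle models are worth considering depends partly on whether there are more zeros than the Poisson model predicts. One simple check, sketched below on simulated data, compares the observed share of zero counts with the average predicted probability of a zero from `predict ..., pr(0)` after `poisson`. The last lines show the hurdle idea as two separate parts with official commands: a `logit` for any suicide and a zero-truncated `tpoisson` on the positive counts (`tnbreg` is the NegBin analogue, and `zip`/`zinb` are the official zero-inflated models). The panel and all values are hypothetical.

```stata
* SIMULATED county-year panel (hypothetical values)
clear
set seed 3
set obs 40
generate county_fips = _n
generate a_i = rnormal(0, 0.3)
expand 8
bysort county_fips: generate year = 2000 + _n
generate on_premise1_per1000 = 1 + runiform()
generate suicides = rpoisson(exp(0.8 + 0.3*on_premise1_per1000 + a_i))

poisson suicides on_premise1_per1000 i.year i.county_fips, vce(cluster county_fips)
predict p0, pr(0)                       // predicted Pr(suicides = 0) per observation

quietly count if suicides == 0
scalar obs_zero = r(N) / _N
quietly summarize p0
scalar pred_zero = r(mean)
display "observed zero share: " obs_zero "   Poisson-predicted: " pred_zero

* Hurdle idea as two separate parts
generate any_suicide = suicides > 0
logit any_suicide on_premise1_per1000 i.year
tpoisson suicides on_premise1_per1000 i.year if suicides > 0, ll(0)
```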
I appreciate any suggestions or advice you might have. Thank you in advance!