I have a question about which type of regression model is appropriate for a zero-inflated distribution.
Some info about the data:
- The dependent variable for one of my hypotheses is ‘distvolatility’ (shown in the table below).
- Its distribution is heavily zero-inflated (1304 out of 1459 observations are 0) and positively skewed. These zeroes are real/true values (not censored/truncated).
- Respondents with a non-zero ‘distvolatility’ take one of 15 possible scores, ranging from .2959995 to 32.373 (no values other than those shown below are possible).
Code:
tab distvolatility

distvolatil |
        ity |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,304       89.38       89.38
   .2959995 |         10        0.69       90.06
   .8690004 |         39        2.67       92.73
      4.661 |          7        0.48       93.21
     11.673 |          7        0.48       93.69
     12.542 |         12        0.82       94.52
     14.874 |         17        1.17       95.68
      15.17 |          4        0.27       95.96
     16.334 |          5        0.34       96.30
     17.203 |         11        0.75       97.05
     19.535 |          8        0.55       97.60
     19.831 |          3        0.21       97.81
     31.208 |          9        0.62       98.42
     31.504 |          3        0.21       98.63
     32.077 |         15        1.03       99.66
     32.373 |          5        0.34      100.00
------------+-----------------------------------
      Total |      1,459      100.00
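For anyone wanting to check the zero share and the gap between the mean and the variance, a minimal sketch using only built-in commands and the variable above:

```stata
* Count the zero and non-zero observations (missing values excluded)
count if distvolatility == 0
count if distvolatility > 0 & !missing(distvolatility)

* Mean and variance of the full distribution, taken from summarize's stored results
quietly summarize distvolatility
display "mean = " r(mean) "   variance = " r(Var)

* Detailed summary (including skewness) of the non-zero part only
summarize distvolatility if distvolatility > 0 & !missing(distvolatility), detail
```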
The options I have considered so far:
- Zero-inflated Poisson/zero-inflated negative binomial - These both assume count data. Would it severely bias the results if I were to use one of these models (most likely zinb, as the variance is much higher than the mean), given that my data are discrete but not counts?
- Two-step generalised linear model - Another option is to model whether distvolatility is zero or non-zero with a binary logistic regression, and then fit a GLM to the non-zero values only.
- Tobit regression - I have also seen this mentioned as an option for zero-inflated distributions, although it assumes the zeroes are censored, which is not the case here.
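To make the two-step option concrete, here is a rough sketch of how I understand it would be set up. The predictors x1 and x2 are hypothetical placeholders (not my actual variables), anyvol is a helper name introduced for illustration, and the gamma family with a log link in the second part is an assumption rather than a settled choice:

```stata
* x1 and x2 are hypothetical predictors; replace with the actual covariates

* Part 1: indicator for any non-zero volatility, modelled with a logit
generate byte anyvol = distvolatility > 0 if !missing(distvolatility)
logit anyvol x1 x2

* Part 2: GLM fitted to the non-zero values only
* (gamma family and log link are assumptions, suited to right-skewed positive outcomes)
glm distvolatility x1 x2 if distvolatility > 0 & !missing(distvolatility), family(gamma) link(log)
```

The two parts are estimated separately, so the first describes whether any volatility occurs and the second describes its size among respondents with a non-zero score.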
The zero-inflated negative binomial seems to be the best option at the moment, but any advice would be greatly appreciated!