I have a question about which type of regression model is appropriate for a zero-inflated outcome.
Some info about the data:
- The dependent variable for one of my hypotheses is ‘distvolatility’ (shown in the table below).
- Its distribution is heavily zero-inflated (1304 out of 1459 observations are 0) and positively skewed. These zeroes are real/true values (not censored/truncated).
- There are 15 possible ‘distvolatility’ scores for respondents with a non-zero value for ‘distvolatility’, ranging from .2959995 to 32.373 (no values other than those shown below are possible).
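As a quick check of these figures (share of zeros, mean versus variance, non-integer values), the frequencies in the tabulation below can be summarised directly. This is a minimal Python sketch, not the author's workflow: the dictionary `FREQ` and the function `summarize` are hypothetical names, and the values are copied from the `tab distvolatility` output. In Stata, `summarize distvolatility, detail` reports the mean and variance from the raw data.

```python
# Hypothetical helper: frequencies copied from the `tab distvolatility` output.
FREQ = {
    0.0: 1304, 0.2959995: 10, 0.8690004: 39, 4.661: 7, 11.673: 7,
    12.542: 12, 14.874: 17, 15.17: 4, 16.334: 5, 17.203: 11,
    19.535: 8, 19.831: 3, 31.208: 9, 31.504: 3, 32.077: 15, 32.373: 5,
}


def summarize(freq):
    """Descriptive statistics of a variable given as {value: frequency}."""
    n = sum(freq.values())
    mean = sum(v * f for v, f in freq.items()) / n
    # Sample variance with an n - 1 denominator.
    variance = sum(f * (v - mean) ** 2 for v, f in freq.items()) / (n - 1)
    zero_share = freq.get(0.0, 0) / n
    # Count-data models assume integer outcomes; count distinct non-integer values.
    non_integer_values = sum(1 for v in freq if not float(v).is_integer())
    return {
        "n": n,
        "mean": mean,
        "variance": variance,
        "zero_share": zero_share,
        "non_integer_values": non_integer_values,
    }


stats = summarize(FREQ)
print(stats)
```

On these frequencies the variance is roughly 20 times the mean, and every non-zero value is non-integer, which is what makes count models questionable here.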
Code:
tab distvolatility
distvolatil |
ity | Freq. Percent Cum.
------------+-----------------------------------
0 | 1,304 89.38 89.38
.2959995 | 10 0.69 90.06
.8690004 | 39 2.67 92.73
4.661 | 7 0.48 93.21
11.673 | 7 0.48 93.69
12.542 | 12 0.82 94.52
14.874 | 17 1.17 95.68
15.17 | 4 0.27 95.96
16.334 | 5 0.34 96.30
17.203 | 11 0.75 97.05
19.535 | 8 0.55 97.60
19.831 | 3 0.21 97.81
31.208 | 9 0.62 98.42
31.504 | 3 0.21 98.63
32.077 | 15 1.03 99.66
32.373 | 5 0.34 100.00
------------+-----------------------------------
Total | 1,459 100.00
Options I am considering:
- Zero-inflated Poisson/zero-inflated negative binomial - Both of these assume count data. Would it seriously bias the results to use one of these models (most likely ZINB, since the variance is much higher than the mean), given that my data are discrete but not counts?
- Two-part (two-step) generalised linear model - Another option is to model the probability of distvolatility being zero versus non-zero with a binary logistic regression, and then fit a GLM to the non-zero values only.
- Tobit regression - I have also seen this mentioned as an option for zero-inflated distributions, although it assumes the zeroes are censored, which is not the case here.
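The two-part option combines its pieces as E[y] = P(y > 0) × E[y | y > 0]. The sketch below illustrates that decomposition in the simplest, intercept-only case, assuming no covariates: an intercept-only logit estimates P(y > 0) as the sample share of non-zeros, and an intercept-only GLM (for example gamma with a log link) estimates E[y | y > 0] as the mean of the positive values. The names `FREQ` and `two_part_intercept_only` are hypothetical. With covariates, the Stata equivalent would be along the lines of `logit` on a zero/non-zero indicator followed by `glm distvolatility x if distvolatility > 0, family(gamma) link(log)`, where `x` is a placeholder covariate.

```python
# Hypothetical helper: frequencies copied from the `tab distvolatility` output.
FREQ = {
    0.0: 1304, 0.2959995: 10, 0.8690004: 39, 4.661: 7, 11.673: 7,
    12.542: 12, 14.874: 17, 15.17: 4, 16.334: 5, 17.203: 11,
    19.535: 8, 19.831: 3, 31.208: 9, 31.504: 3, 32.077: 15, 32.373: 5,
}


def two_part_intercept_only(freq):
    """Intercept-only two-part model fitted to {value: frequency} data."""
    n = sum(freq.values())
    positives = {v: f for v, f in freq.items() if v > 0}
    n_pos = sum(positives.values())
    # Part 1: intercept-only logit; the MLE of P(y > 0) is the share of non-zeros.
    p_pos = n_pos / n
    # Part 2: intercept-only GLM on y > 0; the MLE of E[y | y > 0] is their mean.
    mean_pos = sum(v * f for v, f in positives.items()) / n_pos
    # Combined prediction: E[y] = P(y > 0) * E[y | y > 0].
    return p_pos, mean_pos, p_pos * mean_pos


p_pos, mean_pos, expected_y = two_part_intercept_only(FREQ)
print(p_pos, mean_pos, expected_y)
```

Without covariates the combined prediction reproduces the overall sample mean. The appeal of the two-part approach is that neither part assumes integer outcomes, and the zeros are treated as genuine values rather than censored ones.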
The zero-inflated negative binomial seems to be the best option at the moment, but any advice would be greatly appreciated!
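For reference, the ZINB model mixes a point mass at zero (with inflation probability π, modelled by the `inflate()` equation in Stata's `zinb`) with a negative binomial distribution. In the NB2 parameterisation below the variance is μ + μ²/r, and Stata reports the dispersion as α = 1/r. The negative binomial part is a distribution over the non-negative integers k only. This is where the count-data assumption enters: the likelihood is not designed for a value such as 0.8690004.

```latex
\begin{aligned}
\Pr(Y = 0) &= \pi + (1 - \pi)\left(\frac{r}{r + \mu}\right)^{r},\\
\Pr(Y = k) &= (1 - \pi)\,\frac{\Gamma(k + r)}{\Gamma(r)\,k!}
  \left(\frac{r}{r + \mu}\right)^{r}\left(\frac{\mu}{r + \mu}\right)^{k},
  \qquad k = 1, 2, 3, \dots
\end{aligned}
```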