Dear Stata users,
I am doing a social mobility study of Brazil using the most recent survey, from 2014. I am currently using occupation-based measures for social origin, and income as the destination measure. As the dependent variable I use income from all sources. The analytical sample has 27,620 cases, and income is zero in 6% of them. I have used a generalized linear model (GLM) with a logarithmic link function, because it allows zero values to be included: the log is applied to expected income, not to observed income. In this regard I rely on Nick Cox's posts in this forum. Zero income has an important implication for inequality. If children from a particular social class have zero income, this is significant, and it makes no sense simply to drop these cases.
I have used the gamma distribution in the GLM. The results make a lot of sense and are very close to those estimated with the Poisson and Gaussian distributions. However, it is not clear enough to me whether a log-link gamma GLM (or one with another distribution) is appropriate for samples that include cases with zero income.
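The point that a log link puts the logarithm on the mean rather than on the outcome can be illustrated with a minimal pure-Python sketch of iteratively reweighted least squares (IRLS) for a log-link GLM. This is not Stata's glm code. The function names `fit_log_link_glm` and `score` and the toy data are hypothetical. The variance function is assumed to be V(mu) = mu^p, with p = 0, 1 and 2 standing for the Gaussian, Poisson and gamma families. Zeros in y enter only through `y - mu`, never through a logarithm.

```python
import math


def fit_log_link_glm(x, y, power, max_iter=100, tol=1e-12):
    """Fit E[y | x] = exp(b0 + b1 * x) by IRLS.

    `power` sets the assumed variance function V(mu) = mu ** power:
    0 -> Gaussian, 1 -> Poisson, 2 -> gamma. Only the mean and this
    variance enter the estimating equations, so y == 0 is allowed.
    """
    ybar = sum(y) / len(y)
    b0, b1 = math.log(ybar), 0.0  # start at the overall mean (needs ybar > 0)
    for _ in range(max_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            z = eta + (yi - mu) / mu   # working response; d(mu)/d(eta) = mu
            w = mu ** 2 / mu ** power  # IRLS weight (d mu/d eta)^2 / V(mu)
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Weighted least squares of z on (1, x), solved via the 2x2 normal equations
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        step = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if step < tol:
            break
    return b0, b1


def score(x, y, b0, b1, power):
    """Quasi-score sum_i x_i (y_i - mu_i) mu_i^(1 - power) for intercept and slope."""
    u0 = u1 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)
        r = (yi - mu) * mu ** (1 - power)
        u0 += r
        u1 += r * xi
    return u0, u1


# Hypothetical toy data: a binary origin indicator and incomes that include zeros.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0.0, 1.0, 3.0, 4.0, 0.0, 6.0, 5.0, 9.0]

for power, family in [(0, "gaussian"), (1, "poisson"), (2, "gamma")]:
    b0, b1 = fit_log_link_glm(x, y, power)
    print(family, round(math.exp(b0), 6), round(math.exp(b0 + b1), 6))
```

Because x is binary in this toy example, the model is saturated, so all three variance assumptions reproduce the two group means (2 and 5). With continuous covariates, the different weights w = mu^(2 - p) lead to somewhat different point estimates, so results that are close but not identical across families can be expected.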
James Hardin and Joseph Hilbe, in Generalized Linear Models and Extensions (4th edition, Stata Press), say the following about the gamma distribution:
“The gamma model is used for modeling outcomes for which the response can take only values greater than or equal to 0. Used primarily with continuous response data, the GLM gamma family can be used with count data where there are many different count results that in total take the shape of a gamma distribution.
Ideally, the gamma model is best used with positive responses having a constant coefficient of variation. However, the model is robust to wide deviations from the latter criterion”.
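The "constant coefficient of variation" condition comes from the variance function of the gamma family. Below is a short derivation in standard GLM notation, where mu is the conditional mean and phi is the dispersion parameter (these symbols are not used in the quoted text). The Poisson family is shown for contrast:

```latex
\text{gamma: } \operatorname{Var}(y \mid x) = \phi\,\mu^{2}
\;\Rightarrow\;
\mathrm{CV} = \frac{\sqrt{\phi\,\mu^{2}}}{\mu} = \sqrt{\phi}
\quad \text{(constant in } \mu\text{)}

\text{Poisson: } \operatorname{Var}(y \mid x) = \phi\,\mu
\;\Rightarrow\;
\mathrm{CV} = \sqrt{\phi / \mu}
\quad \text{(falls as } \mu \text{ rises)}
```

Under the gamma assumption, the standard deviation of income grows in proportion to its expected level. Under the Poisson assumption, it grows with the square root of that level.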
On the other hand, statements such as this one can be found on the web:
“Note that while the Gamma distribution (density function) is defined for variate value x=0, the corresponding probability density value is equal to zero. Therefore, the probability of observing a value of 0 is equal to 0, i.e., impossible, and the Gamma distribution cannot be fit to data containing a 0”.
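The quoted statement concerns the gamma density, and therefore the full gamma likelihood. Below is a sketch of why it need not block estimation of the regression coefficients. It uses the usual shape k / scale theta parameterization and the standard GLM symbols mu_i, x_i and beta, none of which appear in the post. The density at zero depends on the shape. By contrast, the GLM estimating equations for a log link with gamma variance involve only y_i, mu_i and x_i:

```latex
f(y; k, \theta) = \frac{y^{k-1} e^{-y/\theta}}{\Gamma(k)\,\theta^{k}}, \qquad
f(0; k, \theta) =
\begin{cases}
0 & k > 1 \\
1/\theta & k = 1 \\
+\infty \ (\text{limit as } y \to 0^{+}) & k < 1
\end{cases}

U(\beta) = \sum_{i} x_i \,\frac{y_i - \mu_i}{V(\mu_i)}\,\frac{\partial \mu_i}{\partial \eta_i}
         = \sum_{i} x_i \,\frac{y_i - \mu_i}{\mu_i^{2}}\,\mu_i
         = \sum_{i} x_i \left(\frac{y_i}{\mu_i} - 1\right) = 0,
\qquad \mu_i = \exp(x_i^{\top}\beta)
```

An observation with y_i = 0 contributes -x_i to U(beta), which is finite. The quoted point therefore bears on likelihood-based quantities rather than on the coefficient estimates themselves. These quantities include the log-likelihood and the gamma deviance, whose term -log(y_i/mu_i) is infinite at y_i = 0. For any shape, P(Y = 0) = 0 simply because the gamma distribution is continuous.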
When a continuous outcome that includes zero values is analyzed with a GLM, what would be the appropriate distribution? Would the gamma distribution be appropriate?