Dear Stata users,
I am doing a social mobility study in Brazil using the most recent survey, from 2014. I currently use occupation-based measures for social background and income for the destination. As the dependent variable I use income from all sources. The analytical sample has 27620 cases, and in 6% of them income is zero. I have used a Generalized Linear Model (GLM) with a logarithmic link function, as this allows zero values to be included. In this regard I rely on Nick Cox's posts in this forum. Zero income has an important implication for inequality. After all, if children from a particular social class have zero income, this is significant, and it makes no sense simply to drop these cases.
I have used the Gamma distribution in the GLM. The results make a lot of sense and are very close to those estimated with the Poisson and Gaussian distributions. However, whether a log-Gamma GLM (or another distribution) is appropriate for samples that include zero-income cases is not entirely clear to me.
James Hardin & Joseph Hilbe, in Generalized Linear Models and Extensions (Stata Press, now in its fourth edition), say the following about the Gamma distribution:
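To make the setup above concrete, here is a minimal sketch, written in Python rather than Stata and using made-up data, of fitting a mean model E[y|x] = exp(b0 + b1*x) with the gamma variance function V(mu) = mu^2 by iteratively reweighted least squares. The function name `fit_log_gamma_glm` and the toy incomes are hypothetical, and this is not Stata's `glm` implementation; it only illustrates that the fitting equations themselves stay well defined when some outcomes are zero.

```python
import math

def fit_log_gamma_glm(x, y, iterations=50, tol=1e-12):
    """IRLS for E[y|x] = exp(b0 + b1*x) with variance function V(mu) = mu^2.

    Illustrative only: one covariate, no standard errors.
    """
    n = len(y)
    # Start from the overall mean so that mu > 0 even if some y are zero.
    b0, b1 = math.log(sum(y) / n), 0.0
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response. With a log link and V(mu) = mu^2 the IRLS
        # weights (dmu/deta)^2 / V(mu) equal 1, so each step is plain OLS.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xbar = sum(x) / n
        zbar = sum(z) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
        new_b1 = sxz / sxx
        new_b0 = zbar - new_b1 * xbar
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Hypothetical toy data: x is a two-group indicator, y is income with zeros.
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [0, 0, 10, 20, 30, 0, 40, 50, 60, 90]

b0, b1 = fit_log_gamma_glm(x, y)
print("fitted mean, group 0:", math.exp(b0))
print("fitted mean, group 1:", math.exp(b0 + b1))
```

With a single two-group indicator the fitted means coincide with the group sample means, zeros included, which is one way to see why the coefficients can look sensible and similar across the Gamma, Poisson and Gaussian families when the link is the same.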
“The gamma model is used for modeling outcomes for which the response can take only values greater than or equal to 0. Used primarily with continuous response data, the GLM gamma family can be used with count data where there are many different count results that in total take the shape of a gamma distribution.
Ideally, the gamma model is best used with positive responses having a constant coefficient of variation. However, the model is robust to wide deviations from the latter criterion”.
On the other hand, considerations such as this can be found on the web:
“Note that while the Gamma distribution (density function) is defined for variate value x=0, the corresponding probability density value is equal to zero. Therefore, the probability of observing a value of 0 is equal to 0, i.e., impossible, and the Gamma distribution cannot be fit to data containing a 0”.
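The tension between the two quotations can be seen by writing out the gamma likelihood. The following is a sketch under a standard mean/shape parameterization; the symbols $\mu_i$, $\nu$, $\beta$ and $x_i$ are notation assumed here, not taken from either source.

```latex
% Gamma density with mean \mu_i and shape \nu
f(y_i;\mu_i,\nu) = \frac{1}{\Gamma(\nu)}\left(\frac{\nu}{\mu_i}\right)^{\nu}
                   y_i^{\nu-1}\exp\!\left(-\frac{\nu y_i}{\mu_i}\right),
                   \qquad y_i > 0

% Log-likelihood
\ell(\beta,\nu) = \sum_{i}\left[\nu\log\nu - \nu\log\mu_i + (\nu-1)\log y_i
                  - \frac{\nu y_i}{\mu_i} - \log\Gamma(\nu)\right]

% Score for \beta under the log link \mu_i = \exp(x_i^{\top}\beta)
\frac{\partial \ell}{\partial \beta}
   = \nu\sum_{i}\frac{y_i-\mu_i}{\mu_i}\,x_i = 0
```

The only term that breaks down at $y_i = 0$ is $(\nu-1)\log y_i$, and it does not involve $\beta$. The estimating equations for the coefficients therefore remain defined with zeros in the data, which fits a quasi-likelihood reading of the model (a mean function plus a variance proportional to $\mu^2$) rather than a literal gamma likelihood. Quantities that do involve $\log y_i$, such as the gamma deviance, are not finite when zeros are present. This does not by itself settle which family is most appropriate.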
When analysing a continuous variable that includes zero values with a GLM, which distribution is appropriate? Would the Gamma distribution be appropriate?