Dear Stata users,
I am doing a social mobility study in Brazil using the most recent survey, from 2014. I currently use occupation-based measures for social origin and income as the destination measure. The dependent variable is income from all sources. The analytical sample has 27,620 cases, and in 6% of them income is zero. I have used a generalized linear model (GLM) with a logarithmic link function, as this allows zero values to be included. In this regard I rely on Nick Cox's posts in this forum. Zero income has an important implication for inequality. After all, if children from a particular social class have zero income, this is significant, and it makes no sense simply to drop these cases.
I have used the gamma family in the GLM. The results make a lot of sense and are very close to those estimated with the Poisson and Gaussian families. However, it is not clear to me whether a log-link gamma GLM (or one with another family) is appropriate for a sample that includes zero-income cases.
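One possible reason the three families agree so closely is that, with a log link, they share the same mean model and differ only in the assumed variance function. The estimating equation for the mean also stays computable when some responses are zero. Here is a minimal sketch in plain Python, not the Stata code behind the results above. The incomes, class labels, and function names are all hypothetical, and the exact agreement shown holds only because the single predictor is a group indicator.

```python
# Hypothetical incomes for two origin classes; zeros are kept in the sample.
incomes = {
    "class_a": [0.0, 800.0, 1200.0, 2000.0],
    "class_b": [0.0, 0.0, 3000.0, 5000.0],
}

# Variance functions V(mu) = mu**power for the three families tried.
VARIANCE_POWER = {"gaussian": 0, "poisson": 1, "gamma": 2}


def score(y, mu, power):
    """Log-link estimating function for a group mean: sum (y - mu) * mu / V(mu)."""
    return sum((yi - mu) * mu / mu**power for yi in y)


def fit_group_mean(y, power, lo=1e-6, hi=1e6, iterations=200):
    """Solve score(y, mu, power) = 0 for mu by bisection on (lo, hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        # The score is positive below the solution and negative above it.
        if score(y, mid, power) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


fitted = {
    family: {group: fit_group_mean(y, power) for group, y in incomes.items()}
    for family, power in VARIANCE_POWER.items()
}

for family, means in fitted.items():
    print(family, {group: round(mu, 4) for group, mu in means.items()})
```

In this sketch each family recovers the plain group mean, zeros included, because none of the estimating functions takes the logarithm of the response. With continuous covariates the families weight observations differently, so the estimates would generally be close rather than identical.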
James Hardin and Joseph Hilbe, in Generalized Linear Models and Extensions (fourth edition, Stata Press), say the following about the gamma distribution:
“The gamma model is used for modeling outcomes for which the response can take only values greater than or equal to 0. Used primarily with continuous response data, the GLM gamma family can be used with count data where there are many different count results that in total take the shape of a gamma distribution.
Ideally, the gamma model is best used with positive responses having a constant coefficient of variation. However, the model is robust to wide deviations from the latter criterion”.
On the other hand, considerations such as this can be found on the web:
“Note that while the Gamma distribution (density function) is defined for variate value x=0, the corresponding probability density value is equal to zero. Therefore, the probability of observing a value of 0 is equal to 0, i.e., impossible, and the Gamma distribution cannot be fit to data containing a 0”.
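To make the statement about the density at zero concrete, here is a minimal Python sketch that evaluates the gamma log-density at y = 0 for three shape values. The function name and the shape and scale values are hypothetical, and the y = 0 branch encodes the limits of the density as y approaches zero rather than a library call.

```python
import math


def gamma_log_density(y, shape, scale):
    """Log of the gamma density with the given shape and scale, for y >= 0."""
    if y < 0:
        raise ValueError("the gamma density is defined only for y >= 0")
    if y == 0:
        # Limits of the density as y -> 0 depend on the shape parameter.
        if shape > 1:
            return -math.inf          # density tends to 0
        if shape < 1:
            return math.inf           # density is unbounded
        return -math.log(scale)       # exponential case: density is 1 / scale
    return (
        (shape - 1.0) * math.log(y)
        - y / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )


# Hypothetical scale of 1000; shapes below, at, and above 1.
log_density_at_zero = {
    shape: gamma_log_density(0.0, shape, scale=1000.0)
    for shape in (0.5, 1.0, 2.0)
}

for shape, value in log_density_at_zero.items():
    print(f"shape={shape}: log density at y=0 is {value}")
```

The density at zero is 0 only when the shape exceeds 1. It is finite and positive when the shape equals 1, and unbounded when the shape is below 1. In the first and last cases the full log-likelihood is not finite once the data contain a zero, which is the difficulty the quotation describes for fitting the distribution itself.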
When a continuous outcome includes zero values, which distribution is appropriate in a GLM? Would the gamma distribution be appropriate?