Dear Stata users,

I am doing a social mobility study in Brazil using the most recent survey, from 2014. I use occupation-based measures for social background and income for the destination. The dependent variable is income from all sources. The analytical sample has 27,620 cases, and income is zero in 6% of them. I have used a generalized linear model (GLM) with a log link, as this allows zero values to be included; here I rely on Nick Cox's posts in this forum. Zero income has an important implication for inequality: if children from a particular social class have zero income, this is significant, and it makes no sense simply to drop these cases.
I have used the gamma family in the GLM. The results make a lot of sense and are very close to those estimated with the Poisson and Gaussian families. However, I am not sure whether a log-link gamma GLM (or one with another distribution) is appropriate for samples that include cases with zero income.
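A minimal sketch (plain Python, not Stata's `glm` implementation) of one reason the log-link fits can agree: IRLS uses only the mean and the variance function V(mu), with V(mu) = mu^2 for gamma and V(mu) = mu for Poisson. With a log link the estimating equations become sum_i x_i (y_i - mu_i) / mu_i = 0 (gamma) and sum_i x_i (y_i - mu_i) = 0 (Poisson), and both remain defined when y_i = 0. The helpers `solve` and `glm_log_link` and the toy incomes are hypothetical, for illustration only.

```python
import math


def solve(A, b):
    """Solve a small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def glm_log_link(X, y, variance, iters=100, tol=1e-10):
    """Fit E[y] = exp(X b) by IRLS for a given variance function V(mu).

    Only the mean and V(mu) are used (quasi-likelihood), so observations
    with y == 0 are allowed; fitted mu = exp(eta) is always positive.
    Assumes the first column of X is the constant.
    """
    p = len(X[0])
    b = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(iters):
        eta = [sum(xj * bj for xj, bj in zip(x, b)) for x in X]
        mu = [math.exp(e) for e in eta]
        # Log link: d mu / d eta = mu, so the IRLS weight is mu^2 / V(mu)
        w = [m * m / variance(m) for m in mu]
        # Working response: eta + (y - mu) * d eta / d mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        A = [[sum(wi * x[j] * x[k] for wi, x in zip(w, X)) for k in range(p)] for j in range(p)]
        rhs = [sum(wi * x[j] * zi for wi, x, zi in zip(w, X, z)) for j in range(p)]
        new_b = solve(A, rhs)
        if max(abs(nb - ob) for nb, ob in zip(new_b, b)) < tol:
            return new_b
        b = new_b
    return b


if __name__ == "__main__":
    # Hypothetical incomes for two origin classes, including zeros
    X = [[1.0, 0.0]] * 5 + [[1.0, 1.0]] * 5
    y = [0, 500, 1200, 800, 0, 2000, 3000, 0, 2500, 1500]
    for name, V in [("gamma", lambda m: m * m), ("poisson", lambda m: m)]:
        b = glm_log_link(X, y, V)
        print(name, [round(math.exp(v), 4) for v in b])
```

With only a class indicator as predictor, both variance functions should reproduce the group means exactly, so both lines show the mean of the first class (500) and the ratio of class means (3.6). With richer covariates the two fits differ only through the weights mu^2 / V(mu).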
James Hardin and Joseph Hilbe, in Generalized Linear Models and Extensions (4th edition, Stata Press), say the following about the gamma distribution:
“The gamma model is used for modeling outcomes for which the response can take only values greater than or equal to 0. Used primarily with continuous response data, the GLM gamma family can be used with count data where there are many different count results that in total take the shape of a gamma distribution.
Ideally, the gamma model is best used with positive responses having a constant coefficient of variation. However, the model is robust to wide deviations from the latter criterion”.
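For reference, the "constant coefficient of variation" in the quotation follows from the gamma family's variance function V(mu) = mu^2 with dispersion parameter phi (equal to 1/shape for a true gamma distribution):

```latex
\operatorname{Var}(y \mid x) = \phi\,\mu^{2}
\quad\Longrightarrow\quad
\mathrm{CV} = \frac{\sqrt{\operatorname{Var}(y \mid x)}}{\mu} = \sqrt{\phi}
```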
On the other hand, statements such as the following can be found on the web:
“Note that while the Gamma distribution (density function) is defined for variate value x=0, the corresponding probability density value is equal to zero. Therefore, the probability of observing a value of 0 is equal to 0, i.e., impossible, and the Gamma distribution cannot be fit to data containing a 0”.
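A minimal sketch, assuming the shape/scale parameterization (shape k, scale theta), of what the quoted statement refers to: the gamma density at y = 0 is 0 when k > 1, equals 1/theta when k = 1 (the exponential case), and is unbounded when k < 1. In the usual k > 1 case a zero therefore contributes log f(0) = -inf to a full gamma log-likelihood, and the gamma unit deviance 2[-log(y/mu) + (y - mu)/mu] is likewise infinite at y = 0. The helpers `gamma_pdf` and `gamma_unit_deviance` are hypothetical names used only for illustration.

```python
import math


def gamma_pdf(y, shape, scale):
    """Gamma density with shape k and scale theta, handling y == 0 explicitly."""
    if y < 0:
        return 0.0
    if y == 0:
        if shape > 1:
            return 0.0            # density vanishes at zero
        if shape == 1:
            return 1.0 / scale    # exponential case: finite and positive
        return math.inf           # shape < 1: density diverges at zero
    log_f = (shape - 1) * math.log(y) - y / scale - math.lgamma(shape) - shape * math.log(scale)
    return math.exp(log_f)


def gamma_unit_deviance(y, mu):
    """Gamma deviance contribution 2*(-log(y/mu) + (y - mu)/mu); infinite at y == 0."""
    if y == 0:
        return math.inf
    return 2.0 * (-math.log(y / mu) + (y - mu) / mu)


if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0):
        print(k, gamma_pdf(0.0, k, 2.0))
    print(gamma_unit_deviance(0.0, 1000.0))
```

The contrast with the IRLS estimating equations is the point: those equations never evaluate log(y), whereas the likelihood and the deviance do.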
When analyzing a continuous outcome that includes zeros with a GLM, what would be the appropriate distribution? Would the gamma distribution be appropriate?