Hello,

I am attempting to create a frequency histogram of a variable (called M) with logarithmic x- and y-axes in Stata 16. The number of observations is 2.1 million.

My original data are highly skewed, with 99% of observations having a value of 0:
[Image: histogram of M]

Command for this was:
Code:
histogram M, frequency ylabel(0.000002 20 200 2000, angle(horizontal) grid glpattern(solid) gextend) xtitle(M)
Logarithmising the x-axis is apparently not a problem: I used gen lM = log(M) to create my desired variable on a log scale.
Afterwards, my distribution looks like this:
[Image: histogram of lM]

Command for this:
Code:
histogram lM, frequency ylabel(0.000002 20 200 2000, angle(horizontal) grid glpattern(solid) gextend) xtitle(lM)
The y-axis also changes, but that is most probably just because all the observations with M=0 drop out when the logarithm is taken.
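This explanation could be checked with a short sketch like the one below. It assumes that M has no negative values. In Stata, log(0) evaluates to missing, so the zero observations should be exactly the ones missing in lM.
Code:
* number of observations with M equal to zero
count if M == 0
* number of observations lost when taking logs (log(0) is missing in Stata)
count if missing(lM) & !missing(M)

If the two counts match, the dropped zeros alone account for the change in the y-axis.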
Now I would like to also logarithmise the y-axis. I tried to simply use the yscale(log) option:

Code:
. histogram lM, frequency yscale(log) ylabel(0.000002 20 200 2000, angle(horizontal) grid glpattern(solid) gextend)
(bin=43, start=1.0986123, width=.27069641)
[Image: histogram of lM with yscale(log)]

However, the result is not as I expected. As you can see, the y-axis is behaving quite strangely: all the relevant ticks (20, 200, 2000) are basically on the same line at the top. The y-axis value of 0.000002, which I include for illustration, is just slightly below the others. What I wanted to create should look rather like this graph (Source: Yasseri, T., Sumi, R., Rung, A., Kornai, A., & Kertész, J. (2012). Dynamics of conflicts in Wikipedia. PLoS ONE, 7(6), e38869):
[Image: example histogram from Yasseri et al. (2012)]

Here the relevant ticks (10, 100, 1000, 10000) are distributed evenly over the y-axis.
Is it due to my data (and understanding of them) or did I just not manage to enter the right command to produce the desired outcome?
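
One possible explanation, not confirmed here, is that histogram draws its bars upward from 0, and 0 has no position on a log scale. Under that assumption, a workaround might be to compute the bin frequencies first and then draw them as bars with a positive base. The sketch below is only an illustration: the variable names freq and mid and the choice of base(1) are hypothetical, and bin(43) and the bar width are copied from the output reported above.
Code:
* store bin frequencies and bin midpoints in new variables (freq and mid are illustrative names)
twoway__histogram_gen lM, frequency bin(43) generate(freq mid)
* draw bars from a base of 1 instead of 0, so every bar has a defined position on a log y-axis
* empty bins are excluded, since a frequency of 0 cannot be shown on a log scale
twoway bar freq mid if freq > 0 & !missing(freq), barwidth(.27069641) base(1) yscale(log) ylabel(1 10 100 1000, angle(horizontal) grid) xtitle(lM) ytitle(Frequency)

With a base of 1, a bin containing a single observation has zero height, so a smaller positive base such as 0.5 may be preferable if such bins matter.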

PS: I tried to make the graphs smaller in this post, but for some reason it didn't work, sorry for that.
PPS: For some reason the images are all being posted again at the end of my post. How do I avoid that without removing them entirely?