Suppose I plot a histogram:
Code:
clear
set obs 10
g z = _n
replace z = 5 if _n > 5
hist z
For each observation, I would like to obtain two variables:
- bin, giving the bin to which the observation belongs.
- density, giving the density of the bin to which the observation belongs.
In this example, the correct values are:
Code:
g correct_bin = 1 if inrange(_n, 1, 2)
replace correct_bin = 2 if _n == 3
replace correct_bin = 3 if _n >= 4
g correct_density = 0.15 if inrange(_n, 1, 2)
replace correct_density = 0.075 if _n == 3
replace correct_density = 0.525 if _n >= 4
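As a sanity check on the hard-coded densities, here is a minimal sketch of the arithmetic. It assumes that hist used 3 equal-width bins spanning 1 to 5 in this example (my assumption; it is what the values above imply). Each density is then the bin's share of the observations divided by the bin width:

```stata
* Assumption: 3 equal-width bins starting at 1, so width = (5 - 1)/3
local width = (5 - 1) / 3

* density = (observations in bin / total observations) / bin width
* bin 1 holds z = 1 and z = 2
display (2/10) / `width'
* bin 2 holds z = 3
display (1/10) / `width'
* bin 3 holds z = 4 and the six observations with z = 5
display (7/10) / `width'
```

If the bin count or width differs from this assumption, the hard-coded values above would need to change accordingly.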
The approach I have so far (code below) works for this example, but it feels like a hack and I expect it to break down when:
- The bins are not adjacent to each other, for example bin 1 = [1,2) and bin 2 = [5,6).
- The sample size grows and the values of z are continuous, in which case numerical issues quickly arise.
My approach is as follows:
1. Use twoway__histogram_gen to find the midpoint of each bin.
2. Adjust the midpoints to be the start points of the bins.
3. Create new variables x_v, each constant and equal to the start point of bin v.
4. Check which interval [x_v, x_{v+1}) each observation belongs to. [a]
5. Find the corresponding density of that bin.
Code:
* 1, finding midpoints
twoway__histogram_gen z, gen(y x)

* 2, adjusting midpoints to start points
local adjust = (x[2] - x[1]) / 2
replace x = x - `adjust'

* 3, generating variables constant to start points
count if x != .
local N = r(N)
forvalues v = 1/`=`N'+1' {
    g x_`v' = x[`v']
}

* 4, finding bin of each observation
g new_bin = .
forvalues v = 1/`N' {
    replace new_bin = `v' if x_`v' <= z & z < x_`=`v'+1'
}

* 5, finding density of the bin
g new_density = .
forvalues v = 1/`N' {
    replace new_density = y[`v'] if new_bin == `v'
}
Code:
assert correct_bin == new_bin
assert correct_density == new_density
Thanks in advance for help!
Simon
[a] I'm not sure that Stata's histogram bins are indeed of the form [x_v, x_{v+1}) (i.e., closed-open), but my investigations indicate that this is true (with the final bin having endpoint infinity).