(1) Q(p) = F^(-1)(p) = inf{x: F(x)>=p}, 0<p<1,
where F(x) is the cumulative distribution function and F^(-1)(p) is the inverse cumulative distribution function.
See, e.g., the very first definition in Rob J. Hyndman & Yanan Fan (1996), Sample Quantiles in Statistical Packages, The American Statistician, 50:4, 361-365.
The Stata manual entry for -xtile- (Methods and Formulas, p. 584) describes an algorithm, but it gives no reference to a statistics textbook or article that derives this algorithm or explains why it makes sense.
Moreover, as far as I can see, the Stata algorithm does not agree with definition (1) above from Hyndman & Fan (1996, p. 361).
Take this example here:
Code:
. sysuse auto, clear
(1978 Automobile Data)

. keep price

. keep in 1/20
(54 observations deleted)

. sort price

. cumul price, gen(cumprice)

. list, sep(5)

     +-------------------+
     |  price   cumprice |
     |-------------------|
  1. |  3,299        .05 |
  2. |  3,667         .1 |
  3. |  3,799        .15 |
  4. |  3,955         .2 |
  5. |  3,984        .25 |
     |-------------------|
  6. |  4,082         .3 |
  7. |  4,099        .35 |
  8. |  4,453         .4 |
  9. |  4,504        .45 |
 10. |  4,749         .5 |
     |-------------------|
 11. |  4,816        .55 |
 12. |  5,104         .6 |
 13. |  5,189        .65 |
 14. |  5,705         .7 |
 15. |  5,788        .75 |
     |-------------------|
 16. |  7,827         .8 |
 17. | 10,372        .85 |
 18. | 11,385         .9 |
 19. | 14,500        .95 |
 20. | 15,906          1 |
     +-------------------+
Code:
. _pctile price, perc(25 50 75)

. return list

scalars:
                 r(r1) =  4033
                 r(r2) =  4782.5
                 r(r3) =  6807.5
Code:
. dis (3984+4082)/2
4033

. dis (4749+4816)/2
4782.5

. dis (5788+7827)/2
6807.5
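To make the discrepancy concrete, here is a minimal sketch in Python (not Stata) that applies definition (1) to the same 20 prices and compares the result with the midpoints computed above. The function names are hypothetical, and the averaging rule in the second function is only inferred from the -_pctile- output shown above; it is not taken from the Stata manual or source.

```python
# The 20 sorted prices from the listing above.
prices = [3299, 3667, 3799, 3955, 3984, 4082, 4099, 4453, 4504, 4749,
          4816, 5104, 5189, 5705, 5788, 7827, 10372, 11385, 14500, 15906]


def quantile_inverse_cdf(data, pct):
    """Definition (1): smallest x whose empirical F(x) >= pct/100.

    pct is an integer percentage strictly between 0 and 100, so all
    arithmetic stays in integers and avoids floating-point surprises.
    """
    xs = sorted(data)
    n = len(xs)
    k = -(-n * pct // 100)  # ceil(n * pct / 100): smallest k with k/n >= p
    return xs[k - 1]


def quantile_averaged(data, pct):
    """Assumed rule (inferred from the output above): when n*p is a whole
    number k, average the k-th and (k+1)-th order statistics; otherwise
    fall back to definition (1)."""
    xs = sorted(data)
    n = len(xs)
    if (n * pct) % 100 == 0:
        k = n * pct // 100
        return (xs[k - 1] + xs[k]) / 2
    return quantile_inverse_cdf(xs, pct)


for pct in (25, 50, 75):
    print(pct, quantile_inverse_cdf(prices, pct), quantile_averaged(prices, pct))
```

Under definition (1) the 25th, 50th and 75th percentiles of these data are the observations 3,984, 4,749 and 5,788 themselves, because the empirical distribution function reaches exactly .25, .5 and .75 at those points. The averaged variant reproduces the 4033, 4782.5 and 6807.5 reported by -_pctile-.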