(1) Q(p) = F^(-1)(p) = inf{x: F(x)>=p}, 0<p<1,
where F(x) is the cumulative distribution function and F^(-1)(p) is the inverse cumulative distribution function (the quantile function).
See, e.g., the very first definition in Rob J. Hyndman & Yanan Fan (1996), Sample Quantiles in Statistical Packages, The American Statistician, 50:4, 361-365.
The manual entry for -xtile- (Methods and Formulas, p. 584) describes an algorithm but does not give a reference to a statistics textbook or article that derives this algorithm or explains why it makes sense.
And the Stata algorithm does not agree (as far as I can see) with the definition above from Hyndman & Fan (1996, p.361).
Take this example here:
Code:
. sysuse auto, clear
(1978 Automobile Data)
. keep price
. keep in 1/20
(54 observations deleted)
. sort price
. cumul price, gen(cumprice)
. list, sep(5)
+-------------------+
| price cumprice |
|-------------------|
1. | 3,299 .05 |
2. | 3,667 .1 |
3. | 3,799 .15 |
4. | 3,955 .2 |
5. | 3,984 .25 |
|-------------------|
6. | 4,082 .3 |
7. | 4,099 .35 |
8. | 4,453 .4 |
9. | 4,504 .45 |
10. | 4,749 .5 |
|-------------------|
11. | 4,816 .55 |
12. | 5,104 .6 |
13. | 5,189 .65 |
14. | 5,705 .7 |
15. | 5,788 .75 |
|-------------------|
16. | 7,827 .8 |
17. | 10,372 .85 |
18. | 11,385 .9 |
19. | 14,500 .95 |
20. | 15,906 1 |
+-------------------+
Code:
. _pctile price, perc(25 50 75)
. return list
scalars:
r(r1) = 4033
r(r2) = 4782.5
r(r3) = 6807.5
Code:
. dis (3984+4082)/2
4033
. dis (4749+4816)/2
4782.5
. dis (5788+7827)/2
6807.5
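To make the disagreement concrete, here is a small sketch outside Stata (in Python, only to check the arithmetic) that applies definition (1) to the same 20 prices. The function names are made up for this illustration. The second function is an averaging rule that I am assuming only because it reproduces the -_pctile- output above; it is not copied from the manual.

```python
# The 20 sorted prices from the listing above.
prices = [3299, 3667, 3799, 3955, 3984, 4082, 4099, 4453, 4504, 4749,
          4816, 5104, 5189, 5705, 5788, 7827, 10372, 11385, 14500, 15906]


def inverse_cdf_quantile(sorted_x, k, m):
    """Definition (1) at p = k/m: smallest x with F(x) >= p, F the empirical CDF."""
    n = len(sorted_x)
    # F(x_(i)) >= i/n, so the answer is the order statistic with i = ceil(n*k/m).
    i = -(-n * k // m)  # integer ceiling, avoids floating-point rounding
    return sorted_x[i - 1]


def averaging_quantile(sorted_x, k, m):
    """Assumed rule: average x_(i) and x_(i+1) when n*p is an integer i."""
    n = len(sorted_x)
    if (n * k) % m == 0:
        i = n * k // m
        return (sorted_x[i - 1] + sorted_x[i]) / 2
    # Otherwise fall back to the same order statistic as definition (1).
    return sorted_x[-(-n * k // m) - 1]


for k in (25, 50, 75):
    print(k, inverse_cdf_quantile(prices, k, 100), averaging_quantile(prices, k, 100))
```

On these data definition (1) gives 3984, 4749 and 5788, while the averaging rule gives 4033, 4782.5 and 6807.5, the values Stata reports. The two differ here because n*p is an integer (5, 10 and 15) at all three percentiles.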