Stata has several commands that compute percentiles:
centile
summarize, detail (abbreviated sum, d)
_pctile
egen pctile
and perhaps others.
It turns out that these do not always yield the same results, except at the median (50th percentile). For example, this code:
Code:
preserve
cap drop _all
set obs 20
set seed 23
tempvar y
gen `y'=exp(rnormal(0,1))
qui centile `y', c(10 25 50 75 90)
di r(c_1) _n r(c_2) _n r(c_3) _n r(c_4) _n r(c_5)
qui sum `y',d
di r(p10) _n r(p25) _n r(p50) _n r(p75) _n r(p90)
qui _pctile `y', p(10 25 50 75 90)
di r(r1) _n r(r2) _n r(r3) _n r(r4) _n r(r5)
drop _all
restore
Code:
. preserve
. cap drop _all
. set obs 20
number of observations (_N) was 0, now 20
. set seed 23
. tempvar y
. gen `y'=exp(rnormal(0,1))
. qui centile `y', c(10 25 50 75 90)
. di r(c_1) _n r(c_2) _n r(c_3) _n r(c_4) _n r(c_5)
.29993572
.38304436
1.6890243
2.8531529
5.1466236
. qui sum `y',d
. di r(p10) _n r(p25) _n r(p50) _n r(p75) _n r(p90)
.31345257
.40814352
1.6890243
2.7669318
5.0989532
. qui _pctile `y', p(10 25 50 75 90)
. di r(r1) _n r(r2) _n r(r3) _n r(r4) _n r(r5)
.31345257
.40814352
1.6890243
2.7669318
5.0989532
. drop _all
. restore
.
end of do-file
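In this example summarize, detail and _pctile agree with each other, while centile gives a different value at every percentile except the median. The differences are modest here, but they may be nontrivial in some contexts (e.g. computation of IQRs: about 2.47 from centile versus about 2.36 from the other two in the output above), so it is worth considering which of the competing formulae squares most closely with how the researcher conceives of percentiles.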
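To make the competing formulae concrete, here is a rough sketch that computes one percentile by hand under each rule and compares it with the built-in command. It rests on my reading of the Methods and formulas sections of the manuals: centile's default interpolates linearly at position (n+1)p/100, while summarize, detail and _pctile use position np/100, averaging the two neighbouring order statistics when that position is an integer and otherwise taking the next order statistic up. The variable y and the locals n, p, R, r, f and P are names chosen for illustration only, and the sketch ignores weights, missing values and percentiles near the extremes of the data.

```stata
* Sketch only: hand-computes one interior percentile under each default rule.
* Assumes no weights, no missing values, and 1 <= floor((n+1)p/100) < n.
preserve
drop _all
set obs 20
set seed 23
gen y = exp(rnormal(0,1))
sort y
local n = _N
local p = 25

* centile default: linear interpolation at position R = (n+1)p/100
local R = (`n' + 1)*`p'/100
local r = floor(`R')
local f = `R' - `r'
di "by hand, centile rule: " y[`r'] + `f'*(y[`r'+1] - y[`r'])
qui centile y, centile(`p')
di "centile:               " r(c_1)

* summarize/_pctile default: position P = np/100;
* average the two neighbours if P is an integer, else take the next order statistic
local P = `n'*`p'/100
if `P' == floor(`P') {
    di "by hand, _pctile rule: " (y[`P'] + y[`P'+1])/2
}
else {
    di "by hand, _pctile rule: " y[ceil(`P')]
}
qui _pctile y, p(`p')
di "_pctile:               " r(r1)
restore
```

Each hand-computed value should match the corresponding command up to rounding. With 20 observations, np/100 is an integer at all five percentiles used above, so summarize, detail and _pctile always take the midpoint of two adjacent order statistics, whereas centile weights the same pair by .1, .25, .5, .75 and .9. That would explain why the two approaches coincide only at the median in this example.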