I found a discrepancy in how weighted medians are computed in Stata/SE 15.0. As far as I know, and from what I have found so far, it has not been reported or explained in the forum.
Consider a dataset with 176 observations and two variables: a monetary variable (y) and a weight (w), normalized so that the weights sum to the number of observations. The data are sorted by y. For example:
Code:
. list in 1/5

     +-----------------------+
     |         y           w |
     |-----------------------|
  1. |     153.3   1.2242037 |
  2. |   75753.3   1.2242037 |
  3. | 92089.306   1.2255392 |
  4. |  113553.3   .80866169 |
  5. | 119325.52   1.2849769 |
     +-----------------------+

. 
. list in 171/176

     +-----------------------+
     |         y           w |
     |-----------------------|
171. | 1008153.3   1.9178511 |
172. | 1050153.3    .8489436 |
173. | 1191875.5   1.2725638 |
174. | 1428153.3    1.039806 |
175. |   1671717   1.3656695 |
     |-----------------------|
176. | 1932153.3   .74033083 |
     +-----------------------+
Code:
. sum y [aw=w], d

                              y
-------------------------------------------------------------
      Percentiles      Smallest
 1%      75753.3          153.3
 5%     133878.4        75753.3
10%     188204.2       92089.31       Obs                 176
25%     221473.1       113553.3       Sum of Wgt.         176

50%     338399.3                      Mean           405967.1
                        Largest       Std. Dev.      271224.9
75%     504507.2        1191876
90%     714153.3        1428153       Variance       7.36e+10
95%     840153.3        1671717       Skewness       2.291167
99%      1671717        1932153       Kurtosis       10.72572

. di r(p50)
338399.33
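As a cross-check that does not go through -summarize-, the same weighted median can be requested from -_pctile-, which accepts aweights and leaves the result in r(r1). This is only a sketch and assumes y and w are in memory as listed above; whether it matches r(p50) on this data has not been verified here.

```stata
* Cross-check of the weighted median (assumes y and w are in memory)
_pctile y [aw=w], p(50)      // 50th percentile with analytic weights
display %12.2f r(r1)         // _pctile stores the first requested percentile in r(r1)
```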
Following the formula in the reference manual by hand, however, I get a different median:
Code:
. * Following reference manual
. preserve

. gen P = (0.5*_N) // defining the cutting point for the 50th percentile

. gen W = w if _n == 1 // Defining the cumulative sum of weights
(175 missing values generated)

. replace W = w[_n] + W[_n-1] if _n > 1
(175 real changes made)

. gen index = ( W > P ) // Index for finding "center" of weighted distribution

. replace index = index[_n] + index[_n-1] if _n > 1
(88 real changes made)

. * Calculating median
. gen aux_median = ( y[_n-1] + y[_n] )/2 if index == 1 & W[_n-1] == P
(176 missing values generated)

. replace aux_median = y if index == 1 & W[_n-1] != P
(1 real change made)

. replace aux_median = 0 if aux_median == .
(175 real changes made)

. egen median = max(aux_median)

. di median
336153.3

. restore
One hypothesis (which I can't confirm, since -summarize- is a built-in command) is that the discrepancy, 338399.33 from -summarize- versus 336153.3 from the manual calculation, is related to the number of decimals that -summarize- considers when using weights. In fact, rounding the cumulative weights arbitrarily to three decimals reproduces the -summarize- result.
Code:
. * Following reference manual
. preserve

. gen P = (0.5*_N) // defining the cutting point for the 50th percentile

. gen W = w if _n == 1 // Defining the cumulative sum of weights
(175 missing values generated)

. replace W = w[_n] + W[_n-1] if _n > 1
(175 real changes made)

. replace W = round(W,0.001) // Cutting decimals to 3
(176 real changes made)

. gen index = ( W > P ) // Index for finding "center" of weighted distribution

. replace index = index[_n] + index[_n-1] if _n > 1
(87 real changes made)

. * Calculating median
. gen aux_median = ( y[_n-1] + y[_n] )/2 if index == 1 & W[_n-1] == P
(175 missing values generated)

. replace aux_median = y if index == 1 & W[_n-1] != P
(0 real changes made)

. replace aux_median = 0 if aux_median == .
(175 real changes made)

. egen median = max(aux_median)

. di median
338399.33

. restore
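A related possibility that may be worth ruling out is storage precision rather than the number of decimals: -generate- creates float variables by default, so the running sum W above is rounded at every step, and the exact comparison W[_n-1] == P can fail when the two values should be equal. The sketch below redoes the calculation with a double running sum and a tolerance. The tolerance 1e-8 and the names above, idx and med are assumptions for illustration, and it has not been run on this data.

```stata
preserve
sort y                                      // the formula needs the data ordered by y
gen double W = sum(w)                       // running sum of weights, in double precision
scalar P = 0.5 * W[_N]                      // cutoff: half of the total weight
scalar tol = 1e-8                           // assumed tolerance, not taken from the manual
gen byte above = (W - scalar(P) > scalar(tol))   // 1 once the running sum exceeds P
gen long idx = sum(above)                   // idx == 1 marks the first such observation
gen double med = y if idx == 1              // default case: the median is y at that observation
* If the previous running sum equals P (within tol), average the two neighbours
replace med = (y[_n-1] + y)/2 if idx == 1 & abs(W[_n-1] - scalar(P)) <= scalar(tol)
summarize med, meanonly                     // med has a single nonmissing value
display %12.2f r(mean)
restore
```

If this version agrees with -summarize-, the difference would come from float rounding in the hand-built cumulative sum rather than from any truncation of decimals inside -summarize-.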
Kind regards,
David