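I found a discrepancy when computing weighted medians in Stata/SE: -summarize- with weights and a manual calculation that follows the reference manual give different results. As far as I have been able to find, this has not been reported or explained in the forum.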
Consider a dataset with 176 observations and two variables: a monetary variable (y) and a weight (w), normalized so that the weights sum to the number of observations. The data are sorted by y. Example:
Code:
. list in 1/5
     +-----------------------+
     |         y           w |
     |-----------------------|
  1. |     153.3   1.2242037 |
  2. |   75753.3   1.2242037 |
  3. | 92089.306   1.2255392 |
  4. |  113553.3   .80866169 |
  5. | 119325.52   1.2849769 |
     +-----------------------+
. 
. list in 171/176
     +-----------------------+
     |         y           w |
     |-----------------------|
171. | 1008153.3   1.9178511 |
172. | 1050153.3    .8489436 |
173. | 1191875.5   1.2725638 |
174. | 1428153.3    1.039806 |
175. |   1671717   1.3656695 |
     |-----------------------|
176. | 1932153.3   .74033083 |
     +-----------------------+
Code:
. sum y [aw=w], d
                              y
-------------------------------------------------------------
      Percentiles      Smallest
 1%      75753.3          153.3
 5%     133878.4        75753.3
10%     188204.2       92089.31       Obs                 176
25%     221473.1       113553.3       Sum of Wgt.         176
50%     338399.3                      Mean           405967.1
                        Largest       Std. Dev.      271224.9
75%     504507.2        1191876
90%     714153.3        1428153       Variance       7.36e+10
95%     840153.3        1671717       Skewness       2.291167
99%      1671717        1932153       Kurtosis       10.72572
. di r(p50)
338399.33
Following the steps in the reference manual by hand, however, I get a different median:
Code:
. * Following reference manual
. preserve
. gen P = (0.5*_N) // defining the cutting point for the 50th percentile
. gen W = w if _n == 1 // Defining the cumulative sum of weights
(175 missing values generated)
. replace W = w[_n] + W[_n-1] if _n > 1
(175 real changes made)
. gen index = ( W > P ) // Index for finding "center" of weighted distribution
. replace index = index[_n] + index[_n-1] if _n > 1
(88 real changes made)
. * Calculating median
. gen aux_median = ( y[_n-1] + y[_n] )/2 if index == 1 & W[_n-1] == P
(176 missing values generated)
. replace aux_median = y if index == 1 & W[_n-1] != P
(1 real change made)
. replace aux_median = 0 if aux_median == .
(175 real changes made)
. egen median = max(aux_median)
. di median
336153.3
. restore
One hypothesis (which I can't confirm, as -summarize- is a built-in command) is that this is related to the number of decimals that -summarize- considers when using weights. In fact, rounding the cumulative weights to three decimals reproduces the -summarize- result.
Code:
. * Following reference manual
. preserve
. gen P = (0.5*_N) // defining the cutting point for the 50th percentile
. gen W = w if _n == 1 // Defining the cumulative sum of weights
(175 missing values generated)
. replace W = w[_n] + W[_n-1] if _n > 1
(175 real changes made)
. replace W = round(W,0.001) // Cutting decimals to 3
(176 real changes made)
. gen index = ( W > P ) // Index for finding "center" of weighted distribution
. replace index = index[_n] + index[_n-1] if _n > 1
(87 real changes made)
. * Calculating median
. gen aux_median = ( y[_n-1] + y[_n] )/2 if index == 1 & W[_n-1] == P
(175 missing values generated)
. replace aux_median = y if index == 1 & W[_n-1] != P
(0 real changes made)
. replace aux_median = 0 if aux_median == .
(175 real changes made)
. egen median = max(aux_median)
. di median
338399.33
. restore
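A possible way to probe the precision hypothesis, offered only as a sketch: -generate- stores new variables as float by default, so the cumulative weights could instead be built in double precision and compared with the cutting point using a tolerance rather than exact equality. The names Wd, cut and near_cut and the 1e-6 tolerance are illustrative choices, not taken from the manual, and nothing here shows what -summarize- does internally. The last two lines cross-check the weighted median with -_pctile-, which also accepts aweights.

```stata
* Sketch under stated assumptions: y and w as in the listings above
preserve
sort y
generate double Wd = sum(w)            // running sum of weights, stored as double
scalar cut = 0.5 * Wd[_N]              // half of the total weight
* Flag observations whose cumulative weight is within a relative
* tolerance of the cutting point (1e-6 is an arbitrary, illustrative value)
generate byte near_cut = reldif(Wd, scalar(cut)) < 1e-6
format Wd %14.8f                       // show more decimals than the default
list y w Wd if near_cut, noobs
restore

* Cross-check: weighted 50th percentile from -_pctile-
_pctile y [aweight = w], percentiles(50)
display %12.2f r(r1)
```

If an observation is flagged by near_cut, the cumulative weight there is equal to the cutting point up to rounding error, which is the case where the manual steps average y[_n-1] and y[_n] instead of taking a single value.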
Kind regards,
David