I would like to calculate the leave-out weighted median of a variable within groups. The following code illustrates my problem: for each make of car, it calculates the sales-weighted median price of all other cars in the same group (foreign or domestic). It is built on this FAQ answer and this answer to a previous question on Statalist:
Code:
sysuse auto, clear
gen sales = floor(runiform()*100) // create artificial weight variable
capture drop leave_out_med_sales
gen leave_out_med_sales = .
capture drop temp
forvalues i = 1/`=_N' {
    // copy of price with observation i blanked out
    qui gen temp = price
    qui replace temp = . if _n == `i'
    // weighted median of the remaining cars in the same foreign/domestic group
    qui su temp [w = sales] if foreign == foreign[`i'], detail
    qui replace leave_out_med_sales = r(p50) if _n == `i'
    drop temp
}
The problem is that this code is much too slow. I have a dataset of around 20 million observations, so this calculation needs to be done much more quickly. Is there a way I can vectorise this operation, or at least speed it up dramatically?
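One possible direction, offered as an untested sketch and not a definitive answer: sort once by group and price, take running sums of the weights, and locate each leave-out median with a binary search in Mata, which replaces the N full passes over the data with roughly N log N work. The sketch assumes the weighted-percentile rule that I understand summarize and pctile to use (the first sorted value whose cumulative weight exceeds half the total, averaged with the previous value when the cumulative weight equals exactly half). It also assumes strictly positive weights and no missing prices, so the weights here are generated as 1 + floor(runiform()*100). The names first_above, loo_wmedian, loo_wmedian_by and loo_med are hypothetical, introduced only for this illustration.

```stata
sysuse auto, clear
set seed 12345
gen sales = 1 + floor(runiform()*100)   // strictly positive weights (assumption)

mata:
// Smallest index i in [a, b] with cw[i] > t, or missing if there is none.
// cw must be non-decreasing (a running sum of positive weights).
real scalar first_above(real colvector cw, real scalar a, real scalar b, real scalar t)
{
    real scalar lo, hi, mid
    if (a > b) return(.)
    if (cw[b] <= t) return(.)
    lo = a
    hi = b
    while (lo < hi) {
        mid = floor((lo + hi) / 2)
        if (cw[mid] > t) {
            hi = mid
        }
        else {
            lo = mid + 1
        }
    }
    return(lo)
}

// Leave-one-out weighted medians for one group; x must be sorted ascending.
real colvector loo_wmedian(real colvector x, real colvector w)
{
    real scalar n, k, P, i, prev, cprev
    real colvector cw, out
    n   = rows(x)
    cw  = runningsum(w)
    out = J(n, 1, .)
    for (k = 1; k <= n; k++) {
        P = (cw[n] - w[k]) / 2            // half the total weight without obs k
        i = first_above(cw, 1, k - 1, P)  // candidates sorted before obs k
        if (i == .) i = first_above(cw, k + 1, n, P + w[k])  // candidates after obs k
        if (i == .) continue              // no other observation in the group
        prev = i - 1
        if (prev == k) prev = prev - 1    // skip the left-out observation
        out[k] = x[i]
        if (prev >= 1) {
            cprev = cw[prev] - (prev > k ? w[k] : 0)
            if (cprev == P) out[k] = (x[prev] + x[i]) / 2
        }
    }
    return(out)
}

// Apply loo_wmedian to every group; data must be sorted by group, then by x.
void loo_wmedian_by(string scalar xvar, string scalar wvar, string scalar gvar, string scalar outvar)
{
    real matrix info
    real colvector x, w, res
    real scalar j
    x    = st_data(., xvar)
    w    = st_data(., wvar)
    info = panelsetup(st_data(., gvar), 1)
    res  = J(rows(x), 1, .)
    for (j = 1; j <= rows(info); j++) {
        res[|info[j, 1] \ info[j, 2]|] = loo_wmedian(panelsubmatrix(x, j, info), panelsubmatrix(w, j, info))
    }
    st_store(., outvar, res)
}
end

sort foreign price
gen double loo_med = .
mata: loo_wmedian_by("price", "sales", "foreign", "loo_med")
```

Before trusting this on 20 million observations, it would be worth comparing loo_med against the loop above on a small sample. Observations with zero weight or missing prices would need separate handling.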