I use Stata 13 and want to calculate the average value of X after removing outliers (observations below the 1st or above the 99th percentile) in several different datasets. However, when many values of X are equal to the percentile value, I want to choose which of those observations to drop based on the values of other variables (for instance, the ones with the smallest values of variable Z). Is there a command that lets me tag outliers this way?
I have written a program (see code below) that creates an outlier dummy accordingly, but it is inefficient: it takes a long time to run, and I have to repeat the procedure many times. Do you know of a more efficient way to do this? Thank you so much.
Code:
program define trim_criteria
    qui centile `1', centile(1 99)
    scalar p1 = r(c_1)
    scalar p99 = r(c_2)
    scalar tot = r(N)
    scalar tot1 = round(tot*0.01)
    gsort -v elasticity_sign -imp_v_share -exp_v_share imp_iso3 aff_iso3 // I sort the data according to these six variables

    * dummy 1%: upper tail
    qui gen d1_`1' = 1 if float(`1') > float(p99) & `1' != .
    qui count if float(p99) == float(`1')
    if r(N) > 0 {
        qui count if d1_`1' == 1
        qui scalar j1 = tot1 - r(N)
        qui gen f1 = _n if float(p99) == float(`1')
        qui egen g1 = rank(f1), field
        qui replace d1_`1' = 1 if g1 <= j1
    }
    qui count if d1_`1' == 1
    cap assert r(N) == tot1
    if _rc != 0 {
        assert r(N) == tot1-1
    }

    * dummy 1%: lower tail
    qui replace d1_`1' = 1 if float(`1') < float(p1) & `1' != .
    qui count if float(p1) == float(`1')
    if r(N) > 0 {
        qui count if d1_`1' == 1
        qui scalar j2 = (2*tot1) - r(N)
        qui gen f2 = _n if float(p1) == float(`1') & d1_`1' != 1
        qui egen g2 = rank(f2), field
        qui replace d1_`1' = 1 if g2 <= j2
    }
    qui count if d1_`1' == 1
    cap assert r(N) == 2*tot1
    if _rc != 0 {
        cap assert r(N) == 2*tot1-1
        if _rc != 0 {
            assert r(N) == 2*tot1-2
        }
    }

    * drop the helper variables
    foreach var in f1 g1 f2 g2 {
        cap drop `var'
    }
end
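For comparison, here is a minimal sketch of a rank-based alternative that might avoid the repeated `count` and `egen rank` passes. It is not equivalent to the program above: instead of comparing against the `centile` cutoffs, it simply tags the k = round(0.01*N) lowest and k highest nonmissing observations after sorting, so ties at the cutoff are resolved by the sort order. The program name `trim_rank`, its options `tiebreak()` and `generate()`, and the variable names in the usage line are all hypothetical. Ties in the tie-break variables themselves are broken arbitrarily.

```stata
* Hypothetical sketch: tag the k lowest and k highest nonmissing values of a
* variable, where k = round(0.01*N). Within equal values of the variable,
* observations with the smallest tie-break values are tagged first.
capture program drop trim_rank
program define trim_rank
    syntax varname(numeric), TIEbreak(varlist) GENerate(name)
    tempvar miss
    quietly {
        * keep missing values at the end of every sort
        gen byte `miss' = missing(`varlist')
        count if !`miss'
        local k = round(r(N) * 0.01)

        * 0 = kept, 1 = tagged as outlier, missing where the variable is missing
        gen byte `generate' = 0 if !`miss'

        * lower tail: smallest values first, ties ordered by the tie-break variables
        sort `miss' `varlist' `tiebreak'
        replace `generate' = 1 if _n <= `k'

        * upper tail: largest values first, ties again ordered by the tie-break variables
        gsort `miss' -`varlist' `tiebreak'
        replace `generate' = 1 if _n <= `k'
    }
end
```

Assuming variables named X and Z, it could be used as `trim_rank X, tiebreak(Z) generate(out_X)` followed by `summarize X if out_X == 0`. To drop the largest rather than the smallest tie-break values first, the sort keys would need to be reversed inside the program.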