Dear Statalists,

I hope you are well. I would like to ask about using the winsor2 command to deal with outliers in my dataset. I have tried the following steps on a number of variables, but the variables have not changed, as shown in the examples below.

Example (1)

clonevar PO_GEN_W = PO_GEN
su PO_GEN_W , d
winsor2 PO_GEN_W , replace cuts(1 99)
replace PO_GEN_W =r(p99) if PO_GEN_W >=r(p99) & PO_GEN_W <.
replace PO_GEN_W =r(p1) if PO_GEN_W >=r(p1) & PO_GEN_W <.

. replace PO_GEN_W =r(p1) if PO_GEN_W >=r(p1) & PO_GEN_W >.
(0 real changes made)

. replace PO_GEN_W =r(p99) if PO_GEN_W >=r(p99) & PO_GEN_W <.
(0 real changes made)

Example (2)

clonevar R_ST_W = R_ST
su R_ST_W , d
winsor2 R_ST_W , replace cuts(1 99)
replace R_ST_W =r(p99) if R_ST_W >=r(p99) & R_ST_W <.
replace R_ST_W =r(p1) if R_ST_W >=r(p1) & R_ST_W <.

. replace R_ST_W =r(p1) if R_ST_W >=r(p1) & R_ST_W >.
(0 real changes made)

. replace R_ST_W =r(p99) if R_ST_W >=r(p99) & R_ST_W <.
(0 real changes made)


su R_ST_W, d

                    Level of satisfaction

      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%           .5              0       Obs                 300
25%          1.5              0       Sum of Wgt.         300

50%            2                      Mean               1.65
                        Largest       Std. Dev.      .6549273
75%            2              2
90%            2              2       Variance       .4289298
95%            2              2       Skewness       -1.63945
99%            2              2       Kurtosis       4.263773
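
For reference, this is roughly what I am trying to achieve by hand, written as a minimal sketch rather than a definitive method. It assumes that R_ST is the name of the original variable (that name is an assumption here), and lo_cut and hi_cut are purely illustrative scalar names. The cut-offs are saved straight after summarize, since r() results can be replaced by a later r-class command.

```stata
* Minimal sketch: manual winsorizing at the 1st and 99th percentiles.
* R_ST (the original variable) is an assumed name; lo_cut/hi_cut are illustrative.
clonevar R_ST_W = R_ST
quietly summarize R_ST_W, detail

* Save the cut-offs right away, before another command replaces r().
scalar lo_cut = r(p1)
scalar hi_cut = r(p99)

* Pull values below the 1st percentile up to it.
replace R_ST_W = lo_cut if R_ST_W < lo_cut

* Pull values above the 99th percentile down to it, leaving missings alone.
replace R_ST_W = hi_cut if R_ST_W > hi_cut & !missing(R_ST_W)
```

In the output above, the 1st percentile (0) equals the smallest value and the 99th percentile (2) equals the largest value, so with cut-offs at 1 and 99 I would not expect any value of this particular variable to change.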




I have attached a sample box plot (graph box) that shows the outliers in one of the variables.


probit Sksupprt i.FST_EXP i.FST_B i.FST_GW i.FST_AD i.FST_ADV i.R_LN i.R_ST_W i.PO_GEN i.PO_CIT i.PO_EP i.PO_EC i.FA_SE i.FA_AE i.FA_SI

My variables are dummy and categorical variables: the former are coded 0/1 and the latter are coded 0, 1, 2, ..., for 300 observations.


Could you please advise on how to apply winsor2 to the variables that have outliers, and explain why I am getting "(0 real changes made)" as a result?


Many thanks for your continued help.

Kind Regards,
Rabab