Hello everyone,

I am currently working with panel data (firm, year) for my seminar paper and am preparing the data for the analysis. My problem is as follows:

I generated the variable CETR (Cash Effective Tax Rate) with the command
gen CETR = CF_TAXATION / PRETAX_INCOME

The results included some negative values, some values larger than 1 and also missing values.

To control for outliers, I wanted to winsorize CETR at 0 and 1, i.e. if CETR > 1 it should be set to 1, and if CETR < 0 it should be set to 0:
replace CETR=0 if CETR<0
replace CETR=1 if CETR>1

After looking at the results, I noticed that Stata assigned the value 1 to observations where CETR was originally missing, because Stata treats missing values as larger than any nonmissing number, so the condition CETR>1 is true for them. Since a substantial share of my observations is missing, this biases my results considerably. My question is therefore: how do I have to alter the commands above so that missing values of CETR stay missing?
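
For reference, here is a minimal sketch of one way this might be handled, namely excluding missing values explicitly in the upper-bound condition with !missing(). The toy values below are made up purely for illustration and are not from my dataset; only the variable names match my setup.

```stata
* Toy data for illustration only (hypothetical values)
clear
input double(CF_TAXATION PRETAX_INCOME)
 20 100
-10 100
150 100
  . 100
 30   0
end

* Missing CF_TAXATION or division by zero both yield a missing CETR
gen CETR = CF_TAXATION / PRETAX_INCOME

* The lower bound is safe as written: missing is never < 0
replace CETR = 0 if CETR < 0

* The upper bound needs an explicit guard, because missing > 1 is true
replace CETR = 1 if CETR > 1 & !missing(CETR)

list
```

Under this sketch the two observations with missing CETR should remain missing, while the negative value becomes 0 and the value above 1 becomes 1. Writing the condition as CETR > 1 & CETR < . should be equivalent for this purpose.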

Thanks in advance and kind regards,

Lucas