Hi everyone,

I have a panel data set of companies from 2002 to 2018, with the percentage of unionization per firm per year. I tried to create two dummy variables from the unionization variable (union_mem_t1): one that equals 1 when the observation is in the top quartile of unionization within its year, and one that equals 1 when it is in the bottom quartile.
I used the code below, which seems to work, but the observations allocated to the yearly top and bottom quartiles change every time I run the do-file.

bysort year (union_mem_t1): gen byte union_mem_dummy_quanhigh_t1 = _n > (0.75 * _N)
bysort year (union_mem_t1): gen byte union_mem_dummy_quanlow_t1 = _n < (0.25 * _N)

I then tried

xtile union_mem_dummy_quan_t1 = union_mem_t1, n(4)

* 1 if the observation is in the top quartile, 0 otherwise (a missing quartile gives 0)
gen byte union_mem_dummy_quanhigh_t1 = union_mem_dummy_quan_t1 == 4

* 1 if the observation is in the bottom quartile, 0 otherwise (a missing quartile gives 0)
gen byte union_mem_dummy_quanlow_t1 = union_mem_dummy_quan_t1 == 1

to see whether I have the same problem when I create the dummies over all years pooled together, and here the allocation is the same on every run.
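
For comparison, a yearly version that does not depend on the sort order might look like the sketch below. It rests on an assumption: that the instability comes from tied values of union_mem_t1 within a year, since sort and bysort place tied observations in an arbitrary order and _n can then differ between runs. The names p25_t1, p75_t1, union_high_yr_t1 and union_low_yr_t1 are only illustrative, and the cut-off convention (values at or below the 25th percentile count as low, values above the 75th as high) is meant to mirror what xtile does rather than the _n-based rule.

```stata
* Sketch only: all new variable names here are illustrative.
* The yearly cut-offs are computed from the values themselves, so tied
* observations cannot fall on different sides of a cut-off between runs.
egen double p25_t1 = pctile(union_mem_t1), p(25) by(year)
egen double p75_t1 = pctile(union_mem_t1), p(75) by(year)

* Missing unionization stays missing instead of being sorted into the top group.
gen byte union_high_yr_t1 = union_mem_t1 > p75_t1  if !missing(union_mem_t1)
gen byte union_low_yr_t1  = union_mem_t1 <= p25_t1 if !missing(union_mem_t1)
```

With many ties at a cut-off, the two groups will not each contain exactly a quarter of the firms in a year, but the allocation should be identical on every run.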

Could someone please help me understand why the yearly allocation changes between runs while the pooled one does not? I would be really grateful.


Thank you in advance