Hello everyone,

I am working with a panel data set and need to create a dummy variable based on firm market value (the rmarkval variable). Each firm, identified by idgvkey, should be classified as small or large depending on whether its rmarkval exceeds the median rmarkval value; if it does, the firm should be considered large. I am currently running the commands below, but they count every single observation for each idgvkey. How can I get one result per firm, so that each idgvkey is flagged as large or not exactly once?

Code:
gen large=0
by idgvkey: replace large=1 if rmarkval>mkvalmedian

egen nlarge=sum(large), by(idgvkey)

egen _Unique= sum(large), by(idgvkey)

replace _Unique = . if idgvkey[_n]==idgvkey[_n-1]

by idgvkey: gen largeee=1 if nlarge>(_Unique/2)
egen nlargeee=sum(largeee), by(idgvkey)

drop large nlarge largeee
rename nlargeee large
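
To make the goal concrete, here is a rough sketch of the kind of result I am after. It assumes the rule is "a firm is large when its own median rmarkval exceeds mkvalmedian" and that mkvalmedian holds the same benchmark value on every row. Both are assumptions, and firm_median, firm_large, and firm_tag are made-up names used only for illustration.

```stata
* Sketch only: firm_median, firm_large, and firm_tag are hypothetical names.
* Assumes mkvalmedian already holds one benchmark median on every row.

* Median market value of each firm across its own observations
bysort idgvkey: egen firm_median = median(rmarkval)

* One flag per firm, constant across all of that firm's rows
gen byte firm_large = firm_median > mkvalmedian if !missing(firm_median, mkvalmedian)

* Mark exactly one observation per firm so each firm is counted once
egen byte firm_tag = tag(idgvkey)

* Number of distinct large and small firms
tabulate firm_large if firm_tag
```

The idea is that firm_large would not vary within idgvkey, and restricting to firm_tag would count each firm once instead of once per observation.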

Thanks for your time!