I am working with a dataset containing monthly information on the hospitals where each health professional works, as well as the occupation code held there. In other words, each row corresponds to a month - hospital ID - health worker ID - occupation code combination (the corresponding variable names are ym, hosp, cpf_prof, and occup). A given worker can work in different hospitals, and can hold different occupations in the same hospital, in a given month.

The health professional's individual ID has been encrypted for data privacy reasons. Fortunately, it still allows us to differentiate between health professionals, because the encryption keeps the same sequence for a given health professional over time and assigns different sequences to different health professionals.

I ran the code below to create a variable that tags the first observation for every month - personal ID pair, but I am encountering difficulties. I don't understand why Stata does not recognise row #3 as a new observation, given that the value of cpf_prof is different. Later in the dataset (row #27) it seems to start recognising this encrypted value as a new one, but it doesn't before, which makes it all even stranger.
Code:
bys ym cpf_prof: gen d_unic = 1 if _n==1
order ym cpf_prof d_unic
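
As a cross-check, here is a minimal sketch of what I could run next. It assumes nothing beyond the variables above; d_tag is a hypothetical variable name that is not in my dataset. The idea is to look at how cpf_prof is stored and to compare my tag with the one produced by egen's tag() function, which should mark one observation per distinct month - personal ID group.

```stata
* Show the storage type and display format of the encrypted ID
describe cpf_prof

* Tag one observation per distinct ym - cpf_prof group (1 = tagged, 0 = otherwise)
* d_tag is a hypothetical name used only for this check
egen byte d_tag = tag(ym cpf_prof)

* Count rows where the two tags disagree (d_unic is 1 or missing, d_tag is 1 or 0)
count if d_tag != (d_unic == 1)
```

If the count is zero, both approaches agree on the grouping, which would suggest the issue lies in how the values are displayed rather than in how they are grouped.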

PS: I could not use dataex to generate a sample of my dataset because, when copying the code displayed on the screen, part of the encrypted sequence disappears.