Dear all,

I am working on a register-based cancer study. For this study, I would like to select one primary cancer diagnosis code (the earliest diagnosis code) for each patient.
However, some patients have more than one diagnosis code on the first date of diagnosis. For example, id 2 has three different diagnosis codes (C186, C187, C209) on the first diagnosis date and C205 on a later date.
In such cases, I would like to randomly select one of the earliest diagnosis codes.
(If the random selection returns a code ending in 9, which stands for an unspecified cancer location, I would like to select the corresponding code from the later date instead. For id 2, if the random selection returns C209 (unspecified rectal cancer), I would select C205 (specified rectal cancer) from the later date. This step is optional, though: I will only do it if it is straightforward. If not, I will keep C209 as randomly selected.)

In brief, I want to select one earliest diagnosis code for each patient. If there are several different diagnosis codes on the first diagnosis date, one of them should be selected at random.

I would like to learn how I can perform this case selection in Stata. I am using version 16.

Kind regards,
Moon Lu

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id int DIADAT str4 ICDO10
1 18536 "C183"
1 18536 "C186"
1 18536 "C187"
2 18875 "C186"
2 18875 "C187"
2 18875 "C209"
2 18986 "C205"
3 17328 "C182"
3 17328 "C183"
3 17328 "C184"
4 17604 "C185"
4 17604 "C186"
4 17608 "C180"
5 20471 "C180"
5 20472 "C182"
5 20473 "C183"
6 20472 "C184"
7 20436 "C180"
8 20436 "C182"
end
format %tdnn/dd/CCYY DIADAT
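
To make the request concrete, here is a minimal sketch of one way the selection could be done. It assumes the example data above are already in memory. All new variable names (u, selected, first_date, site, unspecified, sel_site, sel_unspec, candidate, pick, has_pick) and the seed are hypothetical. For the optional step it also assumes that the "corresponding" code is one with the same first three characters (e.g. C20) that does not end in 9; that rule is my guess at a workable definition, not an established one.

```stata
* Sketch only: assumes the -dataex- example above is in memory
* (variables id, DIADAT, ICDO10). All new variable names are hypothetical.
set seed 20240501                 // arbitrary seed so the draw is reproducible
sort id DIADAT ICDO10             // fixed row order before drawing random numbers
generate double u = runiform()    // random tie-breaker

* Step 1: within each patient, sort by date and then by u; the first row is
* a randomly chosen code on the earliest diagnosis date.
bysort id (DIADAT u): generate byte selected = _n == 1
by id: generate long first_date = DIADAT[1]

* Step 2 (optional). Assumption: the "corresponding" code shares the first
* three characters (e.g. C20) and does not end in 9.
generate str3 site = substr(ICDO10, 1, 3)
generate byte unspecified = substr(ICDO10, -1, 1) == "9"

* Copy the selected row's site and unspecified flag to all rows of the patient
bysort id (selected): generate str3 sel_site = site[_N]
by id: generate byte sel_unspec = unspecified[_N]

* Later rows of the same site with a specified location are candidates
generate byte candidate = sel_unspec & site == sel_site ///
    & !unspecified & DIADAT > first_date
bysort id candidate (DIADAT u): generate byte pick = candidate & _n == 1
bysort id: egen byte has_pick = max(pick)

* Keep the replacement if one exists, otherwise keep the random selection
keep if cond(has_pick, pick, selected)
keep id DIADAT ICDO10
isid id                           // errors out unless there is one row per patient
list, sepby(id)
```

If only the random selection is wanted, `keep if selected` right after step 1 would be enough and step 2 could be dropped. When several later specified codes of the same site exist, this sketch takes the earliest one and breaks remaining ties with the same random number.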