Good morning,
I am a bit stuck in my analysis and am not sure how to proceed. I am using Stata 15 on a Mac and I have a dataset of patients (800 patients, 14,000 observations). Each patient has an ID, but most patients have more than one line: each line represents a visit with its own date, and the number of visits differs between patients.

Each patient has one or more diseases. However, the dataset is built so that a disease is repeated on every visit until a new disease is added. I added a variable n_dis_n that gives the number of different diseases each patient has. For example:

Code:
clear
input float ID int(consult_d d_1consult) float(disease_n n_dis_n)
110420 20825 20825 40 1
110420 20937 20825 40 1
110426 20863 20863 30 2
110426 20912 20863 30 2
110426 20989 20863 30 2
110426 21010 20863 47 2
110447 20832 20832 40 3
110447 20832 20832 40 3
110447 20860 20832 40 3
110447 20870 20832 40 3
110447 20895 20832 40 3
110447 20899 20832 40 3
110447 20909 20832 40 3
110447 20935 20832 40 3
110447 20941 20832 40 3
110447 20958 20832 40 3
110447 20970 20832 40 3
110447 20982 20832 40 3
110447 20993 20832 40 3
110447 21019 20832 40 3
110447 21077 20832 40 3
110447 21103 20832 40 3
110447 21168 20832 40 3
110447 21189 20832 40 3
110447 21213 20832 40 3
110447 21215 20832 40 3
110447 21238 20832 40 3
110447 21241 20832 40 3
110447 21257 20832 40 3
110447 21262 20832 40 3
110447 21269 20832 40 3
110447 21272 20832 40 3
110447 21272 20832 40 3
110447 21311 20832 40 3
110447 21311 20832 40 3
110447 21342 20832 40 3
110447 21342 20832 40 3
110447 21342 20832 40 3
110447 21353 20832 40 3
110447 21353 20832 40 3
110447 21427 20832 40 3
110447 21430 20832 40 3
110447 21430 20832 40 3
110447 21472 20832 40 3
110447 21514 20832 40 3
110447 21556 20832 40 3
110447 21584 20832 40 3
110447 21584 20832 40 3
110447 21612 20832 40 3
110447 21626 20832 40 3
110447 21654 20832 30 3
110447 21682 20832 40 3
110447 21717 20832 29 3
110447 21752 20832 40 3
110447 21783 20832 40 3
110447 21818 20832 40 3
110447 21861 20832 40 3
110447 21882 20832 40 3
110447 21924 20832 40 3
110447 21938 20832 40 3
110447 21938 20832 40 3
110447 21973 20832 40 3
110447 21973 20832 40 3
110447 22001 20832 30 3
110447 22008 20832 40 3
110447 22029 20832 40 3
110447 22092 20832 40 3
end
format %dD/N/CY consult_d
label values disease_n lbdisease_n
label def lbdisease_n 29 "disease 29", modify
label def lbdisease_n 30 "disease 30", modify
label def lbdisease_n 40 "diseases 40", modify
label def lbdisease_n 47 "disease 47", modify
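
For reference, a count like n_dis_n can be derived with egen. This is only a sketch, not necessarily how the variable was originally created: first_pair and n_dis_check are hypothetical names, and the code assumes the example data above has already been loaded.

```stata
* Sketch only: first_pair and n_dis_check are hypothetical names
* Flag exactly one row per patient-disease pair
egen byte first_pair = tag(ID disease_n)

* Sum the flags within patient: number of distinct diseases per patient
egen n_dis_check = total(first_pair), by(ID)

* Compare with the existing count; stops with an error if they differ
assert n_dis_check == n_dis_n
```

Rows with first_pair == 1 also give one line per patient and disease, which is a convenient starting point for listing diseases without repeats.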
In my analysis, I would like to identify the "initial" diseases a patient has when he or she first comes to see a doctor (and later check whether they correlate with other variables in my dataset). For example, I would like to see which diseases patients have in the first six months after their first hospital visit. My main problem is that the diseases are repeated across visits and I cannot get rid of these repetitions. Even leaving the time window aside, I am not able to list which diseases each patient has in total without the repetitions.
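
A minimal sketch of one possible approach follows. It assumes that d_1consult holds the date of each patient's first consultation (as it appears to in the example) and uses 183 days as a stand-in for six months. The names first_seen, keep_first and early are hypothetical, and the code expects the example data above to be in memory.

```stata
* Sketch only: first_seen, keep_first and early are hypothetical names
* Earliest visit date at which each patient-disease pair appears
bysort ID disease_n (consult_d): gen first_seen = consult_d[1]
format %td first_seen

* One flagged row per patient-disease pair, so repeats no longer count
egen byte keep_first = tag(ID disease_n)

* "Initial" diseases: first seen within 183 days of the first consultation
* (183 days is an assumed cutoff for six months)
gen byte early = keep_first & (first_seen - d_1consult <= 183)

* All distinct diseases, then only the initial ones
tabulate disease_n if keep_first
tabulate disease_n if early
list ID disease_n first_seen if early, sepby(ID) noobs
```

Keeping only the flagged rows (for example with keep if keep_first, ideally on a copy of the data) would leave one observation per patient and disease. Whether the cutoff should be 183 days or a calendar-based six months is a design choice left open here.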

Any help or suggestion is highly appreciated, thank you!

Chiara