Hi,

I am working on a database of medications taken by individuals which I need to summarise & I am having some difficulties.

Over the duration of my analysis, each individual (id) could be taking a number of medicines, and often start & stopped the same medicine more than once. I have the medicine names & the start & stop dates for each treatment course. An example of my data is below.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 id str200 whoname str16 atccode float(atcgroupid ftstartdate ftstopdate) str5 ongoing float(days ftfinal)
"01" "Pyridoxine"                   "A11HA"  1 20836 21103 "False" 267 21670
"01" "Hydrochlorothiazide"          "C03AA"  2 21187     . "True"  483 21670
"01" "Amlodipine"                   "C08CA"  3 21607     . "True"   63 21670
"01" "Azithromycin"                 "J01FA"  4 21523 21607 "False"  84 21670
"01" "Azithromycin"                 "J01FA"  4 21607     . "True"   63 21670
"01" "Isoniazid"                    "J04AC"  5 20867 21103 "False" 236 21670
"02" "Ranitidine"                   "A02BA"  6 21536     . "False"   . 21670
"02" "Ascorbic acid"                "A11GA"  7 20941 20948 "False"   7 21670
"02" "Ascorbic acid"                "A11GA"  7 21123 21127 "False"   4 21670
"02" "Pyridoxine"                   "A11HA"  8 20842 21207 "False" 365 21670
"02" "Hydrochlorothiazide"          "C03AA"  9 21207     . "True"  463 21670
"02" "Methyl salicylate"            "D02AF" 10 20941 21307 "False" 366 21670
"02" "Isoniazid"                    "J04AC" 11 20842 21207 "False" 365 21670
"02" "Ibuprofen sodium"             "M01AE" 12 20881 21305 "False" 424 21670
"02" "Paracetamol"                  "N02BE" 13 20941 20948 "False"   7 21670
"02" "Paracetamol"                  "N02BE" 13 21123 21127 "False"   4 21670
"02" "Oxymetazoline hydrochloride" "R01AA" 14 20941 20948 "False"   7 21670
"02" "Oxymetazoline hydrochloride" "R01AA" 14 21123 21127 "False"   4 21670
"03" "Pyridoxine"                   "A11HA" 15 20838 20880 "False"  42 21670
"03" "Pyridoxine"                   "A11HA" 15 20935 21276 "False" 341 21670
"03" "Amlodipine"                   "C08CA" 16 21277 21606 "False" 329 21670
"03" "Amlodipine"                   "C08CA" 16 21606     . "True"   64 21670
"03" "Aciclovir"                    "D06BB" 17 21060 21064 "False"   4 21670
"03" "Hydrocortisone"               "D07AA" 18 21025 21193 "False" 168 21670
"03" "Hydrocortisone acetate"      "D07AA" 18 21606     . "True"   64 21670
"03" "Amoxicillin"                  "J01CA" 19 21060 21065 "False"   5 21670
"03" "Isoniazid"                    "J04AC" 20 20838 20880 "False"  42 21670
"03" "Isoniazid"                    "J04AC" 20 20876 21276 "False" 400 21670
"03" "Isoniazid"                    "J04AC" 20 21276     . "True"  394 21670
"03" "Aciclovir"                    "J05AB" 21 21060 21064 "False"   4 21670
"03" "Paracetamol"                  "N02BE" 22 21060 21065 "False"   5 21670
end
format %td ftstartdate
format %td ftstopdate
format %td ftfinal
label var id "id" 
label var whoname "who name" 
label var atccode "atc code" 
label var atcgroupid "group(id atccode)" 
label var ftstartdate "medicine start date" 
label var ftstopdate "medicine stop date" 
label var ongoing "False=no; True=yes" 
label var days "days on medicine" 
label var ftfinal "date of extraction"
I want to generate variables 1) date of first ever start of each medicine, 2) date of final stop [or if ongoing], and 3) total number days on the treatment. I thought about collapsing the data & selecting the first start date, last stop date, and summing the days variable.

Some of the observations for the same id/medicine start on the same day (e.g. atcgroupid==4), & for some there are gaps (e.g. atcgroupid==7). However, the observations I am having the most problem with is when the start dates overlap with the stop dates for the previous entry (e.g. atcgroupid==20). I cannot sum the days as there is overlap & the days will be double counted.

Does anyone have any advice on how to potentially solve this?

I have an additional problem where some drugs are not ongoing (ongoing=="False"), but the stop date is missing (e.g. atcgroupid==6), but I don't think this is solvable.

I really appreciate any help,
Bryony