Hi all,

Can anyone please assist with the following problem?

I am currently working on a hospital admissions and primary care dataset for patients with chronic rheumatic diseases (e.g.: rheumatoid arthritis (RA) , lupus (SLE), vasculitis (SV)..etc)
Most of the patients have had multiple diagnoses over time (e.g. patient A was diagnosed with Lupus on 01 June 2008 and then was later diagnosed with vasculitis (05 August 2010) and finally was confirmed to have rheumatoid arthritis on 11 November 2010

I am doing a case-control study, (panel dataset). I've coded these conditions as individual cases (1 - RA, 2: SLE and 3:SV). However, for people with multiple diagnoses over time, I am struggling to code them using their latest date of diagnosis.

An example of this:

date1 - date of rheumatoid arthritis diagnosis
date2 - date of vasculitis (SV) diagnosis
date 3 - date of Lupus (SLE) diagnosis


All the dates were formatted using the date function

format date1 date2 date3 %td

The cases were generate in this format:

gen RA_cases=1 if date1 !=.
gen lupus_cases =1 if date2 !=.
gen SV_cases =1 if date3 !=.


Patients with multiple chronic rheumatic diagnoses were counted and tagged (or temporary removed)

count if RA_cases & SV_cases ==1 ##(the count is n=283)

gen cases_with_multiple_chrRD =.a if RA_cases & SV_cases ==1



Now, I am struggling to code these patients based on their latest diagnosis - I've tried this

replace RA_cases =.a if cases_with_multiple_chrRD ==.a
replace RA_cases =1 if date1 > date2 & date2 !=.


The above command doesn't seem to work because it includes missing values and I've also tried formatting the date variables back to numerical (see below).
Can anyone please assist?
date1 date2
18576
19684
18168
17714 20248
15157
14292
19423
7852 7897
19103
17744
18079
from
date1 date2
22-Nov-13
28-Sep-09
01-Jul-08 09-Jun-15
01-Jul-01
01-Aug-14
01-Jul-06
17-Feb-99
06-Mar-13
01-Jul-81 15-Aug-81
20-Apr-12
31-Jul-08