Hi Folks,
This question is more of a paranoia check than anything else. I have the following code, which identifies people who have had 3 or more ED visits within a 3-month period, then generates a new variable classifying people as having no ED visits, low ED visits (<=2 edvisits in 3 months), or high ED visits (3+ edvisits in 3 months):

Code:
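* Rolling 3-month total of ED visits per patient (current month and the 2 before it)
rangestat (sum) total=edvisits, interval(mdate -2 0) by(studyid)
* Largest rolling total per patient, copied to every row of that patient
rangestat (max) max=total, interval(studyid 0 0)
* Stata treats missing as larger than any number, so exclude it explicitly
gen repeat_user = max >= 3 & !missing(max)

mvencode edvisits, mv(0)

* 1 = no visits at all (the patient's largest monthly count is 0)
bysort studyid (edvisits): gen patient_cat = edvisits[_N] == 0
replace patient_cat = 3 if repeat_user == 1
replace patient_cat = 2 if patient_cat == 0

label def patient_cat 1 no 2 low 3 high
label val patient_cat patient_cat
tab patient_cat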
I would like to modify this code to capture people who have 3 or more visits in a 12-month period, so I have changed it to:

Code:
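* Rolling 12-month total of ED visits per patient (current month and the 11 before it)
rangestat (sum) total=edvisits, interval(mdate -11 0) by(studyid)
* Largest rolling total per patient, copied to every row of that patient
rangestat (max) max=total, interval(studyid 0 0)
* Stata treats missing as larger than any number, so exclude it explicitly
gen repeat_user = max >= 3 & !missing(max)

mvencode edvisits, mv(0)

* 1 = no visits at all (the patient's largest monthly count is 0)
bysort studyid (edvisits): gen patient_cat = edvisits[_N] == 0
replace patient_cat = 3 if repeat_user == 1
replace patient_cat = 2 if patient_cat == 0

label def patient_cat 1 no 2 low 3 high
label val patient_cat patient_cat
tab patient_cat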
The reason I'm concerned is that both versions of the code return exactly the same number of people in the "no", "low", and "high" categories. I would expect the same number in "no", but I would expect the split between "low" and "high" to shift when the window expands from 3 to 12 months.
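
One way to check this would be to compute both windows side by side under different variable names and cross-tabulate them. The following is only a sketch: it assumes that mdate is a Stata monthly date (so that -2 and -11 count months rather than days) and that edvisits is a non-negative count per month. The names total3, total12, max3, max12, high3, high12, and one_per_id are made up for illustration.

```stata
* Sketch only: both windows computed under distinct (hypothetical) names
rangestat (sum) total3=edvisits, interval(mdate -2 0) by(studyid)
rangestat (sum) total12=edvisits, interval(mdate -11 0) by(studyid)

* Per-patient maximum of each rolling total
bysort studyid: egen max3 = max(total3)
bysort studyid: egen max12 = max(total12)

* A wider window can only add visits, so total12 should never fall below total3
assert total12 >= total3 if !missing(total3, total12)

* Flag high users under each definition, guarding against missing values
gen byte high3 = max3 >= 3 & !missing(max3)
gen byte high12 = max12 >= 3 & !missing(max12)

* Cross-tabulate using one row per patient
egen byte one_per_id = tag(studyid)
tab high3 high12 if one_per_id
```

If every patient falls on the diagonal of that table, the two definitions really do agree in these data. Any patient with high12 equal to 1 and high3 equal to 0 is someone the wider window reclassifies.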

Can anyone see an issue with the code?

Any thoughts would be much appreciated.

Thanks so much!