Hi all,

I am new to Stata and am working with quite a large dataset. I'd be grateful for some advice.

First question: is the "origin" option required in stset? Some code I have from others doesn't use it, but I see many people using it. Since my aim is to look at annual incidence rates, I'm not sure how relevant it is.
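
To make the question concrete, this is roughly the kind of origin() usage I keep seeing. It is only a sketch: dob is a hypothetical date-of-birth variable (not necessarily in my data), and as far as I understand it this would make analysis time run from birth, i.e. age in years, instead of from Stata's date zero.

```stata
* Sketch only: dob is a hypothetical date-of-birth variable.
* With origin(time dob), _t and _t0 are measured from dob, so with
* scale(365.24) they are (approximately) age in years.
stset dox, id(patient_id) failure(o_UC==1) enter(time doe) origin(time dob) scale(365.24)
```

Since I split on calendar time using after(time=...), I am unsure whether the choice of origin changes the yearly rates at all.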


Second question: I want to stsplit the data so that I can calculate incidence rates for the disease of interest for each year from 2000 to 2016, stratified by agecat, by sex, and by agecat and sex combined, and then plot these rates with CIs.



My problem is that, because the dataset includes almost 9 million subjects, the number of observations after splitting with "at(0(1)16) trim" is far too large for my computer to handle. Is this commonly encountered? Is there a better way?

Many thanks,

Dom




This is what I have done:

stset dox, id(patient_id) fail(o_UC==1) enter(doe) scale(365.24)

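* Split follow-up into one-year bands counted from 1 Jan 2000
stsplit _year, after(time=d(1/1/2000)) at(0(1)16) trim  // this is where things go awry
* Convert the band index (0, 1, ..., 15) into a calendar year
replace _year = 2000 + _year
* Person-years per record, computed after splitting so that it reflects
* each split record rather than the whole follow-up
gen _y = _t - _t0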
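
For what it's worth, this is roughly where I am trying to get to once the split works. It is an untested sketch, not something I have run on the full data: it assumes sex and agecat already exist as variables in the split data, and rates_by_year is just an arbitrary file name I made up.

```stata
* Sketch only, assuming the stsplit above has completed.
* Rates per 100,000 person-years by calendar year, with 95% CIs,
* plotted against year.
strate _year, per(100000) graph

* Same rates further stratified by sex and age category, saved to a
* file (rates_by_year.dta, an arbitrary name) for plotting later.
* agecat is assumed to be an existing variable; if it is meant to
* change during follow-up it would need its own stsplit on age.
strate _year sex agecat, per(100000) output(rates_by_year, replace)
```

If I understand the documentation correctly, the saved file should contain the events, person-time, rate and confidence limits for each stratum, which I could then graph separately.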