Dear All,

This might be a silly question, but it is driving me crazy.

I am managing data which were not recorded for survival analysis and I am trying to put them in a proper format.

For the purpose of my question, here are my data (I have more variables, but they behave like Var1 and Var2, namely they vary over time):
ID  Visit  Date       DOsp1      DOsp2      Sex  Var1  Var2
1   0      1mar2002   .          .          M    0     .
1   1      3jun2005   .          .          M    .     .
1   2      4feb2007   .          .          M    .     .
2   0      9feb2002   21dec2000  22jun2001  F    1     18.9
2   1      7sep2002   .          .          F    2     9999
3   0      25mar2003  .          .          M    0     20
3   1      13oct2004  .          .          M    2     9999
4   0      4oct2002   .          .          F    1     23.5
4   1      03may2004  4jan2003   24jun2003  F    .     .
4   2      13jan2006  .          .          F    .     .
4   3      25aug2007  .          .          F    2     9999

ID is my person identifier. Each person can be visited several times (Visit, where 0 is the baseline) on different dates (Date is when the visit took place). At each visit, a person could report up to 9 dates (I have DOsp1-DOsp9, but for the sake of this question I show only the first two) indicating if and when they were hospitalized between visits.

I will use snapspan to convert my data to time-span data, but first I think I need to slightly change my time variable (and the dataset overall).

I want to have a time variable like Time (see the table below) so that I can run snapspan ID Time.

ID  Visit  Date       DOsp1      DOsp2      Sex  Var1  Var2  Time
1   0      1mar2002   .          .          M    0     .     1mar2002
1   1      3jun2005   .          .          M    .     .     3jun2005
1   2      4feb2007   .          .          M    .     .     4feb2007
2   .      .          .          .          .    .     .     21dec2000
2   .      .          .          .          .    .     .     22jun2001
2   0      9feb2002   21dec2000  22jun2001  F    1     18.9  9feb2002
2   1      7sep2002   .          .          F    2     9999  7sep2002
3   0      25mar2003  .          .          M    0     20    25mar2003
3   1      13oct2004  .          .          M    2     9999  13oct2004
4   0      4oct2002   .          .          F    1     23.5  4oct2002
4   .      .          .          .          .    .     .     4jan2003
4   .      .          .          .          .    .     .     24jun2003
4   1      03may2004  4jan2003   24jun2003  F    .     .     03may2004
4   2      13jan2006  .          .          F    .     .     13jan2006
4   3      25aug2007  .          .          F    2     9999  25aug2007
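
To make the target concrete, here is a rough sketch of how the Time layout above might be built. It is only an attempt, not a tested solution. It assumes Date and DOsp1-DOsp2 are already numeric daily dates (if they are strings, they would first need converting, for example with date(Date, "DMY")). The names obs, OspNum and osp are made up for illustration.

```stata
* Sketch only: assumes Date, DOsp1 and DOsp2 are numeric %td dates.
* Step 1: stack the hospitalization dates into one column and save them.
preserve
    keep ID DOsp1 DOsp2
    gen long obs = _n                      // unique row key required by reshape
    reshape long DOsp, i(obs) j(OspNum)    // one row per (visit row, DOsp slot)
    drop if missing(DOsp)                  // keep only dates actually reported
    rename DOsp Time
    keep ID Time OspNum
    tempfile osp
    save `osp'
restore

* Step 2: on the visit rows, Time is simply the visit date.
gen long Time = Date
format Time %td

* Step 3: add one row per hospitalization date; Visit, Sex, Var1 and Var2
* are missing on these new rows, as in the table above.
append using `osp'
sort ID Time
```

With all nine variables, the keep line would list DOsp1-DOsp9 instead; the reshape stub stays the same.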

This is the final dataset I want to obtain:
ID  Datestarts  Dateends   Sex  Var1  Var2  Event    Event_recode
1   .           1mar2002   M    0     .     Visit 0  0
1   1mar2002    3jun2005   M    .     .     Visit 1  0
1   3jun2005    4feb2007   M    .     .     Visit 2  0
2   .           9feb2002   F    1     18.9  Visit 0  0
2   9feb2002    7sep2002   F    2     9999  Visit 1  2
3   .           25mar2003  M    0     20    Visit 0  0
3   25mar2003   13oct2004  M    2     9999  Visit 1  2
4   .           4oct2002   F    1     23.5  Visit 0  0
4   4oct2002    4jan2003   F    .     .     Osp 1    1
4   4jan2003    24jun2003  F    .     .     Osp 2    1
4   24jun2003   03may2004  F    .     .     Visit 1  0
4   03may2004   13jan2006  F    .     .     Visit 2  0
4   13jan2006   25aug2007  F    2     9999  Visit 3  2

As you might notice, any date recorded in DOsp1-DOsp9 that falls before Visit 0 is not taken into account. Event_recode will then be built to serve as the failure variable for my stset: 0 if the row refers to a visit, 1 if it refers to a hospitalization, 2 if the person dies (namely if Var1==2), and 3 if it is censored.
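
In code, I imagine these two rules would look roughly like the sketch below. It assumes the long layout with Time already exists, that Date and Time are numeric daily dates, and that Visit is missing on the rows added for hospitalization dates. The name baseline is made up for illustration. Code 3 is left out because I have not yet worked out how to identify the censored rows.

```stata
* Sketch only: Visit is assumed missing on hospitalization rows.
* Find each person's baseline (Visit 0) date and drop earlier hospitalizations.
bysort ID: egen long baseline = min(cond(Visit == 0, Date, .))
drop if missing(Visit) & Time < baseline     // hospitalized before Visit 0
drop baseline

* Build the failure variable described above (codes 0, 1 and 2 only).
gen byte Event_recode = 0                    // visit rows
replace Event_recode = 1 if missing(Visit)   // hospitalization rows
replace Event_recode = 2 if Var1 == 2        // death recorded at this visit
```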

All of that, in order to run the following code:

stset Dateends, id(ID) time0(Datestarts) origin(time Datestarts) failure(Event_recode==1 2)

Thank you to anyone who can help me; feel free to ask me for clarification.
Best