This might be a silly question, but it is driving me crazy.
I am managing data which were not recorded for survival analysis and I am trying to put them in a proper format.
For the purpose of my question, here are my data (I have more variables, but they behave like Var1 and Var2, i.e. they vary over time):
ID | Visit | Date | DOsp1 | DOsp2 | Sex | Var1 | Var2 |
1 | 0 | 1mar2002 | | | M | 0 | . |
1 | 1 | 3jun2005 | | | M | . | . |
1 | 2 | 4feb2007 | | | M | . | . |
2 | 0 | 9feb2002 | 21dec2000 | 22jun2001 | F | 1 | 18.9 |
2 | 1 | 7sep2002 | | | F | 2 | 9999 |
3 | 0 | 25mar2003 | | | M | 0 | 20 |
3 | 1 | 13oct2004 | | | M | 2 | 9999 |
4 | 0 | 4oct2002 | | | F | 1 | 23.5 |
4 | 1 | 03may2004 | 4jan2003 | 24jun2003 | F | . | . |
4 | 2 | 13jan2006 | | | F | . | . |
4 | 3 | 25aug2007 | | | F | 2 | 9999 |
ID is my person identifier. Each person can be visited several times (Visit, where 0 is the baseline) on different dates (Date is when the visit took place). At each visit, a person could report up to 9 dates (I do have DOsp1-DOsp9, but for the sake of this question I only show the first two) indicating if and when they were hospitalized between visits.
I will use snapspan to convert my data to time-span data, but first I think I need to slightly change my time variable (and the dataset overall).
I want to have a time variable like Time (see table below) so that I can run snapspan ID Time.
ID | Visit | Date | DOsp1 | DOsp2 | Sex | Var1 | Var2 | Time |
1 | 0 | 1mar2002 | | | M | 0 | . | 1mar2002 |
1 | 1 | 3jun2005 | | | M | . | . | 3jun2005 |
1 | 2 | 4feb2007 | | | M | . | . | 4feb2007 |
2 | . | . | . | . | . | . | . | 21dec2000 |
2 | . | . | . | . | . | . | . | 22jun2001 |
2 | 0 | 9feb2002 | 21dec2000 | 22jun2001 | F | 1 | 18.9 | 9feb2002 |
2 | 1 | 7sep2002 | | | F | 2 | 9999 | 7sep2002 |
3 | 0 | 25mar2003 | | | M | 0 | 20 | 25mar2003 |
3 | 1 | 13oct2004 | | | M | 2 | 9999 | 13oct2004 |
4 | 0 | 4oct2002 | | | F | 1 | 23.5 | 4oct2002 |
4 | . | . | . | . | . | . | . | 4jan2003 |
4 | . | . | . | . | . | . | . | 24jun2003 |
4 | 1 | 03may2004 | 4jan2003 | 24jun2003 | F | . | . | 03may2004 |
4 | 2 | 13jan2006 | | | F | . | . | 13jan2006 |
4 | 3 | 25aug2007 | | | F | 2 | 9999 | 25aug2007 |
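One possible way to get from the first table to this one is to stack the hospitalization dates underneath the visit rows. The following is only a rough sketch under some assumptions: the dates are read in as strings and converted to numeric Stata daily dates with date(), only DOsp1-DOsp2 are used (the real data would list DOsp1-DOsp9), and osp is a hypothetical helper variable that records which DOsp column a stacked row came from. It should be run as a do-file in one go because it uses a tempfile.

```stata
clear
* Example data from the first table; dates are entered as strings (assumption)
input ID Visit str9 sDate str9 sDOsp1 str9 sDOsp2 str1 Sex Var1 Var2
1 0 "1mar2002"  ""          ""          "M" 0 .
1 1 "3jun2005"  ""          ""          "M" . .
1 2 "4feb2007"  ""          ""          "M" . .
2 0 "9feb2002"  "21dec2000" "22jun2001" "F" 1 18.9
2 1 "7sep2002"  ""          ""          "F" 2 9999
3 0 "25mar2003" ""          ""          "M" 0 20
3 1 "13oct2004" ""          ""          "M" 2 9999
4 0 "4oct2002"  ""          ""          "F" 1 23.5
4 1 "03may2004" "4jan2003"  "24jun2003" "F" . .
4 2 "13jan2006" ""          ""          "F" . .
4 3 "25aug2007" ""          ""          "F" 2 9999
end

* Convert the string dates to numeric daily dates
foreach v in Date DOsp1 DOsp2 {
    gen long `v' = date(s`v', "DMY")
    format `v' %td
    drop s`v'
}

* Visit rows: Time is simply the visit date
gen long Time = Date
format Time %td
tempfile visits
save `visits'

* Hospitalization rows: one row per non-missing DOsp date
keep ID Visit DOsp1 DOsp2
reshape long DOsp, i(ID Visit) j(osp)
drop if missing(DOsp)
rename DOsp Time
keep ID Time osp          // Visit and all other variables stay missing

* Stack both sets of rows and order them in time within person
append using `visits'
sort ID Time
list ID Visit Date DOsp1 DOsp2 Sex Var1 Var2 Time, sepby(ID) noobs
```

The hospitalization rows carry only ID, Time and osp, which matches the rows filled with "." in the table above.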
This is the final dataset I want to obtain:
ID | Datestarts | Dateends | Sex | Var1 | Var2 | Event | Event_recode |
1 | . | 1mar2002 | M | 0 | . | Visit 0 | 0 |
1 | 1mar2002 | 3jun2005 | M | . | . | Visit 1 | 0 |
1 | 3jun2005 | 4feb2007 | M | . | . | Visit 2 | 0 |
2 | . | 9feb2002 | F | 1 | 18.9 | Visit 0 | 0 |
2 | 9feb2002 | 7sep2002 | F | 2 | 9999 | Visit 1 | 2 |
3 | . | 25mar2003 | M | 0 | 20 | Visit 0 | 0 |
3 | 25mar2003 | 13oct2004 | M | 2 | 9999 | Visit 1 | 2 |
4 | . | 4oct2002 | F | 1 | 23.5 | Visit 0 | 0 |
4 | 4oct2002 | 4jan2003 | F | . | . | Osp 1 | 1 |
4 | 4jan2003 | 24jun2003 | F | . | . | Osp 2 | 1 |
4 | 24jun2003 | 03may2004 | F | . | . | Visit 1 | 0 |
4 | 03may2004 | 13jan2006 | F | . | . | Visit 2 | 0 |
4 | 13jan2006 | 25aug2007 | F | 2 | 9999 | Visit 3 | 2 |
All of this is so that I can run the following command:
stset Dateends, id(ID) time0(Datestarts) origin(time Datestarts) failure(Event_recode==1 2)
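For comparison, the span layout in the final table can also be built by hand from the stacked data, without snapspan. This is a minimal sketch, and the rules in it are read off the example table rather than confirmed definitions: hospitalizations dated before the baseline visit are dropped (as for ID 2), Sex is copied onto hospitalization rows, and Event_recode is 1 for a hospitalization row, 2 for a visit with Var1 == 2, and 0 otherwise. The variable osp (hypothetical name) numbers the hospitalization dates.

```stata
clear
* Stacked data as in the table with Time; osp marks hospitalization rows (assumption)
input ID Visit str9 sTime str1 Sex Var1 Var2 osp
1 0 "1mar2002"  "M" 0 .    .
1 1 "3jun2005"  "M" . .    .
1 2 "4feb2007"  "M" . .    .
2 . "21dec2000" ""  . .    1
2 . "22jun2001" ""  . .    2
2 0 "9feb2002"  "F" 1 18.9 .
2 1 "7sep2002"  "F" 2 9999 .
3 0 "25mar2003" "M" 0 20   .
3 1 "13oct2004" "M" 2 9999 .
4 0 "4oct2002"  "F" 1 23.5 .
4 . "4jan2003"  ""  . .    1
4 . "24jun2003" ""  . .    2
4 1 "03may2004" "F" . .    .
4 2 "13jan2006" "F" . .    .
4 3 "25aug2007" "F" 2 9999 .
end
gen long Time = date(sTime, "DMY")
format Time %td
drop sTime

* Drop hospitalizations reported before the baseline visit (assumed rule)
bysort ID: egen long baseline = min(cond(Visit == 0, Time, .))
drop if Time < baseline
drop baseline

* Sex is constant within person, so copy it onto the hospitalization rows
bysort ID (Sex): replace Sex = Sex[_N] if Sex == ""

* Each row ends at Time and starts at the previous row's Time within person
bysort ID (Time): gen long Datestarts = Time[_n-1]
format Datestarts %td
rename Time Dateends

* Event label and numeric recode (assumed coding, see the note above)
gen str8 Event = cond(missing(Visit), "Osp " + string(osp), "Visit " + string(Visit))
gen byte Event_recode = cond(missing(Visit), 1, cond(Var1 == 2, 2, 0))

list ID Datestarts Dateends Sex Var1 Var2 Event Event_recode, sepby(ID) noobs
```

Note that Datestarts is missing on each person's first row, so those rows would need a start date before stset can use them as intervals.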
Thank you to anyone who can help me. Feel free to ask for clarification.
Best