I am looking to create a variable that flags whether this inequality holds for each participant:
visit_date (for visit_type==1) <= visit_date (for visit_type==2) <= visit_date (for visit_type==1 + 365 days)
Here is the structure I have:
study_id | unique_id | visit_type | visit_date |
1 | 165423 | 1 | 01/01/2000 |
1 | 164651 | 2 | 06/07/2000 |
2 | 949628 | 1 | 03/05/2001 |
2 | 489461 | 2 | 04/05/2002 |
3 | 984665 | 1 | 02/20/2002 |
3 | 894861 | 2 | 01/06/2003 |
4 | 894156 | 1 | 10/10/2002 |
4 | 876464 | 3 | 10/02/2003 |
4 | 786386 | 2 | 11/05/2003 |
// all date values are of Type: double and Format: %tc
// unique_id is unique to every occurrence of "activity" i.e. unique in the entirety of the dataset
// For study_id 4 I have given an example with more than one candidate for the follow-up measurement, where one measurement was erroneous in some way (three rows for the same study_id, one of them with visit_type==3 instead of 1 or 2).
This is the result I would like to end up with:
study_id | unique_id | visit_type | visit_date | within_window |
1 | 165423 | 1 | 01/01/2000 | . |
1 | 164651 | 2 | 06/07/2000 | 1 |
2 | 949628 | 1 | 03/05/2001 | . |
2 | 489461 | 2 | 04/05/2002 | 0 |
3 | 984665 | 1 | 02/20/2002 | . |
3 | 894861 | 2 | 01/06/2003 | 1 |
4 | 894156 | 1 | 10/10/2002 | . |
4 | 876464 | 3 | 10/02/2003 | . |
4 | 786386 | 2 | 11/05/2003 | 0 |
// you can see that I desire within_window==. if visit_type!=2
// I don't think -reshape wide- will help because as it stands there are > 100 variables and > 10,000 observations in the dataset
I have tried something very simple and inelegant:
Code:
* visit_date for visit_type==1
gen double FirstVisitDate=visit_date if visit_type==1
format FirstVisitDate %tc
* visit_date for visit_type==2
gen FollowUpVisitDate=visit_date if visit_type==2
format FollowUpVisitDate %tc
* visit_date for visit_type==1 + 365 days
gen FirstVisitDate_plus365=visit_date if visit_type==1
format FirstVisitDate_plus365 %tc
replace FirstVisitDate_plus365=FirstVisitDate_plus365+3.1536*10^10
// 3.1536*10^10 = 1 year in milliseconds (non-leap since %tc)
* var returned within the time-window
gen within_window=.
replace within_window=1 if FirstVisitDate < FollowUpVisitDate < FirstVisitDate_plus365
replace within_window=0 if missing(FollowUpVisitDate) | FollowUpVisitDate > FirstVisitDate_plus365
I end up with the dataset looking something like this:
study_id | unique_id | visit_type | visit_date | FirstVisitDate | FollowUpVisitDate | FirstVisitDate_plus365 | within_window |
1 | 165423 | 1 | 01/01/2000 | 01/01/2000 | . | 01/01/2001 | 0 |
1 | 164651 | 2 | 06/07/2000 | . | 06/07/2000 | . | 1 |
The value within_window==1 on the follow-up row is what I expect. However, I am confused about how Stata arrives at it, given that the two dates sit in different observations (rows). Would Stata not be evaluating within_window row by row, like this:
Code:
FirstVisitDate==01/01/2000 < FollowUpVisitDate==. < FirstVisitDate_plus365==01/01/2001
FirstVisitDate==. < FollowUpVisitDate==06/07/2000 < FirstVisitDate_plus365==.
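A note on how such an expression is evaluated, since it bears on the question above. Stata does evaluate the condition within each observation, and it has no chained comparison: a < b < c is read as (a < b) < c, where the first comparison yields 0 or 1. Numeric missing also compares as larger than any nonmissing number. The small sketch below uses made-up numbers (not the dates from the data) to illustrate both points, and suggests the 1 on the follow-up row may be an accident of missing values rather than a real date comparison.

```stata
* a < b < c is parsed as (a < b) < c, so this shows 1 even though 3 < 2 is false
display 3 < 2 < 1

* The intended test needs two comparisons joined with &, and this shows 0
display (3 < 2) & (2 < 1)

* Numeric missing (.) compares as larger than any nonmissing number.
* A follow-up row has FirstVisitDate==. and FirstVisitDate_plus365==.
* With a stand-in follow-up value of 100 the chained test is (. < 100) < .
* (. < 100) is 0, and 0 < . is true, so this shows 1
display . < 100 < .
```

If that reading is right, every row with visit_type==2 would receive within_window==1 from the first replace, whatever its date.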
Questions:
- Can someone lend a more elegant solution, point out my errors, or have a better way to quality check this solution?
- Also, I would like to have within_window==. for the observations that have visit_type==1 but I suppose when I aggregate I can take only values of 0 or 1 where visit_type==2 if my data stay in the form I have above.
Thanks for the help as always.
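One possible approach, offered as a minimal sketch rather than a definitive answer: copy the baseline date onto every row of the same study_id, then compare within each row. It assumes visit_date is a %tc double, that each study_id has at most one visit_type==1 row, and that "365 days" means exactly 365 x 24 hours. The name first_date is a hypothetical helper, not something in the original data.

```stata
* Spread the visit_type==1 date across all rows of the same participant.
* cond() returns missing for other visit types, and egen's min() ignores missing.
bysort study_id: egen double first_date = min(cond(visit_type == 1, visit_date, .))
format first_date %tc

* inrange(z, a, b) is 1 when a <= z <= b and 0 otherwise.
* msofhours(24) is one day in milliseconds, so the upper bound is baseline + 365 days.
* The if qualifier leaves within_window missing for visit_type != 2
* and for participants with no baseline date.
gen byte within_window = inrange(visit_date, first_date, first_date + 365*msofhours(24)) ///
    if visit_type == 2 & !missing(first_date)

* Quality checks: missing outside follow-up rows, 0/1 on follow-up rows with a baseline
assert missing(within_window) if visit_type != 2
assert inlist(within_window, 0, 1) if visit_type == 2 & !missing(first_date)
```

This avoids reshape wide and adds only one helper variable. Storing every %tc value as double matters here, because the default float type cannot hold millisecond datetimes exactly. The line continuation with /// works in a do-file but not when typed interactively.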