Creating a variable: TRUE or FALSE based on date within two other dates for each participant of of which there are multiple ID values

Hello, my apologies for not using dataex. I am using healthcare data so I have created an example below.
I am looking to create a variable that will satisfy this equality:

visit_date (for visit_type==1) <= visit_date (for visit_type==2) <= visit_date (for visit_type==1 + 365 days)

here is the structure I have:

study_id	unique_id	visit_type	visit_date
1	165423	1	01/01/2000
1	164651	2	06/07/2000
2	949628	1	03/05/2001
2	489461	2	04/05/2002
3	984665	1	02/20/2002
3	894861	2	01/06/2003
4	894156	1	10/10/2002
4	876464	3	10/02/2003
4	786386	2	11/05/2003

// all date values of Type: double and Format: tc%
// unique_id is unique to every occurrence of "activity" i.e. unique in the entirety of the dataset
// I have given an example where I have more than one value for what is supposed to be the follow-up measurement but the actual measurement was erroneous in some way (e.g. three study_id values, and visit_type==3 instead of 1 or 2).

study_id	unique_id	visit_type	visit_date	within_window
1	165423	1	01/01/2000	.
1	164651	2	06/07/2000	1
2	949628	1	03/05/2001	.
2	489461	2	04/05/2002	0
3	984665	1	02/20/2002	.
3	894861	2	01/06/2003	1
4	894156	1	10/10/2002	.
4	876464	3	10/02/2003	.
4	786386	2	11/05/2003	0

// you can see that I desire within_window==. if visit_type!=2
// I don't think -reshape wide- will help because as it stands there are > 100 variables and > 10,000 observations in the dataset

I have tried something very simple and non-elegant:

Code:

     
* visit_date for visit_type==1
gen double FirstVisitDate=vis_date if visit_type==1                      
  format FirstVisitDate %tc
     
* visit_date for visit_type==2
gen FollowUpVisitDate=visit_date if visit_type==2
  format FollowUpVisitDate %tc

* visit_datefor visit_type==1 + 365 days
gen FirstVisitDate_plus365=visit_date if visit_type==1
  format FirstVisitDate_plus365 %tc
  replace FirstVisitDate_plus365=FirstVisitDate_plus365+3.1536*10^10
  // 3.1536*10^10 = 1 year in milliseconds (non-leap since %tc)

* var returned within the time-window
gen within_window=.                                               
  replace within_window=1 if FirstVisitDate < FollowUpVisitDate < FirstVisitDate_plus365          
  replace within_window=0 if missing(FollowUpVisitDate) | FollowUpVisitDate > FirstVisitDate_plus365

This solution seems somewhat bizarre though.
I end up with the dataset looking something like this:

study_id	unique_id	visit_type	visit_date	FirstVisitDate	FollowUpVisitDate	FirstVisitDate_plus365	within_window
1	165423	1	01/01/2000	01/01/2000	.	01/01/2001	0
1	164651	2	06/07/2000	.	06/07/2000	.	1

Clearly the value within_window==1 is correct, however, I am confused about how Stata is reading this given that the observations are on different lines, so would Stata not be evaluating the calculation of within_window as:

Code:

FirstVisitDate==01/01/2000 < FollowUpVisitDate==. < FirstVisitDate_plus365==01/01/2001

FirstVisitDate==. < FollowUpVisitDate==06/09/2000 < FirstVisitDate_plus365==.

I have checked multiple values manually which appear to be correct but there are too many to sort through by simply observing value by value etc.

Questions:

Can someone lend a more elegant solution, point out my errors, or have a better way to quality check this solution?
Also, I would like to have within_window==. for the observations that have visit_type==1 but I suppose when I aggregate I can take only values of 0 or 1 where visit_type==2 if my data stay in the form I have above.

Thanks for the help as always.

BJ Data Tech Solution

0 Response to Creating a variable: TRUE or FALSE based on date within two other dates for each participant of of which there are multiple ID values

Post a Comment

Related Posts with Creating a variable: TRUE or FALSE based on date within two other dates for each participant of of which there are multiple ID values

0 Response to Creating a variable: TRUE or FALSE based on date within two other dates for each participant of of which there are multiple ID values

Post a Comment