Hello, and my apologies for not using dataex. I am working with healthcare data, so I have created an example below instead.
I am looking to create a variable that flags whether this inequality holds:

visit_date (for visit_type==1) <= visit_date (for visit_type==2) <= visit_date (for visit_type==1 + 365 days)


Here is the structure I have:

study_id unique_id visit_type visit_date
1 165423 1 01/01/2000
1 164651 2 06/07/2000
2 949628 1 03/05/2001
2 489461 2 04/05/2002
3 984665 1 02/20/2002
3 894861 2 01/06/2003
4 894156 1 10/10/2002
4 876464 3 10/02/2003
4 786386 2 11/05/2003

// all date values are of Type: double and Format: %tc
// unique_id is unique to every occurrence of "activity", i.e. unique across the entire dataset
// I have included a case (study_id==4) with more than one candidate for the follow-up measurement, where one record is erroneous in some way (three rows for the same study_id, one of them with visit_type==3 instead of 1 or 2).

This is the result I want:

study_id unique_id visit_type visit_date within_window
1 165423 1 01/01/2000 .
1 164651 2 06/07/2000 1
2 949628 1 03/05/2001 .
2 489461 2 04/05/2002 0
3 984665 1 02/20/2002 .
3 894861 2 01/06/2003 1
4 894156 1 10/10/2002 .
4 876464 3 10/02/2003 .
4 786386 2 11/05/2003 0

// you can see that I desire within_window==. if visit_type!=2
// I don't think -reshape wide- will help because as it stands there are > 100 variables and > 10,000 observations in the dataset

I have tried something very simple and inelegant:

Code:
     
* visit_date for visit_type==1
gen double FirstVisitDate=visit_date if visit_type==1
  format FirstVisitDate %tc
     
* visit_date for visit_type==2
gen double FollowUpVisitDate=visit_date if visit_type==2
  format FollowUpVisitDate %tc

* visit_date for visit_type==1 + 365 days
gen double FirstVisitDate_plus365=visit_date if visit_type==1
  format FirstVisitDate_plus365 %tc
  replace FirstVisitDate_plus365=FirstVisitDate_plus365+3.1536*10^10
  // 3.1536*10^10 = 365*24*60*60*1000, i.e. 365 days in milliseconds (%tc ignores leap seconds)

* var returned within the time-window
gen within_window=.                                               
  replace within_window=1 if FirstVisitDate < FollowUpVisitDate < FirstVisitDate_plus365          
  replace within_window=0 if missing(FollowUpVisitDate) | FollowUpVisitDate > FirstVisitDate_plus365

This solution seems somewhat clumsy, though.
I end up with a dataset that looks something like this:

study_id unique_id visit_type visit_date FirstVisitDate FollowUpVisitDate FirstVisitDate_plus365 within_window
1 165423 1 01/01/2000 01/01/2000 . 01/01/2001 0
1 164651 2 06/07/2000 . 06/07/2000 . 1

The value within_window==1 on the follow-up row is what I expect. However, I am confused about how Stata arrives at it, given that the two dates sit in different observations. Would Stata not be evaluating within_window row by row, like this:


Code:
FirstVisitDate==01/01/2000 < FollowUpVisitDate==. < FirstVisitDate_plus365==01/01/2001

FirstVisitDate==. < FollowUpVisitDate==06/07/2000 < FirstVisitDate_plus365==.
I have checked several values manually and they appear to be correct, but there are too many observations to verify one by one.
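
To test my own reading of the row-by-row evaluation, a small do-file check like the one below might help. It rests on two things I understand to be true of Stata: relational operators are evaluated left to right, so a < b < c is read as (a < b) < c, and numeric missing (.) counts as larger than any nonmissing number. The numbers are made up purely for illustration.

```stata
* chained comparison: (5 < 3) is 0, and 0 < 10 is true, so this shows 1
display 5 < 3 < 10

* the comparison I actually mean needs two conditions joined with &; shows 0
display (5 < 3) & (3 < 10)

* missing is treated as larger than any nonmissing number
* first line shows 1, second shows 0
display 10 < .
display . < 10
```

If that reading is right, my chained condition would be true on almost every row, which would mean the 1s I see on the follow-up rows are not really checking the window at all.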

Questions:
  1. Can someone suggest a more elegant solution, point out my errors, or recommend a better way to quality-check this solution?
  2. Also, I would like to have within_window==. for the observations that have visit_type==1, but I suppose that when I aggregate I can use only the values of 0 or 1 where visit_type==2, if my data stay in the form I have above.
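
To make question 1 concrete, this is the kind of same-row comparison I imagine, written as a minimal sketch rather than a tested solution. It assumes at most one visit_type==1 row per study_id. The names visit_str, first_date and year_ms are hypothetical helpers that are not in my real data, and the example rebuilds the toy data above, so the clear at the top wipes whatever is in memory.

```stata
* rebuild the example data (clear drops the data currently in memory)
clear
input study_id long unique_id visit_type str10 visit_str
1 165423 1 "01/01/2000"
1 164651 2 "06/07/2000"
2 949628 1 "03/05/2001"
2 489461 2 "04/05/2002"
3 984665 1 "02/20/2002"
3 894861 2 "01/06/2003"
4 894156 1 "10/10/2002"
4 876464 3 "10/02/2003"
4 786386 2 "11/05/2003"
end

* convert the string dates to %tc datetimes stored as double
gen double visit_date = clock(visit_str, "MDY")
format visit_date %tc

* copy the visit_type==1 date onto every row of the same study_id
* (assumption: at most one visit_type==1 row per study_id)
bysort study_id: egen double first_date = min(cond(visit_type == 1, visit_date, .))
format first_date %tc

* 365 days expressed in milliseconds
local year_ms = 365 * msofhours(24)

* 1/0 on follow-up rows only; every other row stays missing
gen byte within_window = inrange(visit_date, first_date, first_date + `year_ms') ///
    if visit_type == 2 & !missing(first_date)

list study_id unique_id visit_type visit_date within_window, sepby(study_id)
```

Because the baseline date is carried onto the follow-up row, both bounds are compared in the same observation, and the if qualifier leaves within_window missing wherever visit_type!=2, which would also cover question 2.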

Thanks for the help as always.