Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte study_id long unique_id byte visit_type double visit_date byte(lab_test biopsy1 biopsy2)
1 174532 1 1.2759552e+12 42 41 34
1 949628 2 1.2993696e+12 26 27 52
1 165423 1 1.262304e+12 25 26 26
1 489461 3 1.333584e+12 30 31 31
end
format %tc visit_date
Overall, I need to make two datasets, each with one row per study_id. The example shows only one study_id, but there are more than 2000.
First dataset:
1) I need a variable that shows the max biopsy value per study_id. There are numerous biopsy variables, stored wide in the dataset; I have given just biopsy1 and biopsy2 as an example.
2) I need a variable that shows the max lab_test value per study_id (this one is not wide).
3) I need to reduce the dataset to one row per study_id, keeping only the information from the row where the max biopsy took place, plus the max lab_test value for that study_id. I will eventually limit this by visit type, but that should not be difficult.
I have done these first two with
*1)
Code:
egen max_biopsy_row = rowmax(biopsy1 biopsy2)
* now that max_biopsy_row holds one value per row, we can take the maximum by study_id
Code:
egen max_biopsy_by_studyid = max(max_biopsy_row), by(study_id)
Code:
egen max_lab_by_studyid = max(lab_test), by(study_id)
Code:
keep if max_biopsy_row==max_biopsy_by_studyid
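The steps for the first dataset can be sketched end to end as follows. This is a minimal sketch, not a definitive solution. It assumes that every biopsy variable shares the name prefix biopsy (so the wildcard biopsy* picks up all of them and nothing else), and that ties for the per-study maximum are broken by keeping the earliest visit_date. The tie-break rule is an assumption and is not stated in the question.

```stata
* Example data from the top of the post
clear
input byte study_id long unique_id byte visit_type double visit_date byte(lab_test biopsy1 biopsy2)
1 174532 1 1.2759552e+12 42 41 34
1 949628 2 1.2993696e+12 26 27 52
1 165423 1 1.262304e+12 25 26 26
1 489461 3 1.333584e+12 30 31 31
end
format %tc visit_date

* Row-wise maximum over every variable whose name starts with "biopsy"
* (assumes no unrelated variable shares that prefix)
egen max_biopsy_row = rowmax(biopsy*)

* Per-study maxima of the row-wise biopsy value and of lab_test
egen max_biopsy_by_studyid = max(max_biopsy_row), by(study_id)
egen max_lab_by_studyid = max(lab_test), by(study_id)

* Keep the visit(s) at which the per-study biopsy maximum occurred
keep if max_biopsy_row == max_biopsy_by_studyid

* Ties would leave several rows per study_id; keep the earliest visit
* (assumed tie-break rule)
bysort study_id (visit_date): keep if _n == 1
```

Because max_lab_by_studyid is computed before the keep, the surviving row still carries the study-wide lab maximum even when that value was recorded at a different visit.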
Second dataset:
Now back to the master data. I would like to create two variables that are the difference between the two dates of visit_type==3 and visit_type==2 per study_id.
For the first variable, I need something that gives me the same answer as this difference:
visit_date (where visit_type==3 and the row has the max biopsy value for visit_type==3) - visit_date (where visit_type==2 and the row has the max biopsy value for visit_type==2), all by study_id
The second variable is similar, only within visit_type==1 and using the max lab_test value:
visit_date (where visit_type==1 and the row has the max biopsy value for visit_type==1) - visit_date (where visit_type==1 and the row has the max lab_test value for visit_type==1), all by study_id
I am not sure how to approach this second dataset, as I don't believe the variables I created for the first dataset give me the correct setup.
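One possible direction, offered as a hedged sketch only: compute the maxima within each study_id and visit_type rather than within study_id alone, record the visit_date at which each maximum occurs, and then spread those dates to every row of the study. The sketch assumes that the first difference is meant to compare the max-biopsy visit of type 3 with the max-biopsy visit of type 2, that ties are resolved by taking the latest matching visit_date, and that the differences are wanted in whole days. All new variable names (max_bx_type, date_maxbx, diff_bx_3_2_days, and so on) are hypothetical.

```stata
* Example data from the top of the post
clear
input byte study_id long unique_id byte visit_type double visit_date byte(lab_test biopsy1 biopsy2)
1 174532 1 1.2759552e+12 42 41 34
1 949628 2 1.2993696e+12 26 27 52
1 165423 1 1.262304e+12 25 26 26
1 489461 3 1.333584e+12 30 31 31
end
format %tc visit_date

* Row-wise biopsy maximum (assumes all biopsy variables start with "biopsy")
egen max_biopsy_row = rowmax(biopsy*)

* Maxima within each study_id and visit_type
egen max_bx_type  = max(max_biopsy_row), by(study_id visit_type)
egen max_lab_type = max(lab_test), by(study_id visit_type)

* Date on which each maximum occurred; ties take the latest date (assumption)
egen double date_maxbx = max(cond(max_biopsy_row == max_bx_type & !missing(max_biopsy_row), visit_date, .)), by(study_id visit_type)
egen double date_maxlab = max(cond(lab_test == max_lab_type & !missing(lab_test), visit_date, .)), by(study_id visit_type)

* Spread the dates needed for each difference to every row of the study
egen double date_bx3  = max(cond(visit_type == 3, date_maxbx, .)), by(study_id)
egen double date_bx2  = max(cond(visit_type == 2, date_maxbx, .)), by(study_id)
egen double date_bx1  = max(cond(visit_type == 1, date_maxbx, .)), by(study_id)
egen double date_lab1 = max(cond(visit_type == 1, date_maxlab, .)), by(study_id)

* visit_date is a %tc datetime in milliseconds; dofc() converts to daily dates
gen diff_bx_3_2_days   = dofc(date_bx3) - dofc(date_bx2)
gen diff_bx_lab_1_days = dofc(date_bx1) - dofc(date_lab1)

* Reduce to one row per study_id with the two differences
bysort study_id: keep if _n == 1
keep study_id diff_bx_3_2_days diff_bx_lab_1_days
```

A study_id with no visit of a required type ends up with a missing difference, since egen's max() returns missing when every input value is missing.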
Any help is appreciated.
Thanks.