This shows each time a patient had an operation indicated with procedureno.
If the patient had a R or subsequently L operation it is coded with a unique indexcode (but may have the same procedureno as it may have been done in the same sitting.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(procedureno indexcode new_dvt new_mi pe sidep genderp date) 200 15 1 0 0 2 1 22798 888 9 . 1 1 1 1 22646 999 13 1 1 . 2 2 22706 999 15 1 0 . 2 2 22767 999 16 1 0 1 2 1 22798 999 10 1 1 1 1 2 22678 end format %td date label values sidep side label def side 1 "R", modify label def side 2 "L", modify label values genderp gender label def gender 1 "M", modify label def gender 2 "F", modify
I then have a hospital dataset. Which shows each time the patient came to hospital for any reason. //The only unique identifier between hospital dataset & procedure dataset is the procedureno //
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(procedureno ssi dehiscence pe) str3 uniqueno float(admissiondate gender side) 888 0 1 1 "1A" 22645 1 1 888 1 1 1 "2A" 22690 1 1 999 1 1 1 "3A" 22705 2 1 999 1 0 1 "4A" 22737 2 2 end format %td admissiondate label values gender gender label def gender 1 "M", modify label def gender 2 "F", modify label values side side label def side 1 "R", modify label def side 2 "L", modify
A.
Please bear in mind I have already screened the hospital dataset and am working with a subset of patients which are duplicates - trying to identify which to keep.
Now as the proceduredata set is very big I would like to merge using the hospitaldataset as the master dataset & procedure dataset as the using dataset.
Unique identifier to merge - will be procedureno
And from the proceduredata I onyl would like to keep genderp (gender) , sidep (side of surgery), date which is the operation date.
Thus, knowing there are duplicates in my hospitaldataset + duplicates in my proceduredataset (as patients who have the same procedureno may have had a R + L operation) I need to merge using the following code:
merge procedureno using "file location", keep (genderp sidep date)
Result:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(procedureno ssi dehiscence pe) str3 uniqueno float(admissiondate gender side sidep genderp date) byte _merge 888 0 1 1 "1A" 22645 1 1 1 1 22646 3 888 1 1 1 "2A" 22690 1 1 1 1 22646 3 999 1 1 1 "3A" 22705 2 1 1 2 22678 3 999 1 0 1 "4A" 22737 2 2 2 2 22706 3 200 . . . "" . . . 2 1 22798 2 999 1 0 1 "4A" 22737 2 2 2 1 22798 3 999 1 0 1 "4A" 22737 2 2 2 2 22767 3 end format %td admissiondate format %td date label values sidep side label def side 1 "R", modify label def side 2 "L", modify label values genderp gender label def gender 1 "M", modify label def gender 2 "F", modify
1. The 'extra ones from proceduredata which remains unmatched will appear as missing in the variables ssi, dehiscence, pe, uniqueno etc as seen in procedureno 200. Which I am aware will happen as I have already removed the unique values from the hospitaldata set (as explained in A)
Therefore:
//Find the variables that do not have observations within the hospital dataset apart from those variables that you have matched on
generate miss = missing(genderp, gender, ssi, dehiscence, pe, uniqueno)
drop if miss == 1
//fIND the patients that are the same patients - ie THE pt has the procedureno + same gender identified from hospitaldataset + gender from proceduredataset
duplicates tag genderp gender procedureno, gen(duplicates)
//Values 0 if gender from hospitaldataset + gender from proceduredataset are different
drop if duplicates == 0
//Focus on those that are the same patients (same gender + same side) and pick the ones with the least days from the admissiondate
gen difference = date - admissiondate
keep if difference <= 70 & sidep == side
*If a patient has had a revision operation of the same side, this should not normally be within 70 days hence the reason why I have used 70 days. In fact I will change this to perhaps 5 days.
This merge to many seems to work for me. Now of course I'm working on a dummy dataset as I want to see how it works. But why so much critism? I don't see another way round it.
I can't use append, nor can I use 1:m or m:1 at this time as none of unique values, this is the the most consice dataset I have managed to get it too.
Im working on duplicates with the aim of creating unique values.
I will then merge again using m: 1
0 Response to Merging m:m - it seems to work for me. Why critise? Appreciate your feedback
Post a Comment