Merging m:m - it seems to work for me. Why critise? Appreciate your feedback

As you can see in my previous posts. I currently have one huge dataset called procedure.

This shows each time a patient had an operation indicated with procedureno.
If the patient had a R or subsequently L operation it is coded with a unique indexcode (but may have the same procedureno as it may have been done in the same sitting.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(procedureno indexcode new_dvt new_mi pe sidep genderp date)
200 15 1 0 0 2 1 22798
888  9 . 1 1 1 1 22646
999 13 1 1 . 2 2 22706
999 15 1 0 . 2 2 22767
999 16 1 0 1 2 1 22798
999 10 1 1 1 1 2 22678
end
format %td date
label values sidep side
label def side 1 "R", modify
label def side 2 "L", modify
label values genderp gender
label def gender 1 "M", modify
label def gender 2 "F", modify

I then have a hospital dataset. Which shows each time the patient came to hospital for any reason. //The only unique identifier between hospital dataset & procedure dataset is the procedureno //

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(procedureno ssi dehiscence pe) str3 uniqueno float(admissiondate gender side)
888 0 1 1 "1A" 22645 1 1
888 1 1 1 "2A" 22690 1 1
999 1 1 1 "3A" 22705 2 1
999 1 0 1 "4A" 22737 2 2
end
format %td admissiondate
label values gender gender
label def gender 1 "M", modify
label def gender 2 "F", modify
label values side side
label def side 1 "R", modify
label def side 2 "L", modify

A.
Please bear in mind I have already screened the hospital dataset and am working with a subset of patients which are duplicates - trying to identify which to keep.

Now as the proceduredata set is very big I would like to merge using the hospitaldataset as the master dataset & procedure dataset as the using dataset.
Unique identifier to merge - will be procedureno

And from the proceduredata I onyl would like to keep genderp (gender) , sidep (side of surgery), date which is the operation date.

Thus, knowing there are duplicates in my hospitaldataset + duplicates in my proceduredataset (as patients who have the same procedureno may have had a R + L operation) I need to merge using the following code:

merge procedureno using "file location", keep (genderp sidep date)

Result:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(procedureno ssi dehiscence pe) str3 uniqueno float(admissiondate gender side sidep genderp date) byte _merge
888 0 1 1 "1A" 22645 1 1 1 1 22646 3
888 1 1 1 "2A" 22690 1 1 1 1 22646 3
999 1 1 1 "3A" 22705 2 1 1 2 22678 3
999 1 0 1 "4A" 22737 2 2 2 2 22706 3
200 . . . ""       . . . 2 1 22798 2
999 1 0 1 "4A" 22737 2 2 2 1 22798 3
999 1 0 1 "4A" 22737 2 2 2 2 22767 3
end
format %td admissiondate
format %td date
label values sidep side
label def side 1 "R", modify
label def side 2 "L", modify
label values genderp gender
label def gender 1 "M", modify
label def gender 2 "F", modify

1. The 'extra ones from proceduredata which remains unmatched will appear as missing in the variables ssi, dehiscence, pe, uniqueno etc as seen in procedureno 200. Which I am aware will happen as I have already removed the unique values from the hospitaldata set (as explained in A)

Therefore:
//Find the variables that do not have observations within the hospital dataset apart from those variables that you have matched on

generate miss = missing(genderp, gender, ssi, dehiscence, pe, uniqueno)
drop if miss == 1

//fIND the patients that are the same patients - ie THE pt has the procedureno + same gender identified from hospitaldataset + gender from proceduredataset
duplicates tag genderp gender procedureno, gen(duplicates)

//Values 0 if gender from hospitaldataset + gender from proceduredataset are different
drop if duplicates == 0

//Focus on those that are the same patients (same gender + same side) and pick the ones with the least days from the admissiondate
gen difference = date - admissiondate
keep if difference <= 70 & sidep == side

*If a patient has had a revision operation of the same side, this should not normally be within 70 days hence the reason why I have used 70 days. In fact I will change this to perhaps 5 days.

This merge to many seems to work for me. Now of course I'm working on a dummy dataset as I want to see how it works. But why so much critism? I don't see another way round it.
I can't use append, nor can I use 1:m or m:1 at this time as none of unique values, this is the the most consice dataset I have managed to get it too.

Im working on duplicates with the aim of creating unique values.

I will then merge again using m: 1

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Merging m:m - it seems to work for me. Why critise? Appreciate your feedback
Merging m:m - it seems to work for me. Why critise? Appreciate your feedback

0 Response to Merging m:m - it seems to work for me. Why critise? Appreciate your feedback

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Merging m:m - it seems to work for me. Why critise? Appreciate your feedback Merging m:m - it seems to work for me. Why critise? Appreciate your feedback

Related Posts with Merging m:m - it seems to work for me. Why critise? Appreciate your feedback

0 Response to Merging m:m - it seems to work for me. Why critise? Appreciate your feedback

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Merging m:m - it seems to work for me. Why critise? Appreciate your feedback
Merging m:m - it seems to work for me. Why critise? Appreciate your feedback