Hi I’m working with very large datasets - ones that contain for 1mill observations with 20+ variables.
lets call these datasets: hospital dataset and procedure dataset
Question: in your experience would you recommend keeping the number of merges to a minimum and just merge once then work on replicates from that one time merge ? Or do you think its equally as effective with the following process. I should think the process below is just more manageable as one knows exactly that all values are unique using the process below and there shouldnt be any issues when merging 1:1
Step 1:
identify the hes Unique values in the hospitaldataset merge 1:1 onto my proceduredataset
Step2:
Identify replicate values in the hospital dataset and pick the first one and merge onto my proceduredataset using 1:1
Step3:
Look at the ones with same procedureno but different hospitalid no and identify what’s different. Then final merge onto procedure dataset.
That’s a total of 3 merges. Would you recommend this or would you recommend , just do 1 merge from hospitaldataset to proceduredataset and then deal with replicates within that dataset - as the above requires 3 merges.
Related Posts with Data manipulation - your experience with merging
Graphical analysis of ratesHello, I am new to STATA and am using Version 15. Due to the confidential nature of my data I canno…
Reshape help for household dataStata novice here so please be gentle, but I have a dataset from a household survey. Basic demograph…
Out-of-sample forecastingHello, I'm trying to forecast an arima(1,0,1) model with 4 variables out of sample. The dataset run…
Comparing rates in stataHello, I am new to STATA and am using Version 15. Due to the confidential nature of my data I canno…
Pescadf: not reporting value of test statistic and p-valueHi all, I have an unbalanced panel of around 900 companies for a period of 81 months. I have used t…
Subscribe to:
Post Comments (Atom)
0 Response to Data manipulation - your experience with merging
Post a Comment