I am currently trying to merge two datasets.
One contains household-level data and the other person-level data. The person-level data include a household ID, which should allow an m:1 merge.
However, because the data span many countries, I found that the household ID restarts in each country.
For example, household ID 1 identifies the first household in the United Kingdom, but household ID 1 also identifies the first household in Italy.
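One quick way to confirm this is to check which variables identify the observations in the household file. The sketch below uses a small hypothetical household file, not the real data: duplicates report shows that householdID repeats across countries, and isid confirms that country together with householdID is unique.

```stata
* HYPOTHETICAL household-level file: householdID restarts within each country
clear
input float householdID str2 country
1 "UK"
2 "UK"
1 "IT"
2 "IT"
end

* householdID alone repeats across countries, so this reports surplus copies
duplicates report householdID

* The pair (country, householdID) is unique; isid stops with an error if not
isid country householdID
```

If the second isid fails on the real household file, the pair is not a valid merge key and the data need a closer look before merging.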
I expected merge to have a by() or similar option, but I have not found one. As far as I can tell, earlier posts deal with different problems.
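merge has no by() option, but it does not need one: the key is a varlist, so listing both variables matches each person on country and householdID together. A minimal sketch with two small hypothetical files (the variable hhincome and the tempfile name households are illustrative only):

```stata
* HYPOTHETICAL household-level file with an illustrative variable hhincome
clear
input float householdID str2 country float hhincome
1 "UK" 100
2 "UK" 200
1 "IT" 300
end
tempfile households
save `households'

* HYPOTHETICAL person-level file; householdID restarts within each country
clear
input float(householdID personID) str2 country
1 1 "UK"
1 2 "UK"
2 3 "UK"
1 1 "IT"
end

* Both key variables are listed, so a person matches only the household
* with the same country AND the same householdID
merge m:1 country householdID using `households'
list country householdID personID hhincome _merge
```

UK household 1 and Italian household 1 stay distinct: the Italian person receives hhincome 300, while the two members of UK household 1 receive 100.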
Is there a way to merge the datasets correctly?
My idea was to generate a new household variable numbered from 1 to n across all countries, so that no number repeats, but I wondered whether there is a more elegant and safer option.
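If a single household identifier is still wanted, one caution applies to a running number: something like egen hhnum = group(country householdID), computed separately in each file, numbers only the combinations present in that file, so the same household can receive different numbers in the two files. A key built from the values themselves avoids this. A sketch under that assumption, with a hypothetical variable name hhkey:

```stata
* HYPOTHETICAL household identifiers; householdID restarts within each country
clear
input float householdID str2 country
1 "UK"
2 "UK"
1 "IT"
end

* Build one key from the values themselves, e.g. "UK_1". Running the same line
* in both files gives the same household the same key in each file.
generate hhkey = country + "_" + string(householdID)
isid hhkey
list
```

Merging on country and householdID directly makes this key unnecessary for the merge itself; it is mainly useful when later commands expect a single ID variable.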
I attach a fictitious example created with dataex, since I am not allowed to share the real data.
Thanks a lot.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(householdID personID) str2 country
1 1 "UK"
1 2 "UK"
2 3 "UK"
3 4 "UK"
3 5 "UK"
3 6 "UK"
1 1 "IT"
2 2 "IT"
2 3 "IT"
3 4 "IT"
4 5 "IT"
1 1 "FR"
2 2 "FR"
3 3 "FR"
4 4 "FR"
4 5 "FR"
1 1 "ES"
1 2 "ES"
1 3 "ES"
2 4 "ES"
3 5 "ES"
end
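Applied to the posted example, a sketch might look like the following. Because the household file is not posted, it builds a hypothetical one from the person data with contract (one row per country and householdID, with the member count stored in a hypothetical variable hhsize); with the real data, the saved household file would be used instead. merge m:1 stops with an error if country and householdID do not uniquely identify the household file, and the assert(match) option makes it stop if any observation in either file fails to match.

```stata
* The person-level example posted above
clear
input float(householdID personID) str2 country
1 1 "UK"
1 2 "UK"
2 3 "UK"
3 4 "UK"
3 5 "UK"
3 6 "UK"
1 1 "IT"
2 2 "IT"
2 3 "IT"
3 4 "IT"
4 5 "IT"
1 1 "FR"
2 2 "FR"
3 3 "FR"
4 4 "FR"
4 5 "FR"
1 1 "ES"
1 2 "ES"
1 3 "ES"
2 4 "ES"
3 5 "ES"
end

* HYPOTHETICAL household file: one row per country-household pair,
* with the number of members in that household stored as hhsize
preserve
contract country householdID, freq(hhsize)
tempfile households
save `households'
restore

* m:1 merge on both keys; assert(match) requires every observation to match
merge m:1 country householdID using `households', assert(match)

* Sanity check: each person now carries the size of their own household
bysort country householdID: assert _N == hhsize
```

The final check only makes sense for this constructed file; with real household data, a comparable check would use whatever household-level variable is known to be consistent with the person file.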