Following Scenario:
I have to Datasets with four variables each: Schoolname, Address, City and Postalcode.
The first dataset is from 2016, the second one from 2020
Both have around 3500 Observations, but not exactly the same number, also not in the same order. There is no common identical ID.
Task:
I would like to compare the datasets. Which data from which school has change within the four years, which data is still the same.
What have I tried:
My first though was to use cfout.
Code:
cfout SCHOOLNAME ADDRESS CITY PCODE using "xxxxx", saving(diffs) use diffs
A second though was to use a normal merge and repeat it over and over again. Starting with all variables and then looking for less intersections.
Code:
merge 1:1 SCHOOLNAME ADDRESS CITY PCODE using "xxxxx.dta" export excel using "xxxxx.xlsx" if _merge== 3, firstrow(variables) sheet("err_all_merge") nolabel replace
Is there another strategy which I am not seeing?
Thanks for your time
0 Response to comparing multiple variables from two datasets without ID
Post a Comment