comparing multiple variables from two datasets without ID

Dear Stata friends,

Following Scenario:
I have to Datasets with four variables each: Schoolname, Address, City and Postalcode.
The first dataset is from 2016, the second one from 2020
Both have around 3500 Observations, but not exactly the same number, also not in the same order. There is no common identical ID.

Task:
I would like to compare the datasets. Which data from which school has change within the four years, which data is still the same.

What have I tried:
My first though was to use cfout.

Code:

cfout SCHOOLNAME ADDRESS CITY PCODE using "xxxxx", saving(diffs)
use diffs

But without ID that doesnt work (I only realized later that the ID's in both datasets are not the same...).

A second though was to use a normal merge and repeat it over and over again. Starting with all variables and then looking for less intersections.

Code:

merge 1:1 SCHOOLNAME ADDRESS CITY PCODE using "xxxxx.dta"
export excel using "xxxxx.xlsx" if _merge== 3, firstrow(variables) sheet("err_all_merge") nolabel replace

But that seems very hard to analyse at the end. There would be alot of differents Excel sheets and combinations.

Is there another strategy which I am not seeing?

Thanks for your time

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / comparing multiple variables from two datasets without ID
comparing multiple variables from two datasets without ID

0 Response to comparing multiple variables from two datasets without ID

Post a Comment

Home / Data Cleaning / Data management / Data Processing / comparing multiple variables from two datasets without ID comparing multiple variables from two datasets without ID

Related Posts with comparing multiple variables from two datasets without ID

0 Response to comparing multiple variables from two datasets without ID

Post a Comment

Home / Data Cleaning / Data management / Data Processing / comparing multiple variables from two datasets without ID
comparing multiple variables from two datasets without ID