I have two datasets describing the same 80 schools.
Some variables are similar -- for example, both lists include the schools' enrollment and % African American -- but there is no common ID or other variable that matches the two lists perfectly.
Is there a command that will find the best way to link the two datasets?
If not, I've done much of the work. I've made an 80x80=6400 row dataset that gives the Mahalanobis distance between every possible combination of a row from the first dataset with a row from the second. Now how do I optimally sort and deduplicate 6400 rows to get the best 80 matches, with no duplicates from either list?
0 Response to Record linkage with quantitative variables
Post a Comment