I am trying to match data on cities in two data sets using only the names of the cities. Unfortunately, there are many variants for how names can be presented.
For example, in one data set the name of a city is TERREBONNE PARISH CONSOLIDATED GOVERNMENT, but in the other data set the name of the city is TERREBONNE CONSOLIDATED GOVERNMENT.
These are almost certainly the same city, but matchit does not score the two names as a perfect match.
I cannot rely on matchit's similarity scores alone, because the data also contain cases like APPLEGATE VILLAGE, which is matched to THE VILLAGE OF DOUGLAS. These two are almost certainly not the same city, but I get a matchit score of 0.5 because both names contain the word VILLAGE.
I thought about creating a variable for each word in the city name and then dropping words like "Village", "Town", "of", etc. But even if I programmed that in, I can't figure out any reasonably efficient strategy to detect that "TERREBONNE" appears in both names, since the word order can differ across data sets.
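To make the idea concrete, here is a rough sketch of what I have in mind, written in Python rather than Stata only because it is short to express there. It treats each name as an unordered set of words, drops generic words, and scores the overlap of what is left, so word order does not matter. The stopword list and the function names `distinctive_tokens` and `token_overlap` are my own placeholders, not part of matchit or any existing package, and a real list of generic words would have to be built from the data.

```python
# Hypothetical list of generic words to ignore; an assumption for illustration,
# the real list would have to be tuned to the two data sets.
STOPWORDS = {
    "VILLAGE", "TOWN", "CITY", "OF", "THE",
    "PARISH", "CONSOLIDATED", "GOVERNMENT",
}


def distinctive_tokens(name):
    """Return the set of upper-cased words in a name, minus generic words."""
    return {word for word in name.upper().split() if word not in STOPWORDS}


def token_overlap(name_a, name_b):
    """Jaccard similarity of the distinctive words in two names.

    Sets are unordered, so differing word order does not affect the score.
    Returns 0.0 if either name has no distinctive words left.
    """
    tokens_a = distinctive_tokens(name_a)
    tokens_b = distinctive_tokens(name_b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


same = token_overlap(
    "TERREBONNE PARISH CONSOLIDATED GOVERNMENT",
    "TERREBONNE CONSOLIDATED GOVERNMENT",
)
different = token_overlap("APPLEGATE VILLAGE", "THE VILLAGE OF DOUGLAS")
print(same, different)
```

With this particular stopword list, the two TERREBONNE names share their only distinctive word, while APPLEGATE VILLAGE and THE VILLAGE OF DOUGLAS share none. Whether dropping a word like PARISH is safe in general is something I am unsure about.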
Suggestions for how to proceed would be appreciated.