I am trying to match data on cities in two data sets using only the names of the cities. Unfortunately, there are many variants for how names can be presented.
For example,
in one data set the name of a city is TERREBONNE PARISH CONSOLIDATED GOVERNMENT but
in the other data set the name of the city is TERREBONNE CONSOLIDATED GOVERNMENT.
These are almost certainly the same city, but when I use matchit the two names do not come out as a perfect match.
I cannot just rely on matchit's similarity scores, because the data also contain cases like APPLEGATE VILLAGE, which is matched to THE VILLAGE OF DOUGLAS. These two are almost certainly not the same city, but I get a matchit score of .5 because both names contain the word VILLAGE.
I thought about creating a variable for each word in the city name and then dropping words like "Village", "Town", "of", etc. But even if I programmed that, I can't figure out any reasonably efficient strategy for finding that "TERREBONNE" appears in each name, since the word order can differ across data sets.
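To make the idea concrete, here is a rough sketch of what I have in mind, written in Python rather than Stata just to show the logic. The list of generic words and both function names are my own hypothetical choices, not part of matchit or any other package. Comparing the names as sets of words makes the word order irrelevant.

```python
# Hypothetical list of generic words to ignore; it would need to be
# extended after inspecting the actual data.
GENERIC_WORDS = {
    "THE", "OF", "VILLAGE", "TOWN", "CITY",
    "PARISH", "CONSOLIDATED", "GOVERNMENT",
}


def distinctive_tokens(name):
    """Return the set of words in a name that are not generic."""
    return {word for word in name.upper().split() if word not in GENERIC_WORDS}


def token_overlap(name_a, name_b):
    """Jaccard similarity of the distinctive words in two names.

    Sets are unordered, so differing word order does not matter.
    Returns 0.0 when either name has no distinctive words left.
    """
    a = distinctive_tokens(name_a)
    b = distinctive_tokens(name_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

With this word list, the two TERREBONNE names share their only distinctive word, while APPLEGATE VILLAGE and THE VILLAGE OF DOUGLAS share none, which is the behavior I am after. What I do not know is how to do something equivalent efficiently in Stata across two full data sets.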
Suggestions for how to proceed would be appreciated.