I am trying to match data on cities in two data sets using only the names of the cities. Unfortunately, the same city's name can be written in many different ways.
For example,
in one data set the name of a city is TERREBONNE PARISH CONSOLIDATED GOVERNMENT but
in the other data set the name of the city is TERREBONNE CONSOLIDATED GOVERNMENT.
These are almost certainly the same city, but when I use the matchit command it does not give me a perfect match.
I cannot just rely on matchit's similarity scores, because there are also cases in the data like APPLEGATE VILLAGE, which is matched to THE VILLAGE OF DOUGLAS. These two are almost certainly not the same city, but I get a matchit score of .5 because they both contain the word VILLAGE.
I thought about creating a variable for each word in the city name and then dropping words like "Village", "Town", "of", etc. But even if I programmed that, I can't figure out any reasonably efficient strategy for finding that "TERREBONNE" appears in both names, since the word order can differ across data sets.
Suggestions for how to proceed would be appreciated.
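To make the idea above concrete, here is a minimal sketch in Python rather than Stata; it is not matchit and not a tested solution for these data. It drops generic words and then compares the remaining words as unordered sets, so word order does not matter. The word list GENERIC_WORDS and the function names are hypothetical and would need to be adapted to the actual data.

```python
import re

# Hypothetical list of generic words to ignore (an assumption, not from any
# library); extend it to fit the names that occur in your own data.
GENERIC_WORDS = {"VILLAGE", "TOWN", "CITY", "OF", "THE",
                 "PARISH", "CONSOLIDATED", "GOVERNMENT"}


def distinctive_tokens(name):
    """Return the set of non-generic words in a name, ignoring word order."""
    words = re.findall(r"[A-Z0-9]+", name.upper())
    return {w for w in words if w not in GENERIC_WORDS}


def token_similarity(a, b):
    """Jaccard similarity of the distinctive word sets, from 0.0 to 1.0."""
    ta, tb = distinctive_tokens(a), distinctive_tokens(b)
    if not ta or not tb:
        # Nothing distinctive is left in one of the names, so do not match.
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(token_similarity("TERREBONNE PARISH CONSOLIDATED GOVERNMENT",
                       "TERREBONNE CONSOLIDATED GOVERNMENT"))  # 1.0
print(token_similarity("APPLEGATE VILLAGE",
                       "THE VILLAGE OF DOUGLAS"))              # 0.0
```

Because the comparison uses sets, "TERREBONNE" is found in both names wherever it appears. The trade-off is that the result depends heavily on the list of generic words, so borderline cases would still need a manual check.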