I am trying to match data on cities in two data sets using only the names of the cities. Unfortunately, there are many variants for how names can be presented.

For example,
in one data set the name of a city is TERREBONNE PARISH CONSOLIDATED GOVERNMENT but
in the other data set the name of the city is TERREBONNE CONSOLIDATED GOVERNMENT.

These are almost certainly the same city, but the matchit function does not score them as a perfect match.

I cannot rely on matchit's similarity scores alone because there are also cases in the data like APPLEGATE VILLAGE, which is matched to THE VILLAGE OF DOUGLAS. These two are almost certainly not the same city, but I get a matchit score of .5 because they both contain the word VILLAGE.


I thought about creating a variable for each word in the city name and then dropping words like "Village", "Town", "of", etc., but even if I programmed that, I can't figure out any reasonably efficient strategy to find that "TERREBONNE" is in each name, since the word order can differ across data sets.
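
To make the idea concrete, here is a rough sketch of the logic in Python (not Stata, and not how matchit works internally). It drops generic words and then compares the remaining words as sets, so word order does not matter. The stopword list and the function names key_tokens and token_overlap are made up for illustration, and the list would need tuning against the real data.

```python
# Hypothetical list of generic words to ignore; an assumption, not a vetted list.
STOPWORDS = {"THE", "OF", "VILLAGE", "TOWN", "CITY", "PARISH",
             "CONSOLIDATED", "GOVERNMENT"}

def key_tokens(name):
    """Return the set of non-generic words in a name (case-insensitive)."""
    return {w for w in name.upper().split() if w not in STOPWORDS}

def token_overlap(a, b):
    """Jaccard similarity of the two word sets: shared words / all distinct words.

    Because sets are unordered, the result does not depend on word order.
    """
    ta, tb = key_tokens(a), key_tokens(b)
    if not ta or not tb:
        return 0.0  # nothing left to compare after dropping generic words
    return len(ta & tb) / len(ta | tb)

print(token_overlap("TERREBONNE PARISH CONSOLIDATED GOVERNMENT",
                    "TERREBONNE CONSOLIDATED GOVERNMENT"))  # 1.0
print(token_overlap("APPLEGATE VILLAGE", "THE VILLAGE OF DOUGLAS"))  # 0.0
```

With this stopword list the two TERREBONNE names score 1.0 and the two VILLAGE names score 0.0, but comparing every name in one data set against every name in the other this way is exactly the part I do not know how to do efficiently.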

Suggestions for how to proceed would be appreciated.