I am wondering whether anyone has seen a comparison of the various similarity methods available in the -matchit- command. I don't really understand the difference between bigram, ngram, ngram_circ, token, soundex, and token_soundex. Also, in what situations does each of the scoring options (jaccard, simple, and minsimple) work best?

Specifically, I am trying to match records on business name and address. Does one of the options do a better job of ignoring differences that seem minor (to me, anyway), such as "LLC" vs. "Inc" vs. no suffix at all, which often vary depending on how the business name was recorded? Conversely, does any option give more weight to differences in numbers, which matter in the addresses I'm working with? For example, "123 Main Street." should not be matched with "321 Main Street.", but it should be matched with both "123 Main" and "123 Main St". Can someone, either from experience or from a reference, tell me which options are likely to give the best results? Has Julio Raffo (the author of -matchit-) written about this? If so, I can't seem to find it.
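To make the address question concrete, here is a rough Python sketch of how I understand the difference between bigram and token comparison. It uses the textbook set-based Jaccard index (shared grams divided by all distinct grams), which I am assuming is close to, but may not be identical to, how -matchit- computes its jaccard score. The helper names (normalize, bigrams, tokens, jaccard) are my own and are not part of -matchit-.

```python
# Rough illustration only: textbook set-based Jaccard on bigrams vs. tokens.
# This is NOT matchit's code; matchit may pad strings, weight grams, or
# normalize text differently.


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(text.split())


def bigrams(text):
    """Set of overlapping two-character substrings (spaces included)."""
    s = normalize(text)
    return {s[i:i + 2] for i in range(len(s) - 1)}


def tokens(text):
    """Set of whitespace-separated words."""
    return set(normalize(text).split())


def jaccard(a, b):
    """|A & B| / |A | B|; defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


master = "123 Main Street."
candidates = ["321 Main Street.", "123 Main", "123 Main St"]

for cand in candidates:
    b = jaccard(bigrams(master), bigrams(cand))
    t = jaccard(tokens(master), tokens(cand))
    print(f"{cand:<18} bigram={b:.3f} token={t:.3f}")
```

If my sketch is right, bigram scoring ranks "321 Main Street." (about 0.65) above "123 Main" (0.5), because swapping the digits changes only a few bigrams, while token scoring gives "321 Main Street." and "123 Main St" the same score (0.5). Neither behaves the way I want for addresses, which is part of why I'm asking.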

Thanks for any help you can provide.