I am wondering whether anyone has seen a comparison of the various matching methods available in the -matchit- command. I don't really understand the differences between bigram, ngram, ngram_circ, token, soundex, and token_soundex. Also, in which situations does each of the scoring options (jaccard, simple, and minsimple) work best?
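To make the distinction concrete, here is a minimal Python sketch (not -matchit-'s own code) of how a string can be broken into bigrams, n-grams, circular n-grams, or tokens, and how three overlap scores could be computed from those pieces. The definitions are assumptions for illustration: `ngram_circ` is taken to wrap the end of the string back to its start, `simple` divides the overlap by the first (master) string's gram count, and `minsimple` divides it by the smaller count. -matchit-'s actual formulas (including its `jaccard`) may differ, so check `help matchit`. Soundex-based methods, which encode words by approximate pronunciation before comparing, are not shown.

```python
from __future__ import annotations

# Illustrative sketch of gram-based similarity; NOT -matchit-'s source code.
# Assumed meanings of the methods:
#   bigram     -> overlapping 2-character substrings
#   ngram      -> overlapping n-character substrings
#   ngram_circ -> n-grams where the string wraps around from end to start
#   token      -> whitespace-separated words
# Scores work on *sets* of grams, so repeated grams are counted once.


def ngrams(s: str, n: int = 2) -> set[str]:
    """Overlapping character n-grams (n=2 gives bigrams)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngrams_circ(s: str, n: int = 2) -> set[str]:
    """Circular n-grams: the first n-1 characters are appended to the end."""
    if not s:
        return set()
    wrapped = s + s[: n - 1]
    return {wrapped[i:i + n] for i in range(len(s))}


def tokens(s: str) -> set[str]:
    """Whitespace-delimited words."""
    return set(s.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Textbook Jaccard index: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def simple(a: set[str], b: set[str]) -> float:
    """Assumed 'simple': overlap divided by the size of the first (master) set."""
    return len(a & b) / len(a) if a else 0.0


def minsimple(a: set[str], b: set[str]) -> float:
    """Assumed 'minsimple': overlap divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


if __name__ == "__main__":
    x, y = "acme inc", "acme"
    for name, split in [("bigram", ngrams), ("token", tokens)]:
        A, B = split(x), split(y)
        print(name, round(jaccard(A, B), 3), round(simple(A, B), 3), round(minsimple(A, B), 3))
```

With these definitions, "acme inc" vs. "acme" scores about 0.43 (bigram) or 0.5 (token) under jaccard and simple, but 1.0 under minsimple, because minsimple only asks whether the shorter string's grams are contained in the longer one's. That forgives a dropped "Inc", but it would also score "acme" against "acme widgets international" as 1.0.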
Specifically, I have a dataset in which I'm trying to match on business name and address. Does one of the options work better at ignoring differences that seem minor to me, such as "LLC" vs. "Inc" vs. no suffix at all, which often varies in how business names are recorded? Does another option give more weight to differences in numbers, such as those I'm seeing in addresses? For example, "123 Main Street" should not be matched with "321 Main Street", but should be matched with both "123 Main" and "123 Main St". Can anyone, either from experience or from a reference, tell me which options are likely to give the best results? Has Julio Raffo written about this before? If so, I can't seem to find it.
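One way to handle both issues, whichever -matchit- method you pick, is to clean the strings first and treat numbers as a hard constraint. The Python sketch below is only an illustration under stated assumptions: the suffix list, the abbreviation table, the 0.5 threshold, and all function names are hypothetical, and the rule (exact agreement on digit runs plus fuzzy agreement on words) is a common record-linkage heuristic, not something -matchit- does on its own.

```python
from __future__ import annotations

import re

# Hypothetical normalization tables; extend them for your own data.
LEGAL_SUFFIXES = {"llc", "inc", "incorporated", "corp", "corporation", "co", "ltd"}
STREET_ABBREV = {"st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop legal-form suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def normalize_address(addr: str) -> str:
    """Lowercase, replace punctuation with spaces, expand street-type abbreviations."""
    words = re.sub(r"[^\w\s]", " ", addr.lower()).split()
    return " ".join(STREET_ABBREV.get(w, w) for w in words)


def numbers(s: str) -> set[str]:
    """All digit runs in the string (house numbers, suites, ZIP codes)."""
    return set(re.findall(r"\d+", s))


def token_jaccard(a: str, b: str) -> float:
    """Set-based Jaccard index over whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def address_match(a: str, b: str, threshold: float = 0.5) -> bool:
    """Accept only if the digit runs agree exactly and the words are similar enough."""
    na, nb = normalize_address(a), normalize_address(b)
    if numbers(na) != numbers(nb):
        return False
    return token_jaccard(na, nb) >= threshold


if __name__ == "__main__":
    base = "123 Main Street."
    for other in ["123 Main", "123 Main St", "321 Main Street."]:
        print(f"{base!r} vs {other!r}: {address_match(base, other)}")
    print(normalize_name("Acme Widgets, LLC"), "|", normalize_name("Acme Widgets Inc."))
```

Under these rules "123 Main Street." matches "123 Main" (token Jaccard 2/3) and "123 Main St" (1.0 after expansion), but not "321 Main Street.". The numeric rule is strict: "123 Main St Suite 4" would be rejected against "123 Main St", so you may prefer to compare only the leading house number. Similar cleaning can be done in Stata before the strings are passed to -matchit-.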
Thanks for any help you can provide.