Has anyone seen a comparison of the various matching methods available in the -matchit- command? I don't really understand the differences among bigram, ngram, ngram_circ, token, soundex, and token_soundex. Also, where does each of the scoring options (jaccard, simple, and minsimple) excel?
Specifically, I have a dataset in which I'm trying to match on business name and address. Is one of the options better at ignoring differences that are small (in my mind, anyway), such as "LLC" vs. "Inc" vs. no modifier, which often arise when business names are recorded? Does another option give greater weight to differences in numbers, such as I'm seeing in addresses? For example, "123 Main Street." should not be matched with "321 Main Street.", but it should be matched with both "123 Main" and "123 Main St". Can someone tell me, either from experience or by reference, where I might get the best results? Is this a question that Julio Raffo has written about before? If so, I can't seem to find it.
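To make the address concern concrete, here is a rough Python sketch comparing character bigrams with whole-word tokens on the examples above (trailing periods dropped). It is only an illustration under stated assumptions: it uses the textbook set-based Jaccard index, which may differ from how -matchit- actually computes its scores, and the helper names `bigrams`, `tokens`, and `jaccard` are hypothetical, not part of -matchit-.

```python
def bigrams(text):
    """Set of overlapping two-character substrings (lowercased)."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def tokens(text):
    """Set of whitespace-separated words (lowercased)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Textbook Jaccard index: |A & B| / |A | B|.

    Assumption: this is the plain set-based definition, not
    necessarily the formula -matchit- uses for score(jaccard).
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


base = "123 Main Street"
wrong = "321 Main Street"   # different house number: should NOT match
right = "123 Main"          # same address, abbreviated: should match

for name, split in (("bigram", bigrams), ("token", tokens)):
    s_wrong = jaccard(split(base), split(wrong))
    s_right = jaccard(split(base), split(right))
    print(f"{name:6s}  vs wrong: {s_wrong:.3f}  vs right: {s_right:.3f}")
```

Under these assumptions, the bigram comparison scores the wrong address higher than the abbreviated correct one, because "321 Main Street" shares almost all of its character pairs with "123 Main Street". The token comparison ranks them the other way around, since the whole house number counts as a single mismatched unit. That is the kind of difference I am hoping the -matchit- options can help me exploit.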
Thanks for any help you can provide.