I want to identify when two putatively matched surnames (stored as two string variables) differ only by the switching of two consecutive letters. For example, to flag someone who has "CARSLAKE" in String1 and "CARLSAKE" in String2. I don't want to accept other pairs with a Levenshtein distance of 2 as they look more like distinct names (not typos).
I can imagine something that loops through each letter in turn using substr(), but this would be long-winded and clunky, since surname length varies from pair to pair (I'm only interested in pairs where both names are the same length). Does anyone know of a more sensible solution? Thanks.
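One way to avoid trying every possible swap position is to compare the two strings character by character and look only at where they disagree: a single adjacent swap produces exactly two mismatches, at neighbouring positions, with the letters crossed over. Below is a minimal sketch of that logic in Python, used here only for illustration and not as Stata code. The function name `is_adjacent_transposition` is hypothetical, and the sketch assumes the names are already in a consistent case with no stray whitespace.

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """Return True if b is a with exactly one pair of adjacent characters swapped.

    Hypothetical helper for illustration; assumes both inputs are already
    normalised (same case, no leading/trailing spaces).
    """
    # Pairs of different length, or identical pairs, are never flagged.
    if len(a) != len(b) or a == b:
        return False

    # Collect the positions at which the two strings disagree.
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

    # A single adjacent swap yields exactly two mismatches...
    if len(diffs) != 2:
        return False

    # ...which must be neighbours, with the two letters crossed over.
    i, j = diffs
    return j == i + 1 and a[i] == b[j] and a[j] == b[i]


if __name__ == "__main__":
    print(is_adjacent_transposition("CARSLAKE", "CARLSAKE"))
    print(is_adjacent_transposition("SMITH", "SMYTH"))
```

The same test can be written with substr() by building, for each position, the version of String1 with that pair of letters swapped and comparing it to String2; the mismatch-counting form above just avoids constructing those candidate strings.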