I want to identify when two putatively matched surnames (stored as two string variables) differ only by the transposition of two consecutive letters. For example, I want to flag someone who has "CARSLAKE" in String1 and "CARLSAKE" in String2. I don't want to accept other pairs with a Levenshtein distance of 2, as they look more like distinct names than typos.
I can imagine something looping through each letter in turn using substr(), but this would be very long-winded and clunky since the surnames of course vary in length between pairs (within pairs, I'm only interested if they're the same length). Does anyone know of a more sensible solution? Thanks.
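To make the rule concrete, here is a rough sketch of the check I have in mind, written in Python rather than Stata purely to illustrate the logic. The function name `is_adjacent_transposition` is made up for this example. It assumes the comparison is case-sensitive and that identical strings should not be flagged: the two strings must have the same length, differ in exactly two positions, those positions must be next to each other, and the two letters must be swapped.

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """Return True if a and b differ only by swapping two consecutive letters."""
    # Only same-length pairs are of interest.
    if len(a) != len(b):
        return False

    # Collect every position where the two strings disagree.
    diff = [i for i in range(len(a)) if a[i] != b[i]]

    # A single adjacent swap produces exactly two mismatched positions.
    if len(diff) != 2:
        return False

    i, j = diff
    # The mismatches must be neighbours, and the letters must be exchanged.
    return j == i + 1 and a[i] == b[j] and a[j] == b[i]


if __name__ == "__main__":
    print(is_adjacent_transposition("CARSLAKE", "CARLSAKE"))  # True
    print(is_adjacent_transposition("SMITH", "SMYTH"))        # False
```

Because the check only counts mismatched positions, it does not need a separate branch for each surname length, which is the part that made the substr() loop feel clunky.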