I want to identify when two putatively matched surnames (stored as two string variables) differ only by the switching of two consecutive letters. For example, to flag someone who has "CARSLAKE" in String1 and "CARLSAKE" in String2. I don't want to accept other pairs with a Levenshtein distance of 2 as they look more like distinct names (not typos).
I can imagine something looping through each letter in turn using substr(), but this would be very long-winded and clunky since the surnames of course vary in length between pairs (within pairs, I'm only interested if they're the same length). Does anyone know of a more sensible solution? Thanks.
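One way to avoid checking every position is to compare the two names only at the first place they disagree. A minimal sketch, written in Python rather than Stata for illustration (the function name `is_adjacent_transposition` is hypothetical). It assumes both names are already cleaned to the same case with no stray spaces. If the lengths match, it finds the first mismatching position `i` and accepts the pair only if the letters at `i` and `i+1` are swapped and everything after them is identical. Pairs such as JONES/JAMES, which also have a Levenshtein distance of 2, are rejected.

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """Return True if b is a with exactly one pair of adjacent,
    different letters swapped (e.g. CARSLAKE -> CARLSAKE)."""
    # Only same-length, non-identical pairs can qualify.
    if len(a) != len(b) or a == b:
        return False
    # First position where the two names disagree; everything before it matches.
    i = next(k for k in range(len(a)) if a[k] != b[k])
    # A mismatch on the last letter has no neighbour to swap with.
    if i + 1 >= len(a):
        return False
    # The two letters must be crossed over, and the rest of the names identical.
    return a[i] == b[i + 1] and a[i + 1] == b[i] and a[i + 2:] == b[i + 2:]


if __name__ == "__main__":
    pairs = [("CARSLAKE", "CARLSAKE"), ("JONES", "JAMES"), ("SMITH", "SMITH")]
    for s1, s2 in pairs:
        print(s1, s2, is_adjacent_transposition(s1, s2))
    # CARSLAKE CARLSAKE True
    # JONES JAMES False
    # SMITH SMITH False
```

A Stata version would still need a loop with `substr()` to find the first mismatch, but the test after that point is a single comparison. It does not need a separate check at every possible swap position.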