I want to identify when two putatively matched surnames (stored as two string variables) differ only by the switching of two consecutive letters. For example, I want to flag someone who has "CARSLAKE" in String1 and "CARLSAKE" in String2. I don't want to accept other pairs with a Levenshtein distance of 2, as they look more like distinct names than typos.
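One way to state the condition precisely, without generating every possible swap: the two names must have the same length, disagree at exactly two neighbouring positions, and have those two letters crossed over. The sketch below is in Python purely to show the logic (it is not Stata code), and the function name `is_adjacent_transposition` is hypothetical.

```python
def is_adjacent_transposition(s1: str, s2: str) -> bool:
    """Return True if s2 equals s1 with exactly one pair of neighbouring letters swapped."""
    # Only same-length pairs can differ by a swap.
    if len(s1) != len(s2):
        return False
    # Positions where the two strings disagree.
    diffs = [i for i, (c1, c2) in enumerate(zip(s1, s2)) if c1 != c2]
    # Swapping two distinct neighbouring letters changes exactly two adjacent positions.
    if len(diffs) != 2 or diffs[1] != diffs[0] + 1:
        return False
    i, j = diffs
    # The letters must be crossed over, not merely different (rules out e.g. "SL" -> "TM").
    return s1[i] == s2[j] and s1[j] == s2[i]


print(is_adjacent_transposition("CARSLAKE", "CARLSAKE"))  # True
print(is_adjacent_transposition("SMITH", "SMYTE"))        # False: two edits, not a swap
```

Pairs such as "SMITH"/"SMYTE" have a Levenshtein distance of 2 but are rejected, because the two mismatches are not adjacent letters swapped with each other.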
I can imagine something looping through each letter in turn using substr(), but this would be very long-winded and clunky since the surnames of course vary in length between pairs (within pairs, I'm only interested if they're the same length). Does anyone know of a more sensible solution? Thanks.
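The loop-through-each-letter idea need not be long: for a name of length n there are only n - 1 adjacent swaps, so a single loop over positions covers names of any length. The Python sketch below (again only an illustration of the logic; `swaps` and `flag_transpositions` are hypothetical names) builds each swapped version of String1 by slicing and checks whether String2 is among them.

```python
def swaps(name: str):
    """Yield every string obtained by swapping one pair of neighbouring letters in name."""
    for i in range(len(name) - 1):
        # Keep the prefix, put letters i and i+1 in reverse order, keep the suffix.
        yield name[:i] + name[i + 1] + name[i] + name[i + 2:]


def flag_transpositions(pairs):
    """Return one boolean per (string1, string2) pair: True if they differ by one adjacent swap."""
    flags = []
    for s1, s2 in pairs:
        # Identical strings and pairs of different length are never flagged.
        flags.append(s1 != s2 and len(s1) == len(s2) and s2 in set(swaps(s1)))
    return flags


pairs = [("CARSLAKE", "CARLSAKE"), ("CARSLAKE", "CARSLAKE"), ("SMITH", "SMYTH")]
print(flag_transpositions(pairs))  # [True, False, False]
```

In a dataset-wide version of this idea, the loop would run up to the length of the longest name minus one; for shorter names the extra iterations simply produce no match, so the varying lengths between pairs do not require separate handling.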