I want to identify when two putatively matched surnames (stored as two string variables) differ only by the transposition of two adjacent letters. For example, I want to flag someone who has "CARSLAKE" in String1 and "CARLSAKE" in String2. I don't want to accept other pairs with a Levenshtein distance of 2, as they look more like distinct names than typos.
I can imagine looping through each letter in turn using substr(), but this would be long-winded and clunky, since surname length varies between pairs (within a pair, I'm only interested if the two surnames are the same length). Does anyone know of a more sensible solution? Thanks.
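One possible approach, sketched in Stata under a few assumptions: loop over character positions once for the whole dataset (not per observation), rebuild String1 with the characters at positions i and i+1 exchanged, and compare the result with String2. The names `swapped`, `len1`, and `maxlen` are hypothetical; `String1` and `String2` are the variables from the question. This assumes plain ASCII surnames and at least one non-missing observation. For names with accented characters, `ustrlen()` and `usubstr()` (Stata 14 or later) would be the analogous functions.

```stata
* Flag pairs where String2 equals String1 with exactly one pair of
* adjacent characters exchanged. Variable names swapped/len1 are illustrative.
generate byte swapped = 0
generate int len1 = strlen(String1)

* Longest surname determines how many positions need checking
summarize len1, meanonly
local maxlen = r(max)

forvalues i = 1/`=`maxlen' - 1' {
    * Rebuild String1 with characters i and i+1 exchanged:
    * prefix (1..i-1) + char i+1 + char i + remainder (i+2..end)
    replace swapped = 1 if String1 != String2 & String2 == ///
        substr(String1, 1, `i' - 1) + substr(String1, `i' + 1, 1) + ///
        substr(String1, `i', 1) + substr(String1, `i' + 2, .)
}

drop len1
```

Because exchanging two characters never changes the length, pairs of unequal length can never match, so no separate length check should be needed. The `String1 != String2` condition keeps identical pairs from being flagged when the two adjacent letters happen to be the same (for example a double "L"). For positions at or beyond the end of a shorter surname, the rebuilt string is just String1 itself, so those iterations flag nothing.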