BJ Data Tech Solution

Specializes in data processing, data management implementation plans, data collection tools (electronic and paper-based), data cleaning specifications, data extraction, transformation, and loading, analytical datasets, and data analysis. BJ Data Tech Solutions teaches the design and development of electronic data collection tools using CSPro, and Stata commands for data manipulation. It also sets up data management systems using modern data technologies such as relational databases, C#, PHP, and Android.

checking string similarity within the same variable

Dear all,

This message was initially posted in the discussion thread https://www.statalist.org/forums/forum/general-stata-discussion/general/1307980-matchit-command-to-match-two-datasets-based-on-similar-text-pattern, but I was advised to post it as a new thread with a title that better matches my question, so here we go!

In most of the string similarity discussions on Statalist, users are trying to find similarities between variables. I, however, would like to get a similarity score for observations within the same string variable. My data set contains more than 10,000 person records, and most likely hundreds of people occur in the data set multiple times, but with slightly differently spelled names.

Do you have any experience with checking for string similarity within the same variable, and may I ask which package you decided to use in the end?

Thank you for sharing your experience!

Best wishes,

Moniek
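
The within-variable comparison described in the post can be illustrated with a small sketch. It is written in Python rather than Stata and is not the method of any particular Stata package: it simply scores every pair of entries in one list of names using the standard library's `difflib.SequenceMatcher` and keeps the pairs above a cutoff. The function name `similar_pairs`, the sample names, and the 0.85 threshold are assumptions made for illustration only.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold=0.85):
    """Return (i, j, score) for each pair of entries scoring at or above threshold.

    `threshold` is an illustrative cutoff, not a recommended value.
    """
    # Light normalisation so case and stray spaces do not lower the score.
    cleaned = [name.strip().lower() for name in names]
    pairs = []
    # Compare every observation with every later observation in the same list.
    for i, j in combinations(range(len(cleaned)), 2):
        score = SequenceMatcher(None, cleaned[i], cleaned[j]).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical sample data standing in for the person records.
names = [
    "Moniek de Vries",
    "Monique de Vries",
    "Jan Jansen",
    "Jan Janssen",
    "Piet Bakker",
]

result = similar_pairs(names)
for i, j, score in result:
    print(names[i], "<->", names[j], score)
```

Comparing all pairs grows quadratically: 10,000 records give roughly 50 million comparisons, so in practice it may help to compare only records that already share something, such as the same first letter or birth year.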
