fuzzy match

I am new to the matchit command and am finding it hard to understand what the different options mean and which would best suit my needs.

Dataset 1

SOME_KIND_OF_NAME

THESQUIRL WAS YELLOW AND SMOOTH

THESQUIRL WAS YELLOW SUNSHINE AND SMOOTH

THESQUIRLWASPURPLE

BLUE MUFFINS ARE-AWESOME

BLUE-RAY MUFFINS ARE

Dataset 2 (lookup table)

COLORS

GREEN

PURPLE

YELLOW SUNSHINE

BLUE-RAY

For example, this is the code I am using:

use "DIRECTORY-dataset1 ", clear
matchit SAMPLE_ID SOME_KIND_OF_NAME using "directory-dataset2 ", idu(ID) txtu(colors) sim(token) t(0)

MATCH

THESQUIRL WAS YELLOW AND SMOOTH	YELLOW SUNSHINE	> wrong (I want a match only if the long string contains the exact phrase YELLOW SUNSHINE, with the words together)
THESQUIRL WAS YELLOW SUNSHINE AND SMOOTH	YELLOW SUNSHINE
THESQUIRLWASPURPLE	PURPLE
BLUE MUFFINS ARE-AWESOME	BLUE-RAY	> wrong (I want a match only if the long string contains exactly BLUE-RAY, with the words together)
BLUE-RAY MUFFINS ARE	BLUE-RAY
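
To make the rule I am after concrete: a pair should be kept only when the lookup text appears verbatim inside the longer string. The sketch below is one possible way to express that, assuming a plain substring test with Stata's built-in strpos() on every name/color pair instead of a matchit similarity score. It is not a matchit option. The toy data simply retypes the example above, and the ID variables are generated here for illustration only.

```stata
* Toy copy of Dataset 1 (values retyped from the example above)
clear
input str60 SOME_KIND_OF_NAME
"THESQUIRL WAS YELLOW AND SMOOTH"
"THESQUIRL WAS YELLOW SUNSHINE AND SMOOTH"
"THESQUIRLWASPURPLE"
"BLUE MUFFINS ARE-AWESOME"
"BLUE-RAY MUFFINS ARE"
end
gen SAMPLE_ID = _n          // hypothetical ID, for illustration only
tempfile names
save `names'

* Toy copy of Dataset 2; the variable is named colors as in the matchit call
clear
input str20 colors
"GREEN"
"PURPLE"
"YELLOW SUNSHINE"
"BLUE-RAY"
end
gen ID = _n                 // hypothetical ID, for illustration only

* Form every name/color pair, then keep only pairs where the
* color text occurs verbatim somewhere inside the longer string
cross using `names'
keep if strpos(SOME_KIND_OF_NAME, colors) > 0
sort SAMPLE_ID
list SAMPLE_ID SOME_KIND_OF_NAME colors, noobs
```

Because the test is exact containment, "YELLOW AND" does not satisfy "YELLOW SUNSHINE" and "BLUE MUFFINS" does not satisfy "BLUE-RAY". The cross step builds all pairs, so it may be slow on large datasets, and the tempfile macro means this should be run as a do-file rather than line by line.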

I am not sure whether it would help if I posted dummy data for this example using Stata's dataex. If so, let me know and I can ask my question again with actual data.

Thank you!

BJ Data Tech Solution
