Dataset 1
| SOME_KIND_OF_NAME |
| THESQUIRL WAS YELLOW AND SMOOTH |
| THESQUIRL WAS YELLOW SUNSHINE AND SMOOTH |
| THESQUIRLWASPURPLE |
| BLUE MUFFINS ARE-AWESOME |
| BLUE-RAY MUFFINS ARE |
Dataset 2 (lookup table)
| COLORS |
| GREEN |
| PURPLE |
| YELLOW SUNSHINE |
| BLUE-RAY |
use "DIRECTORY-dataset1 ", clear
matchit SAMPLE_ID SOME_KIND_OF_NAME using "directory-dataset2 ", idu(ID) txtu(colors) sim(token) t(0)
MATCH
| THESQUIRL WAS YELLOW AND SMOOTH | YELLOW SUNSHINE > wrong (I only want a match when the long string contains the exact phrase YELLOW SUNSHINE, with both words together) |
| THESQUIRL WAS YELLOW SUNSHINE AND SMOOTH | YELLOW SUNSHINE |
| THESQUIRLWASPURPLE | PURPLE |
| BLUE MUFFINS ARE-AWESOME | BLUE-RAY > wrong (I only want a match when the long string contains the exact term BLUE-RAY, hyphen included) |
| BLUE-RAY MUFFINS ARE | BLUE-RAY |
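What I am after is exact containment rather than a token-similarity score, so one possible workaround that does not use matchit is sketched below. It pairs every name with every lookup value and keeps only the pairs where the lookup value appears verbatim inside the long string. This is only a sketch under assumptions: the file paths are the same placeholders as above, the lookup variable is assumed to be stored as COLORS (uppercase, as in the table, although the matchit call refers to it as colors), the two datasets are assumed to share no variable names, and the values are assumed to have no stray leading or trailing blanks.

```stata
* Sketch only: form all name/color pairs, then keep exact substring hits.
use "DIRECTORY-dataset1 ", clear

* cross creates every pairwise combination of the two datasets
* (rows in dataset 1 times rows in dataset 2, so it can get large).
cross using "directory-dataset2 "

* strpos(s1, s2) returns the position of s2 within s1, or 0 if s2 is absent.
* The comparison is case-sensitive.
keep if strpos(SOME_KIND_OF_NAME, COLORS) > 0
```

On the sample data this should keep only the three pairs I consider correct (YELLOW SUNSHINE, PURPLE and BLUE-RAY), because "YELLOW AND SMOOTH" does not contain "YELLOW SUNSHINE" and "BLUE MUFFINS" does not contain "BLUE-RAY".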
Thank you!