Hi. I have two datasets that each contain the variable “Product”, the name of a pharmaceutical drug. I am trying to merge the two datasets on Product. However, the names are not standardized, as the example below shows. After merging, I want to identify observations that ended up as “master only (1)” or “using only (2)” but whose names partly overlap with a name from the other dataset. In the example below, this would identify observations 3 to 7. I am not very familiar with working with strings and would appreciate any guidance. Thanks.


Code:
clear
input str50 Product byte _merge
"A&D" 1
"A/B OTIC" 1
"ALLERX" 2
"ALLERX (AM/PM DOSE PACK 30)" 1
"ALLERX (AM/PM DOSE PACK)" 1
"ALLERX DF" 2
"ALLERX PE" 1
"ABILIFY" 1
"ACARBOSE" 1
end
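
To make the goal concrete, here is a rough sketch of the kind of flag I am after. It assumes the first word of Product is a usable grouping key, which may well be too crude for the full data (it would miss names that differ in their first word). The variable names obs, key, has_master, has_using and flag are made up for illustration and are not in either dataset.

Code:
* Sketch only: assumes the first word of Product identifies the drug
gen long obs = _n                      // remember the original order
gen key = word(Product, 1)             // first blank-delimited word
* Within each key, check whether any row is master-only or using-only
bysort key: egen byte has_master = max(_merge == 1)
bysort key: egen byte has_using  = max(_merge == 2)
* Flag rows whose key appears unmatched on both sides
gen byte flag = has_master & has_using
sort obs                               // restore the original order
list obs Product _merge if flag

In the example data only the ALLERX rows share a first word across both _merge values, so this flags observations 3 to 7 and nothing else.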