Dear Statalist,

To facilitate name matching, I would like to remove every word that matches certain patterns from a string variable. For example, I would like to remove any word matching the wildcard pattern "*pharm*" (ignoring case), which should capture words like "biopharmaceutical" or "pharma". In the example below, words containing "biotech" are removed in the same way, while "Tech" on its own is kept. The "name" column is the original data and "clean_name" is the desired result. I am aware of substr() and regexm(), but my attempts with them have so far not been fruitful.

Code:
clear
input str20 name str7 clean_name
"Fantasy Biopharma"    "Fantasy"
"New Biotech"          "New"    
"ABC Pharmaceuticals"  "ABC"    
"Biotechnolgy General" "General"
"My Tech"              "My Tech"
"Pharma World"         "World"  
end
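
To pin down what I am after, here is a rough sketch of the behaviour I have in mind. It assumes Stata 14 or later (for the Unicode regex function ustrregexra()) and that the example data above are in memory. The variable name "attempt" is just a placeholder. On the toy data it should reproduce clean_name, but I do not know whether it is robust or the recommended way to do this.

```stata
* Assumes the example data above are in memory (Stata 14 or later).
* "attempt" is a placeholder variable name, not part of my real data.

* Delete every whole word that contains "pharm" or "biotech";
* the fourth argument (1) makes the match case-insensitive.
generate attempt = ustrregexra(name, "\w*(pharm|biotech)\w*", "", 1)

* Collapse repeated internal blanks, then trim leading and trailing blanks.
replace attempt = strtrim(stritrim(attempt))

* Compare against the desired result.
assert attempt == clean_name
list name clean_name attempt
```

More patterns could presumably be added inside the parentheses, separated by "|".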

I would love to hear from the community!

Best,

Marvin