Hello everyone!

I'm working with string data on medical diagnoses. I have a variable "Description" that contains a phrase (e.g., "Malignant neoplasm of peripheral nerves of abdomen" or "Intraductal carcinoma in situ of left breast").


I need to identify all patients with cancer in the data. Cancer can be indicated by different terms within these phrases, such as "malignant", "neoplasm", or "carcinoma".

I would like to create a new dichotomous variable called "cancer" and set cancer=1 if the variable "Description" contains any of these "buzz words". I have approximately 6-8 buzz words that would identify a patient as having cancer.

I came up with:

replace cancer=1 if regexm(Description, `"carcinoma"')


This seems to work for a single term. Is it correct? Is there a way to add an "or" condition so that one command checks for multiple "buzz words", for example both "neoplasm" and "malignant"?
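
To make the question concrete, here is a rough sketch of the kind of single command I am hoping for. It assumes that regexm() treats | inside the pattern as "or", and it wraps Description in lower() because some of my phrases start with a capital letter (e.g., "Malignant"). The three terms shown are only placeholders for my full list of buzz words, and I have not verified this against all of my data.

```stata
* Sketch only: the three terms below stand in for the full list of 6-8 buzz words
* Start everyone at 0, then flag any Description containing one of the terms
generate byte cancer = 0
replace cancer = 1 if regexm(lower(Description), "malignant|neoplasm|carcinoma")

* Quick check of how many records were flagged
tabulate cancer
```

If that is not the right approach, I assume the alternative is to chain several regexm() calls together with Stata's | operator in the if condition, but that seems clumsier with 6-8 terms.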

I would appreciate any help! Otherwise, I will have to review thousands of entries manually.

Thanks!