Hello everyone!
I'm working with string data containing medical diagnoses. I have a variable "Description" that contains a phrase (e.g., "Malignant neoplasm of peripheral nerves of abdomen" or "Intraductal carcinoma in situ of left breast").
I need to identify all patients with "Cancer" in the data. Cancer can be indicated by different terms within these phrases, such as "malignant", "neoplasm", or "carcinoma".
I would like to create a new dichotomous variable called "cancer" and replace cancer=1 if the variable "Description" contains any of these "buzz words". I have approximately 6-8 of these buzz words that would identify a patient as having cancer.
I came up with:
replace cancer=1 if regexm(Description, `"carcinoma"')
This seems to work. Is this correct? Is there a way to add an "or" condition so that a single command covers multiple "buzz words"? For example, running that command for both "neoplasm" and "malignant"?
I would appreciate any help! Otherwise, I will manually have to review thousands of entries.
Thanks!
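For the "or" part of the question, here is a minimal sketch of one possible approach, not a definitive answer. It relies on the | alternation operator inside the regexm() pattern, which acts as "or". The toy data and the three-word list are assumptions for illustration; the real list would hold all 6-8 terms. Because regexm() is case sensitive, the sketch matches against a lowercased copy of Description so that "Malignant" and "malignant" are both caught.

```stata
* Toy data standing in for the real file (hypothetical values)
clear
input str60 Description
"Malignant neoplasm of peripheral nerves of abdomen"
"Intraductal carcinoma in situ of left breast"
"Fracture of left femur"
end

* regexm() returns 1 on a match and 0 otherwise, so the result can be
* stored directly as the 0/1 indicator. The | in the pattern means "or".
* lower() makes the comparison case insensitive.
generate byte cancer = regexm(lower(Description), "malignant|neoplasm|carcinoma")

list Description cancer
```

If cancer already exists, the same condition can go after "replace cancer = 1 if". Note that a plain substring match like this would also flag a description such as "Benign neoplasm of skin", so the word list may need reviewing against the actual data.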