Hello everyone!
I'm working with string data on medical diagnoses. I have a variable "Description" that contains a phrase (e.g., "Malignant neoplasm of peripheral nerves of abdomen" or "Intraductal carcinoma in situ of left breast").
I need to identify all patients with "Cancer" in the data. This can be indicated by different terms within these phrases, such as "malignant", "neoplasm", or "carcinoma".
I would like to create a new dichotomous variable called "cancer" and replace cancer=1 if the variable "Description" contains any of these "buzz words". I have approximately 6-8 of these buzz words that would identify a patient as having cancer.
I came up with:
replace cancer=1 if regexm(Description, `"carcinoma"')
This seems to work. Is this correct? Is there a way to add an "or" condition so that a single command covers multiple "buzz words", for example both "neoplasm" and "malignant"?
I would appreciate any help! Otherwise, I will have to review thousands of entries manually.
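One possible way to express the "or", as a minimal sketch rather than a definitive answer: Stata's regexm() accepts | inside the pattern as alternation, so several terms can go in one call. The three toy rows below are made up for illustration (the third is a hypothetical non-cancer entry), and the term list is just the three words mentioned above. The sketch also assumes the match should ignore case, since "Malignant" is capitalized in the data, so it wraps the variable in lower().

```stata
* Sketch only: toy data standing in for the real Description variable
clear
input str80 Description
"Malignant neoplasm of peripheral nerves of abdomen"
"Intraductal carcinoma in situ of left breast"
"Fracture of left wrist"
end

* Start every row at 0, then flag rows that match any listed term
generate byte cancer = 0

* lower() makes the match case-insensitive; | means "or" in the pattern
replace cancer = 1 if regexm(lower(Description), "malignant|neoplasm|carcinoma")

list Description cancer
```

The remaining buzz words can be appended to the pattern with further | separators. Because this is a plain substring match, any description that merely contains one of the terms gets flagged, so a spot check of the flagged rows is probably still worthwhile.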
Thanks!