I have what I think is a relatively straightforward question, but I have not been able to find an existing post that solves my problem.
I have a large batch of questions/requests made by MPs in parliamentary sessions. I have trimmed the questions and stored them as strings in Stata, in roughly the following format (made-up examples):
```stata
* Example generated by -dataex-. To install: ssc install dataex
clear
input str140 question
"The country's stance on the status of the xyz island chain"
"The overall budget for emergency preparedness in 2018"
"The total cost for the economic bailout in the Northern states"
"Unspent funds from last fiscal year related to welfare assistance"
end
```
What I would like to do is generate new variables that help categorize these questions. In the example, three of the questions have something to do with the government budget, but they use different words (budget, cost, funds). I would like a single variable, budget, that records whether any of those words appears in the question.
My current approach has been to generate a separate dummy variable for each keyword, e.g. gen budget = strpos(question, "budget") > 0, but this is rather cumbersome.
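As a minimal sketch, the per-keyword strpos() tests can at least be collapsed into one generate statement by combining them with Stata's | (or) operator. strpos() returns the position of the keyword, or 0 if it is absent, so each (strpos(...) > 0) term is a 0/1 indicator. The three keywords are the ones from the example above; the /// line continuations only work inside a do-file, not at the interactive prompt.

```stata
* Recreate the example data
clear
input str140 question
"The country's stance on the status of the xyz island chain"
"The overall budget for emergency preparedness in 2018"
"The total cost for the economic bailout in the Northern states"
"Unspent funds from last fiscal year related to welfare assistance"
end

* Each strpos() > 0 term is 1 if the keyword occurs, 0 otherwise;
* | makes budget equal to 1 if any of the terms is 1.
generate byte budget = strpos(question, "budget") > 0 | ///
                       strpos(question, "cost")   > 0 | ///
                       strpos(question, "funds")  > 0
list, clean
```

This avoids the intermediate dummies, but the expression still grows by one term per keyword.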
Is there a way to compare a string against a number of keywords and generate/replace a binary variable if one or more of those keywords are matched?
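One way to do this is sketched below: keep the keywords in a local macro and loop over them with foreach, switching the dummy on whenever a keyword is found. The macro name budget_words is hypothetical, and wrapping question in lower() is an added assumption so that capitalized forms such as "Budget" also count.

```stata
* Recreate the example data
clear
input str140 question
"The country's stance on the status of the xyz island chain"
"The overall budget for emergency preparedness in 2018"
"The total cost for the economic bailout in the Northern states"
"Unspent funds from last fiscal year related to welfare assistance"
end

* Hypothetical keyword list; add words as needed
local budget_words budget cost funds

* Start every observation at 0, then set to 1 on any keyword match
generate byte budget = 0
foreach w of local budget_words {
    replace budget = 1 if strpos(lower(question), "`w'") > 0
}
list, clean
```

Because foreach respects double quotes in the list, a multi-word phrase could be added as, for example, local budget_words budget cost funds "fiscal year". The same loop could be repeated with a different variable name and keyword list for each additional category.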
I am hoping to generate an output as follows:
```stata
* Example generated by -dataex-. To install: ssc install dataex
clear
input str140 question float budget
"The country's stance on the status of the xyz island chain" 0
"The overall budget for emergency preparedness in 2018" 1
"The total cost for the economic bailout in the Northern states" 1
"Unspent funds from last fiscal year related to welfare assistance" 1
end
```
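A more compact alternative, sketched here under the assumption that simple substring matching is acceptable, is a single regexm() call whose pattern joins the keywords with the regular-expression | (or) operator. On the example data this should reproduce the budget column above. Using "fund" rather than "funds" is an illustrative choice so that "fund" and "funding" also match.

```stata
* Recreate the example data
clear
input str140 question
"The country's stance on the status of the xyz island chain"
"The overall budget for emergency preparedness in 2018"
"The total cost for the economic bailout in the Northern states"
"Unspent funds from last fiscal year related to welfare assistance"
end

* regexm() returns 1 if the pattern matches anywhere in the string;
* "budget|cost|fund" matches any one of the three keywords.
generate byte budget = regexm(lower(question), "budget|cost|fund")
list, clean
```

Note that this matches substrings, so "cost" would also match a word such as "costume". If whole-word matching matters, ustrregexm() (Stata 14 or later) accepts Unicode regular expressions with \b word boundaries, e.g. "\b(budget|cost|funds?)\b".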
Any help would be greatly appreciated!
Thanks.
-Nate