I want to categorize a variable based on a list that I maintain. This list is continuously updated and has more than 1000 unique, cleaned and categorized businesses.
Now I want to loop through each observation of my variable in a Stata dataset and check whether any word from the clean list is present. If so, that observation should be assigned the category listed for that word. If not, move on to the next observation of that variable.
Doing this manually over and over is time-consuming, so I need to find a way to code it in Stata.
The merge command won't work here, since it matches exact values and not substrings within variables.
I tried the following, but it is a tedious, time-consuming manual process:
Code:
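* regexm() takes two arguments: the string and the pattern
* (category must be a string variable for this replace to work)
foreach i in Bike Rickshaw Van {
    replace category = "Transport" if regexm(business, "`i'")
}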
Following is a snapshot of the list and data (variables):
* clean list
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 word str11 category
"Rickshaw"       "Transport"
"Bike"           "Transport"
"Stitching"      "Enterprise"
"Livestock"      "Live Stock"
"trading"        "Enterprise"
"servicees"      "Enterprise"
"housing"        "Enterprise"
"milk selling"   "Agriculture"
"vegetable sale" "Agriculture"
"vegetable shop" "Agriculture"
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str18 business byte category
"Live Stock"         .
"Trading & Business" .
"Handi Craft"        .
"Others"             .
"Agriculture"        .
"Manufacturing"      .
"Commerce"           .
"Commucation System" .
"Stitching Work"     .
"Transport"          .
"Shoe Business"      .
"Auto part Workshop" .
"Education"          .
"Cloth selling"      .
"Animal Trading"     .
end
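For illustration, here is a minimal sketch of the kind of matching described above, using excerpts of the two examples. It is one possible approach and not a tested solution for the full data. It pairs every observation with every list entry using cross, keeps the pairs where the list word occurs inside business (ignoring case), and merges one match per observation back onto the data. The names id, list_category, wordlist and master are introduced here only for the sketch. Note that cross creates (number of observations) x (number of list entries) rows, so this may need a lot of memory with a long list, and spelling or spacing variants such as "Live Stock" versus "Livestock" will not match.

```stata
* --- clean list (excerpt of the example above) ---
clear
input str14 word str11 category
"Rickshaw"  "Transport"
"Bike"      "Transport"
"Stitching" "Enterprise"
"Livestock" "Live Stock"
"trading"   "Enterprise"
end
rename category list_category      // hypothetical name, avoids a clash later
tempfile wordlist
save `wordlist'

* --- data to categorize (excerpt of the example above) ---
clear
input str18 business
"Live Stock"
"Trading & Business"
"Stitching Work"
"Animal Trading"
"Others"
end
gen long id = _n                   // hypothetical observation identifier
tempfile master
save `master'

* pair every observation with every entry of the clean list
cross using `wordlist'

* keep pairs where the list word is a substring of business (case-insensitive)
keep if strpos(lower(business), lower(word)) > 0

* if several words match, keep one match per observation (first word alphabetically)
bysort id (word): keep if _n == 1
keep id list_category

* attach the category to the original observations; unmatched ones stay empty
merge 1:1 id using `master', nogenerate
rename list_category category
sort id
list business category
```

Observations with no matching word end up with an empty category, so they can be listed and used to extend the clean list.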
Thanks in advance.