Hi,

I have a multiple-response string variable, locnm_1, which lists the names of the local sponsors of about 1,000 protest events. I am trying to determine the number of protest events each local sponsor is part of, so I'm trying to generate an indicator variable for each local sponsor.

Some events have only one local sponsor, but some have up to 5. Furthermore, the names of the local sponsor groups are not standardized. For instance (see the example data below), the "santa clara county patriots" and the "orange county patriots" should both be identified simply as "patriots."

Data Example:
Code:
clear
input str97 locnm_1
"conservatives of washington county"
"taxpayers organization of minnesota"
"411 project, orange county patriots"
"taxpayers organization of washington"
"grassroot institute of the south, southern republican assembly, southwestern mutual network"
"top conservatives on twitter, donations movement"
"santa clara county patriots"
"taxpayers organization of minnesota"
"nevada action group"
"grassroot institute of the south, southern republican assembly, southwestern mutual network"
end

I know that this would be quite easy if locnm_1 were a single-response string variable with standardized names. I would just use
Code:
tab varname, gen(stub)
to create indicator variables for every level of the string variable. However, since it is a multiple-response string variable and some of the responses need to be standardized, I do not know how to proceed. Any help is appreciated. I'd rather not do it manually in Excel...
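
For reference, here is a minimal sketch of one possible approach, run on the example data above. It is not a definitive solution. It assumes that sponsors within an event are always separated by commas and that no sponsor name itself contains a comma. The names event_id, sponsor, slot, sponsor_std, n_events and the sp_ stub are hypothetical, and the two standardization rules are illustrative guesses that would need to be extended for the full list of names.

```stata
* Sketch only: one row per event-sponsor pair, then standardize and count
gen long event_id = _n                       // hypothetical event identifier

* Split the comma-separated list into sponsor1, sponsor2, ...
split locnm_1, parse(",") generate(sponsor)

* Go long: one observation per event and sponsor slot
reshape long sponsor, i(event_id) j(slot)
replace sponsor = strtrim(sponsor)           // remove spaces left around commas
drop if sponsor == ""                        // unused sponsor slots

* Illustrative standardization rules (assumptions, to be adapted)
gen sponsor_std = sponsor
replace sponsor_std = "patriots" if regexm(sponsor, "patriots$")
replace sponsor_std = "taxpayers organization" ///
    if strpos(sponsor, "taxpayers organization") == 1

* Count each standardized sponsor at most once per event
duplicates drop event_id sponsor_std, force

* Number of events per standardized sponsor
bysort sponsor_std: gen n_events = _N
tabulate sponsor_std

* Indicator variables, collapsed back to one row per event
tabulate sponsor_std, generate(sp_)
collapse (max) sp_*, by(event_id)
```

Working in long form keeps the standardization step in one place, since every sponsor name sits in a single variable regardless of how many sponsors an event has. Note that collapse discards the variable labels that tabulate, generate() attaches, so it may be worth saving the mapping from sp_ numbers to sponsor names beforehand.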

I'm using Stata 16.1.

Thank you,
Kerby