How can I efficiently generate many dummy variables from substrings? (other than one by one using regexm)

Excuse my terrible title; I struggled to describe what I want to do concisely.

I am working on a database of federal contractors, and I'm trying to create dummy variables for the different business types an entity can have. The problem is that in this dataset a contractor can have multiple business types, all of which are stored in a single string variable. For example:

contractor	Busn_type_str
A	23~2X~PI
B	1D~23~27~A5~A8~H2~HK~PI~QF

Each of these two-character alphanumeric codes represents a different business type. I want to create a dummy variable for each business type so that I can make some tables for the project I am working on. I could go one by one and generate dummy = regexm(Busn_type_str, {code}), but there are 78 of these codes. I will do this if there is no alternative, but I would rather work smart, not hard. Any suggestions on how to generate these dummy variables efficiently?

In summary: I have a string variable holding two-character codes (78 distinct codes) that I want to turn into 78 dummy variables, each indicating the presence of one code. I am trying to avoid generating them one by one with regexm. I am considering a for loop combined with regexm, but I want to see whether there are other methods that might save me some time.
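
For reference, here is a minimal sketch of the loop idea mentioned above, not a definitive solution. It assumes the codes are always separated by "~" and that every code is a legal variable-name suffix. The temporary variables part*, the local macro codes, and the bt_ prefix are names made up for illustration. It uses strpos() on a delimiter-padded string in place of regexm(), which guards against partial matches if code lengths ever vary.

```stata
* Example data copied from the two rows shown above
clear
input str1 contractor str30 Busn_type_str
"A" "23~2X~PI"
"B" "1D~23~27~A5~A8~H2~HK~PI~QF"
end

* Split the string on "~" into temporary variables part1, part2, ...
split Busn_type_str, parse("~") generate(part)

* Collect the distinct codes found in any position into one local macro
local codes
foreach v of varlist part* {
    quietly levelsof `v', local(lv) clean
    local codes : list codes | lv
}
drop part*

* One 0/1 indicator per code; padding with "~" forces whole-code matches
foreach c of local codes {
    generate byte bt_`c' = strpos("~" + Busn_type_str + "~", "~`c'~") > 0
}
```

Because the list of codes is built from the data itself, no code has to be typed by hand, and a code that never occurs in the data simply gets no variable.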

Thanks for all your help!

-Enrique A Figueroa

