String Variable - Subinstr? Regex? | BJ Data Tech Solution

Dear Statalist,
I am cleaning a dataset and, in particular, standardizing a string variable. The variable (employer) contains the names of three employers, separated by commas (e.g. "google,mckinsey,bain"). One of the employers in my list is "un", which stands for "united nations". Normally, I would use the following command to change "un" to "united nations":

replace employer = subinstr(employer,"un", "united nations", .)

In this case, that doesn't work, because "un" also occurs inside other names. For example, my dataset also contains "unilever", which would turn into "united nationsilever".

My feeling is that either a regular expression or an "if" condition is the solution to my problem. My plan is to tell Stata to change "un" to "united nations" only when it appears as a stand-alone entry. Again, the string variable looks like this: "world bank,un,european union" or "un,tesla,apple". I have already standardized it to this structure: lower case, comma-separated, no spaces around the commas.

So, which command do I need?

Thanks,
Daniel
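
A minimal sketch of two ways the "stand-alone entry only" plan might be written, assuming Stata 14 or later for the regex variant. The toy data and the variable names employer_pad and employer_re are illustrative and not from the original post. The first option pads the string with commas so that every entry is delimited on both sides; the second relies on word boundaries, so it would also match "un" as a separate word inside a longer entry (e.g. "un women").

```stata
* Toy data mirroring the examples in the post (illustrative only)
clear
input str40 employer
"google,mckinsey,bain"
"world bank,un,european union"
"un,tesla,apple"
"unilever,un,bain"
end

* Option 1: pad with commas so each entry has a delimiter on both sides,
* replace the fully delimited entry, then strip the padding again.
* Caveat: two adjacent "un" entries ("un,un") share a comma, so only
* the first of them would be replaced.
generate employer_pad = "," + employer + ","
replace employer_pad = subinstr(employer_pad, ",un,", ",united nations,", .)
replace employer_pad = substr(employer_pad, 2, length(employer_pad) - 2)

* Option 2: Unicode regex with word boundaries (\b) around "un"
generate employer_re = ustrregexra(employer, "\bun\b", "united nations")

* Compare the original with both results
list, clean
```

Both options write to new variables so the results can be checked against the original before overwriting employer. The comma-padding version only touches entries that are exactly "un", which is closer to the stated plan than the word-boundary version.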
