Dear Statalist,
I am in the process of cleaning a dataset, in particular standardizing a string variable. The string variable (employer) contains the names of three employers, separated by commas (e.g. "google,mckinsey,bain"). My list of employers also includes "un", which stands for "united nations". Normally, I would use the following command to change "un" to "united nations":
replace employer = subinstr(employer,"un", "united nations", .)
In this case, this doesn't work because "un" also occurs inside other names. For example, I also have "unilever" in my dataset, which would become "united nationsilever".
My feeling is that either a regular expression or an "if" condition is the solution to my problem. My plan is to tell Stata to change "un" to "united nations" only where it appears as a stand-alone entry. Again, the string variable looks like this: "world bank,un,european union" or "un,tesla,apple". I have already standardized it to this structure: lower case, comma-separated, with no spaces around the commas.
So, which command do I need?
Thanks,
Daniel
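
One possible approach, as a minimal sketch rather than a definitive answer: since the entries are separated by commas with no surrounding spaces, padding the string with a comma on each side makes every employer name delimited on both ends, so a plain subinstr() on ",un," can no longer touch "unilever" or "european union". The variable name employer comes from the post; everything else is an assumption for illustration.

```stata
* Pad with commas so the first and last entries are delimited too
replace employer = "," + employer + ","

* Only a complete entry equal to "un" matches ",un,"
replace employer = subinstr(employer, ",un,", ",united nations,", .)

* Strip the padding commas again
replace employer = substr(employer, 2, length(employer) - 2)

* Alternative (assumes Stata 14 or newer): Unicode regex with word boundaries.
* Caveat: \b would also match "un" as a separate word inside a longer
* name, e.g. a hypothetical entry such as "un women".
* replace employer = ustrregexra(employer, "\bun\b", "united nations")
```

One caveat with the comma-padding version: if "un" appeared twice in a row in the same observation ("un,un"), the two matches would share a comma and only the first would be replaced. Running the subinstr() line a second time before stripping the padding would cover that case.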