Dear Statalist,
I am cleaning a dataset and, in particular, standardizing a string variable. The variable (employer) contains the names of three employers, separated by commas (e.g. "google,mckinsey,bain"). One of the employers is "un", which stands for "united nations". Normally, I would use the following command to change "un" to "united nations":
replace employer = subinstr(employer,"un", "united nations", .)
In this case, that doesn't work because "un" also occurs inside other names. For example, I also have "unilever" in my dataset, which would become "united nationsilever".
My feeling is that either a regular expression or an "if" condition is the solution to my problem. My plan is to tell Stata to change "un" to "united nations" only when it appears as a stand-alone entry. Again, the string variable looks like this: "world bank,un,european union" or "un,tesla,apple". I have already standardized it to this lower case/no spaces/comma structure.
So, which command do I need?
Thanks,
Daniel
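For illustration, here is a minimal sketch of the "stand-alone entry" idea described above, assuming the commas are the only delimiters. It wraps every entry in commas so that subinstr() can match ",un," without touching "unilever" or "european union". The helper variable name employer_tmp is hypothetical, and this is one possible approach rather than a definitive answer.

```stata
* Double the internal commas and pad both ends, so every entry is
* surrounded by its own pair of commas, e.g. ",un,,tesla,,apple,"
generate employer_tmp = "," + subinstr(employer, ",", ",,", .) + "," if employer != ""

* Replace only whole entries that are exactly "un"
replace employer_tmp = subinstr(employer_tmp, ",un,", ",united nations,", .)

* Collapse the doubled commas back to single commas
replace employer_tmp = subinstr(employer_tmp, ",,", ",", .)

* Strip the leading and trailing padding commas and write the result back
replace employer = substr(employer_tmp, 2, strlen(employer_tmp) - 2) if employer != ""
drop employer_tmp

* Possible alternative (assumes Stata 14 or newer, where the Unicode regex
* functions are available): match "un" between word boundaries.
* Note that \b also treats spaces and hyphens as boundaries.
* replace employer = ustrregexra(employer, "\bun\b", "united nations")
```

Doubling the commas first means that two adjacent "un" entries (e.g. "un,un,apple") are both replaced, which a single pass over ",un," on singly padded text would miss.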