Dear Statalist,
I am in the process of cleaning a dataset, in particular standardizing a string variable. The string variable (employer) contains the names of three employers, separated by commas (e.g. "google,mckinsey,bain"). Among my list of employers is also "un", which stands for "united nations". Normally, I would use the following command to change "un" to "united nations":
replace employer = subinstr(employer,"un", "united nations", .)
In this case, this doesn't work, since "un" also occurs inside other names. For example, I also have "unilever" in my dataset, which would turn into "united nationsilever".
My feeling is that either a regular expression or an "if" command is the solution to my problem. My plan is to tell Stata that it should change "un" to "united nations" only if it comes as a stand-alone word. Again the string variable looks like this: "world bank,un,european union" or "un,tesla,apple". I already standardized it to this lower case/no spaces/comma structure.
So, which command do I need?
Thanks,
Daniel
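
One possible approach, sketched under the assumption that the names are separated by single commas with no surrounding spaces (as described above): wrap each value in commas so that every employer, including the first and the last, is delimited on both sides, and then replace ",un," rather than "un". The three example rows below are made up for illustration.

```stata
* Toy data for illustration only (hypothetical rows)
clear
input str40 employer
"world bank,un,european union"
"un,tesla,apple"
"google,unilever,bain"
end

* Pad with commas so the first and last names are delimited too
replace employer = "," + employer + ","

* Replace "un" only when it is a complete comma-delimited entry
replace employer = subinstr(employer, ",un,", ",united nations,", .)

* Strip the padding commas again
replace employer = substr(employer, 2, length(employer) - 2)

list employer
```

With this, "unilever" and "european union" are left alone because neither is bracketed by commas as ",un,". If "un" could appear twice in a row (e.g. "un,un,tesla"), the two entries share a comma and a single pass would catch only the first, so the subinstr() line would have to be run a second time.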