Dear all,
I have a variable called place_birth in my dataset. Some of the locations weren't recorded properly.
place_birth
Feucherolles (Saint-James = Le château royal de Sainte-Gemme)
ST B(?), canton de Chaillot
(?Chanvrand)Canton de La Guiche
Seine-Inférieure (Seine-Maritime)
Épinay-sur-Seine ,
Autine (?) Outines
Darrois ? Darvois
I would like to do two things.
First, split off whatever appears inside parentheses (), after a comma, or after = from the rest of the text, and store the separated part in a new variable called place_new.
Second, clean both variables of stray characters such as ?, =, a trailing ., /, etc.
For example
Épinay-sur-Seine ,
should look like
Épinay-sur-Seine
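A minimal sketch of the trailing-character cleanup in Python's standard `re` module. The post doesn't say which software holds the dataset, so this only illustrates the pattern logic. The helper name `strip_trailing_junk` and the exact set of "junk" characters are assumptions. Note that a legitimate abbreviation ending in "." (e.g. "St.") would also lose its period.

```python
import re

# Hypothetical choice of characters treated as junk at the end of a value.
TRAILING_JUNK = r"[\s,.?=/]+$"

def strip_trailing_junk(value: str) -> str:
    """Trim surrounding whitespace, then any run of stray separators at the end."""
    return re.sub(TRAILING_JUNK, "", value.strip())

print(strip_trailing_junk("Épinay-sur-Seine ,"))  # Épinay-sur-Seine
```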
Replace ? and (?) with a comma, so that
Autine (?) Outines
it becomes
Autine , Outines
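One way to sketch the question-mark replacement, again in Python with `re`. The function name is hypothetical. The assumption here is that only a "?" or "(?)" standing between two words marks an alternative reading. A "?" glued to a name, as in "(?Chanvrand)" or "B(?),", is deliberately left alone so the parenthesis step can deal with it.

```python
import re

def question_marks_to_comma(value: str) -> str:
    """Turn a free-standing '(?)' or '?' between two words into ' , '."""
    # Whitespace is required on both sides, so '?' attached to a name
    # (e.g. '(?Chanvrand)' or 'B(?),') is not touched here.
    return re.sub(r"\s+(?:\(\?\)|\?)\s+", " , ", value)

print(question_marks_to_comma("Autine (?) Outines"))  # Autine , Outines
print(question_marks_to_comma("Darrois ? Darvois"))   # Darrois , Darvois
```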
For this one:
Feucherolles (Saint-James = Le château royal de Sainte-Gemme)
remove "Saint-James =" and keep only:
Feucherolles (Le château royal de Sainte-Gemme)
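A sketch of the "=" rule, assuming the intended rule is general: inside parentheses, drop everything up to and including "=". The function name is hypothetical, and it only handles parentheses without nesting.

```python
import re

def drop_alias_before_equals(value: str) -> str:
    """Inside parentheses, delete everything up to and including '='."""
    # '\(' opening paren, '[^()=]*' the alias text, '=' then optional spaces;
    # the matched span is replaced by a bare '('.
    return re.sub(r"\([^()=]*=\s*", "(", value)

print(drop_alias_before_equals(
    "Feucherolles (Saint-James = Le château royal de Sainte-Gemme)"))
# Feucherolles (Le château royal de Sainte-Gemme)
```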
Then I can split the strings at commas and parentheses so that, for example:
place_birth
(?Chanvrand)Canton de La Guiche
becomes:
place_new
Chanvrand
Or:
place_birth
Seine-Inférieure (Seine-Maritime)
becomes, in the new variable:
place_new
Seine-Maritime
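A sketch of the split into place_birth and place_new, under these assumed (hypothetical) rules. If the first pair of parentheses contains real text, that text goes to place_new. Otherwise any empty marker such as "(?)" is dropped and the text after the first comma goes to place_new. Both parts are then trimmed of whitespace and stray ",.?=/" at their ends. The function names are illustrative only.

```python
import re

def _clean(text: str) -> str:
    # Collapse internal whitespace, then trim stray separators at both ends.
    return re.sub(r"\s+", " ", text).strip(" ,.?=/")

def split_place(value: str) -> tuple[str, str]:
    """Return (place_birth, place_new) for one raw place_birth string."""
    m = re.search(r"\(([^()]*)\)", value)
    if m and _clean(m.group(1)):
        # Parenthesised text goes to place_new; the rest stays in place_birth.
        rest, new = value[:m.start()] + " " + value[m.end():], m.group(1)
    else:
        if m:  # drop an empty marker like "(?)" before looking for a comma
            value = value[:m.start()] + value[m.end():]
        rest, _, new = value.partition(",")
    return _clean(rest), _clean(new)

print(split_place("(?Chanvrand)Canton de La Guiche"))
print(split_place("Seine-Inférieure (Seine-Maritime)"))
```

With these rules, "(?Chanvrand)Canton de La Guiche" gives place_new "Chanvrand" and "Seine-Inférieure (Seine-Maritime)" gives place_new "Seine-Maritime". "ST B(?), canton de Chaillot" falls back to the comma and yields "canton de Chaillot".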