Dear all,
I have a variable called place_birth in my dataset. Some of the locations weren't recorded properly.
place_birth
Feucherolles (Saint-James = Le château royal de Sainte-Gemme)
ST B(?), canton de Chaillot
(?Chanvrand)Canton de La Guiche
Seine-Inférieure (Seine-Maritime)
Épinay-sur-Seine ,
Autine (?) Outines
Darrois ? Darvois
I would like to do two things.
First, separate whatever is set off by parentheses (), a comma, or = from the rest of the text. I can then store the separated part in a new variable called place_new.
Second, clean both variables of stray characters such as ?, =, a trailing ., /, etc.
For example
Épinay-sur-Seine ,
should look like
Épinay-sur-Seine
? and (?) should be replaced with a comma, so that
Autine (?) Outines
becomes
Autine , Outines
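The two cleaning rules above can be sketched with regular expressions. The block below is a minimal illustration in Python rather than Stata, and the function name clean_place is hypothetical. It assumes a ? or (?) counts as a separator only when it sits between two words, and that trailing spaces, commas, periods, slashes, = and ? are all noise. The same patterns could probably be adapted to Stata's regular-expression functions, but that is not tested here.

```python
import re

def clean_place(text: str) -> str:
    """Hypothetical helper: apply the two cleaning rules to one value."""
    # Rule 1: a "?" or "(?)" that sits between two words becomes " , ".
    # The lookbehind/lookahead require a word character on both sides,
    # so a leading "(?" as in "(?Chanvrand)" is left for a later step.
    text = re.sub(r"(?<=\w)\s*(?:\(\?\)|\?)\s*(?=\w)", " , ", text)
    # Rule 2: drop stray characters (space , . ; / = ?) at the end.
    text = re.sub(r"[\s,.;/=?]+$", "", text)
    return text.strip()

for raw in ["Épinay-sur-Seine ,", "Autine (?) Outines", "Darrois ? Darvois"]:
    print(repr(raw), "->", repr(clean_place(raw)))
```

Values without any of these characters, such as Seine-Inférieure (Seine-Maritime), pass through unchanged.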
For this one:
Feucherolles (Saint-James = Le château royal de Sainte-Gemme)
Eliminate "Saint-James =" and just leave:
Feucherolles (Le château royal de Sainte-Gemme)
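One way to express this rule is "inside a parenthesis, delete everything from the opening bracket up to and including the = sign". The sketch below shows that in Python (not Stata); the function name drop_alias_before_equals is hypothetical, and it assumes there is at most one = per parenthesis and no nested parentheses.

```python
import re

def drop_alias_before_equals(text: str) -> str:
    """Hypothetical helper: turn "(A = B)" into "(B)"."""
    # Match "(" followed by any text that contains no bracket and no "=",
    # then the "=" itself and any spaces after it; keep only the "(".
    return re.sub(r"\(\s*[^()=]*=\s*", "(", text)

print(drop_alias_before_equals(
    "Feucherolles (Saint-James = Le château royal de Sainte-Gemme)"
))
```

A value with no = inside its parentheses is returned as it was.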
Then I can separate the strings by comma and parenthesis so that for example:
place_birth
(?Chanvrand)Canton de La Guiche
becomes:
place_new
Chanvrand
Or:
place_birth
Seine-Inférieure (Seine-Maritime)
becomes, in the new variable:
place_new
Seine-Maritime
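The extraction step for the two examples above can be sketched as follows. This is again Python rather than Stata, the function name extract_place_new is hypothetical, and it assumes that only the first pair of parentheses matters and that an empty string is acceptable when there is none. Splitting on commas is not covered.

```python
import re

def extract_place_new(place_birth: str) -> str:
    """Hypothetical helper: return the text of the first parenthesis."""
    match = re.search(r"\(([^()]*)\)", place_birth)
    if match is None:
        return ""  # no parenthesis, so nothing to put in place_new
    # Strip stray characters (space ? = , . /) from both ends of the capture.
    return re.sub(r"^[\s?=,./]+|[\s?=,./]+$", "", match.group(1))

for raw in ["(?Chanvrand)Canton de La Guiche", "Seine-Inférieure (Seine-Maritime)"]:
    print(repr(raw), "->", repr(extract_place_new(raw)))
```

Under these assumptions a value such as ST B(?), canton de Chaillot yields an empty place_new, because the parenthesis holds only a question mark.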