Hi Statalist.
I am trying to use regexs and regexm to extract information from strings. A sample is below. Each string contains a town name, which always comes first, and a firm name, which always comes second. The town and firm names are usually separated by a comma, but could be separated by any punctuation character. Infrequently, punctuation appears before the town name. The command that I wrote extracts only part of the town name:
gen city = regexs(1) if regexm(LocationFirm, "([a-zA-Z]+)([:punct:])")
Please let me know if you have an idea about how I could extract the entire town name.
Sample data, code, and output appear below.
Thanks
Gary
input str60 LocationFirm
"Albertville,First*................"
"Albertville,Albertville-"
"Anniston.Anniston—--"
"Anniston;Commercial-."
"Anniston^Blender-."
"Decatur,MorganCounty"
"..Decatur,Jupiter"
end
gen city = regexs(1) if regexm(LocationFirm, "([a-zA-Z]+)([:punct:])")
list city
. clear
. do "C:\Users\garyr\AppData\Local\Temp\STD462c_000000.tmp"
. input str60 LocationFirm
LocationFirm
1. "Albertville,First*................"
2. "Albertville,Albertville-"
3. "Anniston.Anniston—--"
4. "Anniston;Commercial-."
5. "Anniston^Blender-."
6. "Decatur,MorganCounty"
7. "..Decatur,Jupiter"
8. end
. gen city = regexs(1) if regexm(LocationFirm, "([a-zA-Z]+)([:punct:])")
. list city
     +---------+
     |    city |
     |---------|
  1. |   Alber |
  2. |   Alber |
  3. | Annisto |
  4. | Annisto |
  5. | Annisto |
     |---------|
  6. |   Decat |
  7. |   Decat |
     +---------+
.
end of do-file
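
A possible reading of the output above, offered as a sketch rather than a definitive diagnosis: inside a single pair of brackets, [:punct:] appears to be treated as an ordinary character set containing the literal characters :, p, u, n, c and t. That would explain why each match stops just before the last t, n or u in the town name (Alber|t, Annisto|n, Decat|u). One way to sidestep the punctuation class altogether is to capture the first unbroken run of letters. This assumes town names contain only ASCII letters (no spaces, hyphens or apostrophes), which holds for the sample but may not hold for the full data. The example uses a subset of the sample rows.

```stata
* Sketch only: rebuild a few of the sample rows from the post
clear
input str60 LocationFirm
"Albertville,First*................"
"Anniston;Commercial-."
"Anniston^Blender-."
"..Decatur,Jupiter"
end

* Assumption: the town name is the first unbroken run of ASCII letters.
* regexm() returns the leftmost match, so any leading punctuation is skipped
* and the capture stops at the first non-letter, whatever the separator is.
gen city = regexs(1) if regexm(LocationFirm, "([a-zA-Z]+)")
list city
```

Because the pattern never names the separator, it does not matter whether the town and firm are divided by a comma, a semicolon or a caret. If some town names contain spaces or other non-letter characters, the character set inside the parentheses would need to be widened accordingly.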