Hi Statalist.
I am trying to use regexs() and regexm() to extract information from strings. A sample is below. Each string contains a town name, which always comes first, and a firm name, which always comes second. The town and firm names are usually separated by a comma, but could be separated by any punctuation character. Infrequently, punctuation appears before the town name. The command that I wrote extracts only part of the town name:
gen city = regexs(1) if regexm(LocationFirm, "([a-zA-Z]+)([:punct:])")
Please let me know if you have an idea about how I could extract the entire town name.
Sample data, code, and output appear below.
Thanks
Gary
input str60 LocationFirm
"Albertville,First*................"
"Albertville,Albertville-"
"Anniston.Anniston—--"
"Anniston;Commercial-."
"Anniston^Blender-."
"Decatur,MorganCounty"
"..Decatur,Jupiter"
end
gen city = regexs(1) if regexm(LocationFirm, "([a-zA-Z]+)([:punct:])")
list city
. clear
. do "C:\Users\garyr\AppData\Local\Temp\STD462c_000000.tmp"
. input str60 LocationFirm
LocationFirm
1. "Albertville,First*................"
2. "Albertville,Albertville-"
3. "Anniston.Anniston—--"
4. "Anniston;Commercial-."
5. "Anniston^Blender-."
6. "Decatur,MorganCounty"
7. "..Decatur,Jupiter"
8. end
. gen city = regexs(1) if regexm(LocationFirm, "([a-zA-Z]+)([:punct:])")
. list city
+---------+
| city |
|---------|
1. | Alber |
2. | Alber |
3. | Annisto |
4. | Annisto |
5. | Annisto |
|---------|
6. | Decat |
7. | Decat |
+---------+
.
end of do-file
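A possible explanation and workaround, offered as a sketch rather than a definitive answer: written on its own, [:punct:] is most likely read as an ordinary bracket set containing the characters ":", "p", "u", "n", "c" and "t", not as the punctuation class. That would be consistent with the output above, where each match stops just before a "t", "n" or "u" (Alber|t, Annisto|n, Decat|u). A POSIX class normally has to sit inside a bracket expression, as in [[:punct:]], and whether regexm() accepts that form may depend on the Stata version. The example below avoids the class entirely and takes the first run of letters as the town name. It assumes town names contain only the letters a-z and A-Z; the variable name city2 is made up for illustration, and the three rows are copied from the sample data.

```stata
* Sketch only: extract the first run of letters from each string.
clear
input str60 LocationFirm
"Albertville,First*................"
"Anniston^Blender-."
"..Decatur,Jupiter"
end

* regexm() finds the leftmost match, and + is greedy, so [a-zA-Z]+ skips any
* leading punctuation and runs up to the first non-letter character.
* regexs(0) returns the whole matched text (no capture group is needed).
gen city2 = regexs(0) if regexm(LocationFirm, "[a-zA-Z]+")

list LocationFirm city2
```

If some town names contain spaces, hyphens or apostrophes, those characters would need to be added to the bracket set, at the cost of possibly running past a separator that uses the same character.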