Hi Statalist.
I am trying to use regexs() and regexm() to extract information from strings. A sample is below. The strings contain town names, which always come first, and firm names, which always come second. The town and firm names are usually separated by a comma, but could be separated by any punctuation character. Infrequently, punctuation appears before the town name. The command that I wrote extracts only part of the town name:
gen city = regexs(1) if regexm(LocationFirm, "([a-zA-Z]+)([:punct:])")
Please let me know if you have an idea about how I could extract the entire town name.
Sample data, code, and output appear below.
Thanks
Gary
input str60 LocationFirm
"Albertville,First*................"
"Albertville,Albertville-"
"Anniston.Anniston—--"
"Anniston;Commercial-."
"Anniston^Blender-."
"Decatur,MorganCounty"
"..Decatur,Jupiter"
end
gen city = regexs(1) if regexm(LocationFirm, "([a-zA-Z]+)([:punct:])")
list city
. clear
. do "C:\Users\garyr\AppData\Local\Temp\STD462c_000000.tmp"
. input str60 LocationFirm
LocationFirm
1. "Albertville,First*................"
2. "Albertville,Albertville-"
3. "Anniston.Anniston—--"
4. "Anniston;Commercial-."
5. "Anniston^Blender-."
6. "Decatur,MorganCounty"
7. "..Decatur,Jupiter"
8. end
. gen city = regexs(1) if regexm(LocationFirm, "([a-zA-Z]+)([:punct:])")
. list city
+---------+
| city |
|---------|
1. | Alber |
2. | Alber |
3. | Annisto |
4. | Annisto |
5. | Annisto |
|---------|
6. | Decat |
7. | Decat |
+---------+
.
end of do-file
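One possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: when [:punct:] is not enclosed in a second pair of brackets, it appears to be treated as an ordinary character set containing :, p, u, n, c, and t. The match would then stop just before the last of those letters in each town name (the t in Albertville, the final n in Anniston, the u in Decatur), which is consistent with Alber, Annisto, and Decat. The variant below sidesteps the punctuation class altogether by skipping any leading non-letters and capturing the first run of letters. It assumes town names contain only the letters a-z and A-Z (no spaces, hyphens, or apostrophes), and the variable name city2 is hypothetical.

```stata
* Sketch only: skip any leading non-letters, then capture the first run of letters.
* Assumption: town names consist solely of the letters a-z and A-Z.
* city2 is a hypothetical variable name, not part of the original code.
gen city2 = regexs(1) if regexm(LocationFirm, "^[^a-zA-Z]*([a-zA-Z]+)")
list LocationFirm city2
```

On the sample data this should return the full town names, including for the observation that begins with "..". A town name containing a space or hyphen would still be cut at that character, so the capture group would need widening for such cases.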