Dear All, I have this dataset.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str53 道路名稱 str12 路一 str6(段一 巷一 弄一 號一) str12 路二 str6(段二 巷二 弄二 號二)
"康樂街72巷6弄1號前"                             "康樂街"    ""     "72巷"  "6弄"  "1號"   ""             ""     ""       "" ""
"竹子湖路47號燈桿"                               "竹子湖路" ""     ""       ""      "47號"  ""             ""     ""       "" ""
"大同區環河北路2段與環河北路2段167巷口" "環河北路" "2段" ""       ""      ""       "環河北路" "2段" "167巷" "" ""
"信義路5段與松智路口"                          "信義路"    "5段" ""       ""      ""       "松智路"    ""     ""       "" ""
"市民大道2段與林森北路口"                    "市民大道" "2段" ""       ""      ""       "林森北路" ""     ""       "" ""
"中山北路7段141巷31弄"                           "中山北路" "7段" "141巷" "31弄" ""       ""             ""     ""       "" ""
"環河北路 酒泉街"                                "環河北路" ""     ""       ""      ""       "酒泉街"    ""     ""       "" ""
"忠孝東路4段 逸仙路42巷口"                    "忠孝東路" "4段" ""       ""      ""       "逸仙路"    ""     "42巷"  "" ""
end
The first variable, 道路名稱, holds the raw data: the address where a car accident occurred. The desired results are shown in variables 2 to 6 (the first address) and variables 7 to 11 (the second address, if there is one).
Note that:
1. An accident may occur at a single address (observations 1, 2, and 6), but most occur at the intersection of two addresses.
2. I wish to split each address into its parts: (路一 段一 巷一 弄一 號一) for the first address, and (路二 段二 巷二 弄二 號二) for the second address if there are two.
In Chinese, "段" is the section, "巷" is the lane, "弄" is the alley (a smaller lane branching off a 巷), and "號" is the house number.
3. Some nuisance characters are present: in observation 1, the final character "前" (in front of); in observation 2, the final two characters "燈桿" (light pole).
Also, in observation 5, the "與" in the middle of "市民大道2段與林森北路口" means "and". I don't want any of these to appear in the final output.
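To make the three points above concrete, here is a rough sketch of one possible approach, checked only against the eight example observations and not meant as a definitive solution. It assumes Stata 14 or later (for the Unicode regex functions), ASCII digits, that two addresses are separated only by "與" or a space, that road names end in 路, 街, or 大道, and that any leading district ends in "區". The variable names part1, part2, road1, sec1, lane1, alley1, num1 (and their "2" counterparts) are hypothetical, chosen so they do not clash with the variables in the example.

```stata
* Split the raw string at "與" ("and") or a space into up to two parts
gen part1 = ustrtrim(道路名稱)
gen part2 = ""
replace part2 = ustrregexs(2) if ustrregexm(part1, "^(.+?)[與 ]+(.+)")
replace part1 = ustrregexs(1) if ustrregexm(part1, "^(.+?)[與 ]+(.+)")

forvalues i = 1/2 {
    * Drop a leading district such as "大同區"
    * (assumption: no road name itself contains "區")
    replace part`i' = ustrregexrf(part`i', "^.*區", "")

    * Road name: shortest leading text ending in 路, 街, or 大道
    gen road`i'  = ustrregexs(1) if ustrregexm(part`i', "^(.*?(路|街|大道))")

    * Each remaining component is a run of digits followed by its unit
    gen sec`i'   = ustrregexs(1) if ustrregexm(part`i', "([0-9]+段)")
    gen lane`i'  = ustrregexs(1) if ustrregexm(part`i', "([0-9]+巷)")
    gen alley`i' = ustrregexs(1) if ustrregexm(part`i', "([0-9]+弄)")
    gen num`i'   = ustrregexs(1) if ustrregexm(part`i', "([0-9]+號)")
}

list road1 sec1 lane1 alley1 num1 road2 sec2 lane2 alley2 num2
```

Because each component is picked out by its own pattern, trailing nuisance characters such as "前", "燈桿", and "口" are simply never captured, so they do not need to be removed separately. Real data with full-width digits or other separators would need the patterns extended.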

Any suggestions are highly appreciated.