I'm trying to clean up an error that occurred when a string variable containing commas (var1 = "Fred, Sarah, & Abdul Inc") was split across one to three other variables during parsing (var1 = "Fred"; var2 = "Sarah"; var3 = "& Abdul Inc"). This pushed the remaining values in that row to the right by as many as three extra columns.
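For the single variable described above, undoing the split just means gluing the pieces back together with the ", " separator that the parser consumed. The sketch below assumes the pieces sit in var1 to var3 and that an unused piece is an empty string; the result variable `name` is hypothetical. It repairs only the split text, not the column shift in the rest of the row.

```stata
clear
input str20(var1 var2 var3)
"Fred" "Sarah" "& Abdul Inc"
end

* Rebuild the original string. Empty pieces are skipped, so a value that
* was split into only two parts does not gain a trailing ", "
generate str60 name = var1
forvalues k = 2/3 {
    replace name = name + ", " + var`k' if var`k' != ""
}
display name[1]
```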
My actual data set contains 1.5 million records and 28 variables. Here is a small example of the problem:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float visit str20(specialty location address condition v4)
20 "Internal Medicine" "Main Clinic" "245 Oak St" "Diabetes" ""
21 "Pulmonology" "Hospital B" "8211 Peabody St" "COPD" ""
22 "Dermatology" "Cosmetic" "Clinic 2" "3588 King St" "Dermabrasion"
23 "Family Medicine" "Clinic 2" "3588 King St" "Sinus Congestion" ""
24 "Dermatology" "Cosmetic" "Clinic 2" "3588 King St" "Scar Removal"
end
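Before moving any values, it can help to flag the affected rows once and check that the flag picks up only those rows. This is a minimal sketch that assumes, as in this example, that a non-empty v4 occurs only when a row has been pushed one column to the right. The flag name `shifted` is hypothetical, and in the full 28-variable data with shifts of up to three columns, the rule would need to look at more than one overflow variable.

```stata
clear
input float visit str20(specialty location address condition v4)
20 "Internal Medicine" "Main Clinic" "245 Oak St" "Diabetes" ""
21 "Pulmonology" "Hospital B" "8211 Peabody St" "COPD" ""
22 "Dermatology" "Cosmetic" "Clinic 2" "3588 King St" "Dermabrasion"
23 "Family Medicine" "Clinic 2" "3588 King St" "Sinus Congestion" ""
24 "Dermatology" "Cosmetic" "Clinic 2" "3588 King St" "Scar Removal"
end

* Assumption: a filled-in overflow column means the row was shifted
generate byte shifted = !missing(v4)

* Inspect what ended up in location for the flagged rows
tabulate location if shifted
list visit specialty location address if shifted, noobs
```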
APPROACH 1
Code:
local offset_vars address condition v4
tokenize `offset_vars'
forval i=1/2 {
    replace word `i' = "`i' + 1" if location == "Cosmetic"
}
replace specialty = "Dermatology_Cosmetic" if specialty == "Dermatology"
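A possible working version of the `tokenize` idea is sketched below, under the assumption (taken from the post) that rows with location == "Cosmetic" are shifted by exactly one column. It makes three changes. First, the rows are flagged before anything is overwritten, because once location is replaced, the test location == "Cosmetic" no longer selects them. Second, location itself is included in the list so that "Clinic 2" moves back into it. Third, the nested macro ``i'' is used to refer to the i-th variable name. The flag `shifted` is a hypothetical helper.

```stata
clear
input float visit str20(specialty location address condition v4)
20 "Internal Medicine" "Main Clinic" "245 Oak St" "Diabetes" ""
21 "Pulmonology" "Hospital B" "8211 Peabody St" "COPD" ""
22 "Dermatology" "Cosmetic" "Clinic 2" "3588 King St" "Dermabrasion"
23 "Family Medicine" "Clinic 2" "3588 King St" "Sinus Congestion" ""
24 "Dermatology" "Cosmetic" "Clinic 2" "3588 King St" "Scar Removal"
end

* Flag the shifted rows once, before location is overwritten
generate byte shifted = location == "Cosmetic"

* Re-join the split specialty while location still holds "Cosmetic"
replace specialty = specialty + "_" + location if shifted

* tokenize puts the names into `1' = location, `2' = address, ...
local offset_vars location address condition v4
tokenize `offset_vars'
forvalues i = 1/3 {
    local j = `i' + 1
    * ``i'' expands to the i-th variable name, ``j'' to the next one
    replace ``i'' = ``j'' if shifted
}

* The last column still holds the old value; clear it
replace v4 = "" if shifted
drop shifted
```

Flagging by the marker rather than by specialty == "Dermatology" also avoids renaming any Dermatology rows that were parsed correctly.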
APPROACH 2
Code:
local offset_vars address condition v4
set trace on
forval i=1/2 {
    local var1 `: word `i' of `offset_vars''
    di var1
    local var2 `: word ``i' + 1' of `offset_vars''
    di var2
    replace `var1' = `var2' if location == "Cosmetic"
}
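The `word # of` approach can also be made to work, again assuming that rows with location == "Cosmetic" are shifted by exactly one column. In the loop above, `di var1` tries to display a variable called var1 rather than the macro, which is why `display "`var1'"` is needed instead. Likewise, ``i' + 1' is not evaluated as arithmetic, so the index is computed first with `local j = `i' + 1`. The sketch also flags rows up front (hypothetical variable `shifted`) and includes location in the list, for the same reasons as in the `tokenize` version.

```stata
clear
input float visit str20(specialty location address condition v4)
20 "Internal Medicine" "Main Clinic" "245 Oak St" "Diabetes" ""
21 "Pulmonology" "Hospital B" "8211 Peabody St" "COPD" ""
22 "Dermatology" "Cosmetic" "Clinic 2" "3588 King St" "Dermabrasion"
23 "Family Medicine" "Clinic 2" "3588 King St" "Sinus Congestion" ""
24 "Dermatology" "Cosmetic" "Clinic 2" "3588 King St" "Scar Removal"
end

generate byte shifted = location == "Cosmetic"
replace specialty = specialty + "_" + location if shifted

local offset_vars location address condition v4
local n : word count `offset_vars'
forvalues i = 1/`=`n'-1' {
    local j = `i' + 1                       // evaluate the index first
    local to   : word `i' of `offset_vars'
    local from : word `j' of `offset_vars'
    display "`to' <- `from'"                // show the macro contents
    replace `to' = `from' if shifted
}

* Blank out the last variable in the list for the shifted rows
local last : word `n' of `offset_vars'
replace `last' = "" if shifted
drop shifted
```

Because the loop bound comes from `word count`, the same code should handle a longer list of trailing variables in the full data set, provided each flagged row is shifted by one position.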
Any suggestions? Kind regards and thank you in advance for your help.