I have imported a household survey in CSV format that contains numerous data entry errors. The most common one is a double entry, where the value appears twice in the same cell (for example "75 75" instead of "75"). On import, Stata reads all my variables as strings, including the expenditure and income variables. I am trying to clean the data and want to use a loop to fix these entries so that I can destring my expenditure, income, and other numeric variables.
The data set is
obs: 84,317
vars: 246
One thing I tried is to flag these double entries, for one variable, like this:
Code:
gen rent1=regexm(exp_jdtheportionre, "[0-9] [0-9]")
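One caveat with this pattern, as far as I can tell: "[0-9] [0-9]" matches any digit, space, digit sequence, so it would also flag a cell whose two numbers differ. A stricter check, sketched below on a few values taken from my data, flags a cell only when it consists of exactly two identical words. The variable name rent_dup is just a placeholder.

```stata
* Minimal sketch on four values from the real variable
clear
input str32 exp_jdtheportionre
"0 0"
"75 75"
"100"
"150 150"
end

* rent_dup (hypothetical name) is 1 only for cells made of two identical words
gen byte rent_dup = wordcount(exp_jdtheportionre) == 2 & ///
    word(exp_jdtheportionre, 1) == word(exp_jdtheportionre, 2)

list exp_jdtheportionre rent_dup
```

This uses the built-in string functions wordcount() and word(), so it should work for "Yes Yes" style entries as well as numbers.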
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str32(exp_jdtheportionre exp_jdutilitiesele) str33 exp_jdfoodexcludin
"0 0" "0 0" "0 0"
"0 0" "0 0" "0 0"
"75 75" "10 10" "50 50"
"150 150" "23 23" "100"
"50 50" "10 10" "50 50"
"0 0" "0 0" "0 0"
"150 150" "20 20" "50 50"
"75 75" "20 20" "100 100"
"110 110" "30 30" "150 150"
"50 50" "0 0" "0 0"
"0 0" "0 0" "0 0"
"125 125" "10 10" "50 50"
"100 100" "25 25" "35 35"
"100 100" "24 24" "40 40"
"150 150" "24 24" "70 70"
"0 0" "27 27" "70 70"
"50 50" "7 7" "50 50"
"40 40" "15 15" "40 40"
"100 100" "30 30" "100 100"
"150 150" "17 17" "30 30"
end
Then I would run the following over my varlist to replace, for instance, the "0 0" entry with "0", and so on:
Code:
* assumes -ds, has(type string)- was run just before, so r(varlist) holds the string variables
foreach var of varlist `r(varlist)' {
    replace `var' = "0" if inlist(`var', "0 0")
}
Code:
ds, has(type string)
* store the list in a local so it does not depend on r() staying intact
local strvars `r(varlist)'
foreach var of local strvars {
    replace `var' = "Yes" if `var' == "Yes Yes"
    replace `var' = "No" if `var' == "No No"
}
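Writing one replace per value clearly does not scale to 246 variables, so what I am after is probably something like the sketch below. It assumes every double entry is two identical words separated by a space, and it is shown on a small subset of my example data rather than the full file. The local name strvars is my own placeholder.

```stata
* Minimal sketch on a few rows from the example above
clear
input str32(exp_jdtheportionre exp_jdutilitiesele)
"0 0"     "0 0"
"75 75"   "10 10"
"150 150" "23 23"
"50 50"   "10 10"
end

* collect all string variables once
ds, has(type string)
local strvars `r(varlist)'

foreach var of local strvars {
    * collapse "x x" to "x" only when both words are identical
    replace `var' = word(`var', 1) if wordcount(`var') == 2 & ///
        word(`var', 1) == word(`var', 2)
    * destring converts the variable only if all its values are numeric;
    * otherwise it leaves the variable as a string
    destring `var', replace
}

describe
```

The same loop should also turn "Yes Yes" into "Yes", and such variables would simply stay as strings after destring. Cells where the two words differ are left untouched, so they can be inspected by hand.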
Thanks for any help
Best
H