Dear All,

I have imported "household survey" in csv format that contains numerous data entry errors. The most common one is a double entry. When importing, stata reads all my variables as string even for the expenditure and income variables. I am trying to clean it and want to use a loop for stata to replace these data entry issues so I may destring my expenditure and income variables and other numerical variables

The data set is
obs: 84,317
vars: 246


One of the ways i tried to do it is to identify these double entries as such for one variable :

Code:
gen rent1=regexm(exp_jdtheportionre, "[0-9] [0-9]")
This is what my data looks like for certain " double entry " observations for only 3 of the variables but it is consistent through each income or expenditure related variable or numerical variable such as household size or age, etc.. .


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str32(exp_jdtheportionre exp_jdutilitiesele) str33 exp_jdfoodexcludin
"0 0"     "0 0"   "0 0"    
"0 0"     "0 0"   "0 0"    
"75 75"   "10 10" "50 50"  
"150 150" "23 23" "100"    
"50 50"   "10 10" "50 50"  
"0 0"     "0 0"   "0 0"    
"150 150" "20 20" "50 50"  
"75 75"   "20 20" "100 100"
"110 110" "30 30" "150 150"
"50 50"   "0 0"   "0 0"    
"0 0"     "0 0"   "0 0"    
"125 125" "10 10" "50 50"  
"100 100" "25 25" "35 35"  
"100 100" "24 24" "40 40"  
"150 150" "24 24" "70 70"  
"0 0"     "27 27" "70 70"  
"50 50"   "7 7"   "50 50"  
"40 40"   "15 15" "40 40"  
"100 100" "30 30" "100 100"
"150 150" "17 17" "30 30"  
end
---------

Then I would run for my varlist the following to replace for instance the "0 0" entry by "0" etc..

Code:
foreach var of varlist `r(varlist)' {
  2. replace `var'="0" if inlist(`var',"0 0")
  3. }
However, it is time consuming and totally inelegant and does not identify all these double entry errors in my dataset. It was fine when it was for a "Yes Yes" or "No No" entry.

Code:
ds, has(type string)
foreach var of varlist `r(varlist)' {
replace `var'="Yes" if inlist(`var',"Yes Yes")
}
foreach var of varlist `r(varlist)' {
replace `var'="No" if inlist(`var',"No No")
}
I did try to look at the string functions to identify and replace but I am not finding the right way to do it. I am sure there is a way for string functions to do it but I am not able to code it or find it.

Thanks for any help

Best

H