Dear All,

I am working with two survey datasets and have encountered the same problem in the smaller one. I am trying to merge four data files from the smaller survey and am running into a problem with duplicate observations. The files contain wheat, corn, barley and demography data. The demography and wheat files that I am trying to merge have a member ID variable (memID), while the corn and barley files only have cluster and hh. Here is a step-by-step explanation of what I did:

use demography.dta, clear
gen qid = string(cluster) + string(hh)
gen qid2 = string(cluster) + string(hh) + string(memID)
save demo.dta, replace

use wheat.dta, clear
gen qid = string(cluster) + string(hh)
gen qid2 = string(cluster) + string(hh) + string(memID)
save wheatO.dta, replace

use barley.dta, clear
gen qid = string(cluster) + string(hh)
save barleyO.dta, replace

use corn.dta, clear
gen Eid = string(cluster) + string(hh)
save cornO.dta, replace

When I tabulate the crop variable (tab crop) in each file, I get:
wheat = 320
barley = 663
corn = 422

Then I proceed to merge as follows:

use demo.dta, clear // memID
merge m:m qid2 using wheatO.dta
rename _merge MERGE
sort cluster hh memID
drop if MERGE != 3

I then run tab crop and get wheat = 320, which is the same as before the merge (great).

save whdemo.dta

merge m:m qid using barleyO.dta
sort cluster hh memID
order MERGE, after(_merge)
drop if _merge != 3

tab crop
I still get 320 for wheat (great), but for barley I get 780, which is well above the 663 observations in the barley file.
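
In case it helps with diagnosing this, a check along the following lines should show whether qid identifies observations uniquely in the files being merged. This is only a sketch, assuming the file and variable names used above; the variable clash is a hypothetical helper introduced just for this check.

```stata
* Sketch only: inspect how many observations share each qid
use barleyO.dta, clear
duplicates report qid          // counts observations per distinct qid
capture noisily isid qid       // shows an error message if qid is not unique

* qid is built by concatenation without a separator, so two different
* (cluster, hh) pairs could end up with the same string
bysort qid (cluster hh): gen byte clash = cluster[1] != cluster[_N] | hh[1] != hh[_N]
count if clash                 // nonzero means distinct households share a qid

use whdemo.dta, clear
duplicates report qid          // several members per household are expected here
```

If qid is repeated in both whdemo.dta and barleyO.dta, neither side of the merge has a unique key.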

What am I doing wrong?

Many thanks in advance for your help.