Dear All,
I am working with two survey datasets and have run into a problem with the smaller one. I am trying to merge four data files from the smaller survey and am having a problem with duplicate data. The files cover wheat, corn, barley and demography. The demography and wheat files that I am trying to merge have a member ID variable (memID), while the corn and barley files only have cluster and hh. Here is a step-by-step explanation of what I did:
use demography.dta, clear
gen qid = string(cluster) + string(hh)
gen qid2 = string(cluster) + string(hh) + string(memID)
save demo.dta, replace
use wheat.dta, clear
gen qid = string(cluster) + string(hh)
gen qid2 = string(cluster) + string(hh) + string(memID)
save wheatO.dta, replace
use barley.dta, clear
gen qid = string(cluster) + string(hh)
save barleyO.dta, replace
use corn.dta, clear
gen Eid = string(cluster) + string(hh)
save cornO.dta, replace
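One thing that may be worth checking at this point: concatenating string(cluster) and string(hh) without a separator can give two different households the same key. For example, cluster 1 with hh 12 and cluster 11 with hh 2 both become "112". The sketch below is only a possible check, assuming cluster and hh are numeric and using barleyO.dta as saved above; qid_sep, g_true and g_qid are hypothetical variable names introduced for illustration.

```stata
use barleyO.dta, clear

* Hypothetical key with a separator, so "1_12" and "11_2" stay distinct
gen qid_sep = string(cluster) + "_" + string(hh)

* Number the distinct households under each key definition
egen g_true = group(cluster hh)
egen g_qid  = group(qid)

* If the maximum of g_qid is smaller than the maximum of g_true,
* some households share the same concatenated key
summarize g_true g_qid

* Shows whether each cluster-hh pair appears once or several times
duplicates report cluster hh
```

Since merge accepts several key variables, merging directly on cluster hh (or cluster hh memID) would avoid building a string key at all.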
When I tab the crop variable I get:
wheat = 320
barley = 663
corn = 422
Then I proceed to merge as follows:
use demo.dta, clear // memID
merge m:m qid2 using wheatO.dta
rename _merge MERGE
sort cluster hh memID
drop if MERGE != 3
tab crop
I get wheat = 320, which is the same as before the merge (great).
save whdemo.dta
merge m:m qid using barleyO.dta
sort cluster hh memID
order MERGE, after(_merge)
drop if _merge != 3
tab crop
I get 320 for wheat (great), but for barley I get 780, which is well above the 663 I had before the merge.
What am I doing wrong?
Many thanks in advance for your help.
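A possible diagnostic, offered as a sketch rather than a definitive fix: the Stata manual itself describes merge m:m as a bad idea, because it pairs observations within each key by their order in the data rather than forming a meaningful match. The demography/wheat file has one row per household member while the barley file is keyed only by cluster and hh, so a barley record can be repeated across members, and a count above 663 may simply reflect that. The code below assumes cluster and hh together identify a household and uses the file names from the post; merge_barley is a hypothetical variable name.

```stata
* Does each household appear once in the barley file, or several times?
use barleyO.dta, clear
duplicates report cluster hh
tab crop

* If cluster hh uniquely identifies rows in barleyO.dta, a many-to-one
* merge from the member-level file states the relationship explicitly.
* Stata stops with an error if the key is not unique in the using file.
use whdemo.dta, clear
merge m:1 cluster hh using barleyO.dta, generate(merge_barley)
tab merge_barley
```

If the m:1 merge stops with an error, the barley file has several rows per household, and it would need to be decided what a single row should represent before merging.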