Dear All,
I am working on two survey datasets and have run into the same problem in the smaller one. I am trying to merge four data files from the smaller survey and am having a problem with duplicates. The files cover wheat, corn, barley, and demography. The demography and wheat files that I am trying to merge have a member ID variable (memID), while the corn and barley files have only cluster and hh. Here is a step-by-step explanation of what I did:
use demography.dta, clear
gen qid = string(cluster) + string(hh)
gen qid2 = string(cluster) + string(hh) + string(memID)
save demo.dta, replace
use wheat.dta, clear
gen qid = string(cluster) + string(hh)
gen qid2 = string(cluster) + string(hh) + string(memID)
save wheatO.dta, replace
use barley.dta, clear
gen qid = string(cluster) + string(hh)
save barleyO.dta, replace
use corn.dta, clear
gen Eid = string(cluster) + string(hh)
save cornO.dta, replace
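One thing that may be worth checking at this stage (an assumption, not something verified on the survey files): IDs built by concatenating string(cluster) and string(hh) with no separator can collide, for example cluster 1 with hh 11 and cluster 11 with hh 1. A minimal sketch on made-up values, where qid_sep is a hypothetical variable name used only for illustration:

```stata
* Toy data (hypothetical values), not the actual survey files.
clear
input cluster hh
1 11
11 1
end

* Same construction as above: both rows end up with qid "111".
gen qid = string(cluster) + string(hh)
list cluster hh qid
duplicates report qid

* A separator keeps the two households distinct.
gen qid_sep = string(cluster) + "_" + string(hh)

* isid exits with an error if the key does not uniquely identify rows;
* here both checks pass.
isid qid_sep
isid cluster hh
```

Since merge accepts several key variables, merging directly on cluster hh (or cluster hh memID) would avoid building a string ID at all.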
When I tabulate the crop variable (tab crop) before merging, I get:
wheat = 320
barley = 663
corn = 422
Then I merge as follows:
use demo.dta, clear // has memID
merge m:m qid2 using wheatO.dta
rename _merge MERGE
sort cluster hh memID
drop if MERGE != 3
tab crop
I get wheat = 320, the same as before the merge (great).
save whdemo.dta
merge m:m qid using barleyO.dta
sort cluster hh memID
order MERGE, after(_merge)
drop if _merge != 3
tab crop
I get 320 for wheat (great), but 780 for barley, which is well above the 663 I had before the merge.
What am I doing wrong?
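A toy reproduction of the symptom may help narrow this down. It is only a sketch on made-up data, assuming the master file has several members per household while the barley file is at household level; the tempfile names members and plots are hypothetical. With merge m:m, rows are paired in order within each key group, and the last row of the shorter side is reused, so the count after the merge follows the larger side:

```stata
* Toy data (hypothetical values), not the actual survey files.
clear
input cluster hh memID
1 1 1
1 1 2
1 1 3
1 2 1
end
tempfile members
save `members'

clear
input cluster hh str6 crop
1 1 "barley"
1 2 "barley"
end
tempfile plots
save `plots'

* Two barley records before the merge.
count if crop == "barley"

use `members', clear
merge m:m cluster hh using `plots'
keep if _merge == 3

* Four barley rows after the merge: one per household member,
* not one per barley record.
count if crop == "barley"
```

Running duplicates report cluster hh on each file would show which side is not unique on the key. If the barley file turns out to be unique on cluster hh, merge m:1 cluster hh states that intent and stops with an error when the assumption fails, whereas m:m does not. The Stata manual itself advises against merge m:m.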
Many thanks in advance for your help.