I am conducting a study to assess the impact of a city-level policy on tobacco use. To answer this question, I have two separate datasets; tobacco use dataset with the outcomes and policy dataset with the exposure variable. The policies are graded on a city level but the problem is that the tobacco use dataset does not have city-level variables but has zipcodes. So I created a policy dataset that has the same zipcodes and linked them to city names with each corresponding policy grade. I planned to merge both datasets using the common zipcode variable to conduct my analysis. Please note the bestzip variable is numeric in both datasets.
I used merge 1:1, 1:m and m:1 in Stata 15.1 but I kept getting the same error messages: variable bestzip does not uniquely identify observations in the master data
r(459);
. merge 1:1 bestzip using "C:\Users\Tobacco_use.dta"
variable bestzip does not uniquely identify observations in the master data
r(459);
. merge 1:m bestzip using "C:\Users\Tobacco_use.dta.dta"
variable bestzip does not uniquely identify observations in the master data
r(459);
. merge m:1 bestzip using "C:\Users\Policydata"
variable bestzip does not uniquely identify observations in the using data
r(459);
So when I checked the data, I realized there are some zipcodes that cut across different cities as seen in the last 2 lines "Los Angeles" and "Commerce" in 90022.
. dataex bestzip city city_2012, count(23)
----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long bestzip str22 city byte city_2012 90001 "Los Angeles" 4 90002 "Los Angeles" 4 90003 "Los Angeles" 4 90004 "Los Angeles" 4 90005 "Los Angeles" 4 90006 "Los Angeles" 4 90007 "Los Angeles" 4 90008 "Los Angeles" 4 90009 "Los Angeles" 4 90010 "Los Angeles" 4 90011 "Los Angeles" 4 90012 "Los Angeles" 4 90013 "Los Angeles" 4 90014 "Los Angeles" 4 90015 "Los Angeles" 4 90016 "Los Angeles" 4 90017 "Los Angeles" 4 90018 "Los Angeles" 4 90019 "Los Angeles" 4 90020 "Los Angeles" 4 90021 "Los Angeles" 4 90022 "Los Angeles" 4 90022 "Commerce " 0 end
0 Response to Merging two datasets
Post a Comment