Hello,

I am conducting a study to assess the impact of a city-level policy on tobacco use. To answer this question, I have two separate datasets; tobacco use dataset with the outcomes and policy dataset with the exposure variable. The policies are graded on a city level but the problem is that the tobacco use dataset does not have city-level variables but has zipcodes. So I created a policy dataset that has the same zipcodes and linked them to city names with each corresponding policy grade. I planned to merge both datasets using the common zipcode variable to conduct my analysis. Please note the bestzip variable is numeric in both datasets.

I used merge 1:1, 1:m and m:1 in Stata 15.1 but I kept getting the same error messages: variable bestzip does not uniquely identify observations in the master data
r(459);

. merge 1:1 bestzip using "C:\Users\Tobacco_use.dta"
variable bestzip does not uniquely identify observations in the master data
r(459);

. merge 1:m bestzip using "C:\Users\Tobacco_use.dta.dta"
variable bestzip does not uniquely identify observations in the master data
r(459);

. merge m:1 bestzip using "C:\Users\Policydata"
variable bestzip does not uniquely identify observations in the using data
r(459);

So when I checked the data, I realized there are some zipcodes that cut across different cities as seen in the last 2 lines "Los Angeles" and "Commerce" in 90022.

. dataex bestzip city city_2012, count(23)

----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long bestzip str22 city byte city_2012
90001 "Los Angeles"          4
90002 "Los Angeles"          4
90003 "Los Angeles"          4
90004 "Los Angeles"          4
90005 "Los Angeles"          4
90006 "Los Angeles"          4
90007 "Los Angeles"          4
90008 "Los Angeles"          4
90009 "Los Angeles"          4
90010 "Los Angeles"          4
90011 "Los Angeles"          4
90012 "Los Angeles"          4
90013 "Los Angeles"          4
90014 "Los Angeles"          4
90015 "Los Angeles"          4
90016 "Los Angeles"          4
90017 "Los Angeles"          4
90018 "Los Angeles"          4
90019 "Los Angeles"          4
90020 "Los Angeles"          4
90021 "Los Angeles"          4
90022 "Los Angeles"          4
90022 "Commerce            " 0
end
My question is do I have other options to merge these two datasets using the zipcode variable or is there a way to create a unique id that I can use to merge both of them?