I am combining multiple administrative data sets and would be happy to receive input on a sensible approach. I cannot give a data example as I am using sensitive data, but I made up two data sets to illustrate the issue.

Say we have two individual-level data sets with data registered monthly:
  1. Prescription drugs with values A, B, C. n = 10000.
  2. Emergency room contacts with contact reasons. n = 2000.
Here is an illustration of the data structures I want to combine:

[Image: illustration of the two data sets, not shown]

Two features of the data complicate the merge:
  1. Each ID can have multiple entries, possibly several in the same month.
  2. Some IDs may be registered in the prescription database but not in the emergency room database, and vice versa.
My main issue is this: I don't see an obvious application of -merge 1:1-, -merge m:1- or -merge 1:m- with either ID alone or ID and YM as the key variables.

-merge 1:1-, -merge 1:m- and -merge m:1- on ID all fail with an error like this:
Code:
variable ID does not uniquely identify observations in the using data
r(459)
Merging on ID YM gives the same kind of error, because an ID can have several entries in the same month. It would also make little sense, as it would match prescription dates with emergency room contact dates.

From the Stata documentation and earlier posts, -merge m:m- does not seem advisable. I am currently looking at -joinby- as an alternative to -merge-, but I am not certain whether this is correct either.
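
To make the -joinby- idea concrete, here is a minimal sketch on toy data (not my real data). The variable names ym_rx, ym_er, drug and reason are made up for illustration. I gave the month variable a different name in each file because, as far as I understand, -joinby- keeps the master's values for variables that exist in both files. It is meant to be run as a do-file.

```stata
* Toy data only: ym_rx, ym_er, drug and reason are hypothetical names
clear
input ID ym_er str10 reason
1 720 "fall"
1 722 "chest pain"
3 721 "fever"
end
format ym_er %tm
tempfile er
save `er'

clear
input ID ym_rx str1 drug
1 720 "A"
1 720 "B"
2 721 "C"
end
format ym_rx %tm

* Form all pairwise combinations of prescriptions and contacts within ID.
* unmatched(both) also keeps IDs that appear in only one of the two files
* and adds _merge (1 = master only, 2 = using only, 3 = both).
joinby ID using `er', unmatched(both)

sort ID ym_rx ym_er
list, sepby(ID)
```

On this toy example the result has six observations: four for ID 1 (two prescriptions times two contacts), one for ID 2 (prescription only, _merge == 1) and one for ID 3 (emergency room contact only, _merge == 2). Whether all pairwise combinations within ID are what I actually need is exactly what I am unsure about.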