Hello there,

I am trying to build a matched dataset for a case-control study. Each case (a positive cancer diagnosis) should be matched on both location and age to a comparison (a negative diagnosis) with the same location and age. I then need to merge them into one dataset showing that, for each of the 50 cases, one comparison was randomly selected from the pool of eligible comparisons, with no more than one match per case. I have put my code below. The issue I am having is that when I join the two files together I end up with a very large number of matches, not the 50 I expected. I also don't know how to merge everything into one dataset that produces a clear table. I have previously used some of @ClydeSchechter's code, but I am very confused.

Any help is much appreciated!

Code:

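* Build the cases file (positive diagnoses)
preserve

keep if abb == 1
rename * *_case
rename id_case caseID
rename abb_case cancer
rename abbage_case age
rename abblocation_case location
tempfile cases
save `cases'        // refer to the tempfile through its local macro

restore

* Build the comparison file (negative diagnoses)
keep if abc == 2
rename * *_comparison
rename id_comparison ID
rename abc_comparison cancer
rename abcage_comparison age
rename abclocation_comparison location
tempfile comparison
save `comparison'

* Put the comparisons in random order
use `comparison', clear
set seed 12345
gen rand = runiform()
sort rand
drop rand
save `comparison', replace

* Pair every case with every comparison sharing its age and location
use `cases', clear
joinby age location using `comparison'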
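
Since joinby keeps every case/comparison pair that agrees on age and location, a large number of rows at this point is expected. One possible way to get from there to at most one comparison per case, with no comparison used twice, is a greedy random pick. The following is only a minimal sketch on made-up data, not necessarily the approach in the code referred to above. It assumes caseID and ID are numeric, and the toy values, the helper variables u and picked, and the pool sizes are all hypothetical.

```stata
* Toy data only: variable names follow the code above, values are made up.
clear
set seed 12345

* Hypothetical comparison pool: 200 people with numeric ID, age, location
set obs 200
gen long ID = 1000 + _n
gen age = 40 + mod(_n, 5)
gen location = 1 + mod(_n, 3)
tempfile comparison
save `comparison'

* Hypothetical cases: 10 people with numeric caseID, age, location
clear
set obs 10
gen long caseID = _n
gen age = 40 + mod(_n, 5)
gen location = 1 + mod(_n, 3)

* Every case/comparison pair that agrees on age and location
joinby age location using `comparison'

* Greedy random selection without replacement
gen double u = runiform()
gen byte picked = 0
quietly count if picked == 0
local remaining = r(N)
while `remaining' > 0 {
    sort picked u caseID ID          // unpicked pairs first, in random order
    local c = caseID[1]
    local k = ID[1]
    quietly replace picked = 1 in 1  // accept this pair
    * drop other pairs that reuse the same case or the same comparison
    quietly drop if picked == 0 & (caseID == `c' | ID == `k')
    quietly count if picked == 0
    local remaining = r(N)
}
drop u picked

* Each case and each comparison should now appear at most once
isid caseID
isid ID
sort caseID
list caseID ID age location, noobs
```

With real data, a case can drop out of the result if all of its eligible comparisons have already been taken by other cases, so it would be worth checking the final number of rows against the 50 cases.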