I have two different large-ish surveys of the US adult population. In principle the two measure similar labor market and demographic concepts, but survey #2 contains a variable of interest, treat, that survey #1 lacks.

Naturally, I have the two surveys in separate .dta files.

What I'd like to do is statistically match individuals from survey #2 to individuals in survey #1, conditional on a set of demographic variables. For the purposes of this question, let's assume there are just four: race, age, education, and sex.

For every individual in survey #1 with a given combination of those four variables, I'd like to randomly link them to an individual in survey #2 who matches on the same four variables.
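Ignoring the weights for a moment, the within-cell random link can be done with two merges instead of expanding anything. This is a minimal sketch, not a tested solution: the file names survey1.dta and survey2.dta, the variable name educ for education, and the seed are hypothetical. It numbers the donors 1..n within each race/age/educ/sex cell in survey #2, draws a random donor number for each survey #1 respondent, and merges treat back. Matching is with replacement, every donor in a cell is equally likely, and respondents in cells with no survey #2 donors keep a missing treat.

```stata
* Sketch: unweighted random matching within race-age-educ-sex cells.
* Hypothetical names: survey1.dta, survey2.dta, and educ for education;
* race, age, sex, treat as in the post. Matching is with replacement.
set seed 12345

* Survey #2: number the donors 1..n_donors within each cell
use survey2, clear
keep race age educ sex treat
generate double u = runiform()
sort race age educ sex u          // random (but seed-reproducible) order within cell
by race age educ sex: generate long donor_id = _n
by race age educ sex: generate long n_donors = _N
drop u
tempfile donors
save `donors'

* One row per cell holding the number of available donors
keep race age educ sex n_donors
duplicates drop
tempfile cellsize
save `cellsize'

* Survey #1: draw a donor number uniformly from 1..n_donors in the same cell
use survey1, clear
merge m:1 race age educ sex using `cellsize', keep(master match) nogenerate
generate long donor_id = floor(runiform() * n_donors) + 1   // missing if cell has no donors
merge m:1 race age educ sex donor_id using `donors', keep(master match) nogenerate keepusing(treat)
drop donor_id n_donors
```

The cell variables need the same names, types, and coding in both files for the merges to line up.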

One complication is that although the weighted population totals in both surveys are (in principle) the same, the two surveys have different numbers of raw (unweighted) observations. So ideally the solution would incorporate the weights (let's call these variables weight1 in survey #1 and weight2 in survey #2).

An inefficient method I've tried for doing this is calling
Code:
expand weight2
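in survey #2, then identifying the row range for each group so that I can draw random row numbers within those ranges for the individuals in survey #1. But because expanding by weight2 produces one row per person in the weighted population, which is 200+ million people, that gets unwieldy fast.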
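One way to get the same weighted draw without -expand- is to give each survey #2 donor a slice of its cell's cumulative weight share, so a donor with weight2 = w in a cell whose weights total W occupies an interval of length w/W in (0,1]. Each survey #1 respondent then draws a uniform number and takes the first donor whose cumulative share is at or above that draw, which selects donors with probability proportional to weight2. The sketch below does this by appending the two files and carrying treat across within each cell. It is a minimal sketch under assumptions: survey1.dta, survey2.dta, educ, and the helper variables pos, negpos, cumw, and donor are hypothetical names, and survey #1 is assumed not to already contain variables with those names.

```stata
* Sketch: weighted random matching within cells, without -expand-.
* Donor k in a cell is chosen with probability weight2_k / sum(weight2 in cell).
* Hypothetical names: survey1.dta, survey2.dta, educ, pos, negpos, cumw, donor.
set seed 12345

* Survey #2: cumulative weight share within each cell, ending at exactly 1
use survey2, clear
keep race age educ sex treat weight2
drop if missing(weight2) | weight2 <= 0
bysort race age educ sex: generate double cumw = sum(weight2)
by race age educ sex: generate double pos = cumw / cumw[_N]
generate byte donor = 1
drop cumw
tempfile donors
save `donors'

* Survey #1: one uniform draw per respondent
use survey1, clear
generate double pos = runiform()
generate byte donor = 0
append using `donors'

* Sort each cell by descending position; every survey #1 row takes treat
* from the row just above it. Because -replace- runs observation by
* observation, values cascade through runs of survey #1 rows, so each one
* ends up with the first donor whose cumulative share is >= its draw.
generate double negpos = -pos
bysort race age educ sex (negpos): replace treat = treat[_n-1] if donor == 0
keep if donor == 0
drop pos negpos donor weight2
```

The working dataset has as many rows as the two surveys combined, not one row per weighted person. Each survey #1 respondent keeps their own weight1, while weight2 only governs which donor they receive; cells with no survey #2 donors again leave treat missing.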