Hi everyone,

I am trying to do nearest-neighbour matching on the Mahalanobis distance. I have attached a sample .dta file (test_nearest).
The "test_nearest.dta" dataset has the output columns generated by the following code:

Code:
psmatch2 treatment, mahal(industry occu salary)
dataex "/Users/fek-vkd/Documents/phd/self-papers/status_project/test_nearest.dta"
The output columns suggest that the observation with _id = 14 is "nearest matched" with _id = 2, the observation with _id = 15 is "nearest matched" with _id = 4, and so on for the other treatment observations.

To continue, I need code that produces the output shown in the column "nearest_match". Essentially, the code should identify all the control observations that are matched with a treatment observation. For now, I have created the column "nearest_match" manually. My "real" dataset has about 13 million observations. Has anyone dealt with this situation before and has code to share?
One way I could do it is to loop over all the observations (_N), but with 13 million observations such code would take a very long time. Can anyone help with an efficient piece of code?


I would really appreciate any help on this.

Thanks in advance.

/J