Dear all,

I matched 2 datasets using the joinby command on the variable "project_id". The result is a table with pairs of workers who collaborate on specific projects.

As you can see, there are some duplicate pairs when worker1_id and worker2_id are exchanged. For example, the worker pairs in line 5 and 6 appear in reversed order in line 7 and 9.

How would I go about identifying and dropping such duplicates (for the same project_id)?

Note that in those cases where a pair appears more than one time per project_id, I would like to keep the observation where worker1 is the lead (variable "worker1_lead"==1). So for example, when deciding whether to keep the pair in line 5 vs. its duplicate in line 9, we would keep the observation in line 5. When neither observation of a duplicate has a lead worker, then it is irrelevant which one we keep.

I would greatly appreciate your help, thanks a lot in advance!

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(project_id worker1_id worker2_id) byte worker1_lead
  533 2400  837 1
  747   45 2020 0
 1107  418   72 1
 1107   72  418 0
31587  708  462 1
31587  708 2782 1
31587 2782  708 0
31587 2782  462 0
31587  462  708 0
end