I matched 2 datasets using the joinby command on the variable "project_id". The result is a table with pairs of workers who collaborate on specific projects.
As you can see, there are some duplicate pairs when worker1_id and worker2_id are exchanged. For example, the worker pairs in line 5 and 6 appear in reversed order in line 7 and 9.
How would I go about identifying and dropping such duplicates (for the same project_id)?
Note that in those cases where a pair appears more than one time per project_id, I would like to keep the observation where worker1 is the lead (variable "worker1_lead"==1). So for example, when deciding whether to keep the pair in line 5 vs. its duplicate in line 9, we would keep the observation in line 5. When neither observation of a duplicate has a lead worker, then it is irrelevant which one we keep.
I would greatly appreciate your help, thanks a lot in advance!
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input int(project_id worker1_id worker2_id) byte worker1_lead 533 2400 837 1 747 45 2020 0 1107 418 72 1 1107 72 418 0 31587 708 462 1 31587 708 2782 1 31587 2782 708 0 31587 2782 462 0 31587 462 708 0 end
0 Response to Identifying duplicates within groups based on two variables in reverse order
Post a Comment