Dear Statalist,

I have a dataset of firms and their owners. Here is a minimal working example:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 Firm_id str13 Owner_legal_name str7 Owner_last_name str19 Owner_address byte Share
"A" "John Smith "   "Smith"   "87 Granville street" 10
"A" "Maria Lopez"   "Lopez"   "87 Granville street" 20
"A" "Robert Brown"  "Brown"   "1022 Nelson street"   5
"A" "Ron Gilford"   "Gilford" "287 Howe street"     30
"A" "Rebecca Smith" "Smith"   "1022 Nelson street"  10
"A" "Joe Ramsey"    "Ramsey"  "503 Main street"     25
"B" "Anna Mancini"  "Mancini" "49 Rupert avenue"    25
"B" "David Bauer"   "Bauer"   "8 Cambie street"     25
"B" "Tessa Garcia"  "Garcia"  "8 Cambie street"     50
end
I want to reconstruct family relationships among the owners of each firm as follows: individuals belong to the same sub-family if they have the same last name OR if they live at the same address. Different sub-families are linked into one family whenever they have a member in common. For example, John Smith and Maria Lopez share an address, John Smith and Rebecca Smith share a last name, and Rebecca Smith and Robert Brown share an address, so all four belong to one family. My desired output is as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 Firm_id str13 Owner_legal_name str7 Owner_last_name str19 Owner_address byte(Share last_name_id address_id Family_id Familyshare)
"A" "John Smith "   "Smith"   "87 Granville street" 10 1 1 1 45
"A" "Maria Lopez"   "Lopez"   "87 Granville street" 20 2 1 1 45
"A" "Robert Brown"  "Brown"   "1022 Nelson street"   5 3 2 1 45
"A" "Ron Gilford"   "Gilford" "287 Howe street"     30 4 3 2 30
"A" "Rebecca Smith" "Smith"   "1022 Nelson street"  10 1 2 1 45
"A" "Joe Ramsey"    "Ramsey"  "503 Main street"     25 5 4 3 25
"B" "Anna Mancini"  "Mancini" "49 Rupert avenue"    25 1 1 4 25
"B" "David Bauer"   "Bauer"   "8 Cambie street"     25 2 2 5 75
"B" "Tessa Garcia"  "Garcia"  "8 Cambie street"     50 3 2 5 75
end
I thought generating unique identifiers would be a good place to start:
Code:
egen last_name_id=group(Firm_id Owner_last_name)
egen address_id=group(Firm_id Owner_address)
Once I have Family_id, Familyshare (my ultimate variable of interest) is just:
Code:
by Firm_id Family_id, sort: egen Familyshare=total(Share)
My problem is how to generate Family_id. In other words, how do I identify the observations that belong to two groups at once (one last-name group and one address group) and therefore link them? I looked into -egen- and searched Statalist, but surprisingly I couldn't find anything useful.
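One idea that might work is to give every owner their own label and then repeatedly pull each last-name group and each address group down to the smallest label it contains, stopping when a full pass changes nothing. Below is a minimal sketch of that idea, not a vetted solution. It assumes the first example dataset above is in memory, and fam_label, fam_old, min_name and min_addr are made-up working variables.

```stata
* Assumes the first example dataset above is in memory.
* fam_label, fam_old, min_name, min_addr are hypothetical working variables.

* Start with every owner in their own family.
gen long fam_label = _n

local changed = 1
while `changed' > 0 {
    gen long fam_old = fam_label

    * Everyone sharing a last name within a firm takes the smallest label.
    bysort Firm_id Owner_last_name: egen long min_name = min(fam_label)
    replace fam_label = min_name

    * Everyone sharing an address within a firm takes the smallest label.
    bysort Firm_id Owner_address: egen long min_addr = min(fam_label)
    replace fam_label = min_addr

    * Stop once a full pass leaves every label unchanged.
    count if fam_label != fam_old
    local changed = r(N)

    drop fam_old min_name min_addr
}

* Relabel the surviving labels as consecutive integers.
egen long Family_id = group(fam_label)
drop fam_label
```

On the example data this should put John Smith, Maria Lopez, Robert Brown and Rebecca Smith in the same family. Note that -bysort- changes the sort order of the data, and that owners with an empty last name or address would be linked to each other, so those cases would need to be handled first.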
I would greatly appreciate any suggestions.