I have two datasets. One contains families of 2 members, and the other contains families of varying sizes. The names often contain minor spelling mistakes, so exact matches are frequently impossible.

How can I match each 2-member family in the first dataset with an n-member family in the second dataset? Both names in the 2-member family must appear in the n-member family, although because of spelling mistakes they will usually not match exactly.
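A minimal sketch of this family-level check, written in Python because Stata 16 and later can run Python code; the example data, the family ids, and the functions `family_matches` and `link_families` are hypothetical. It treats a 2-member family as matching an n-member family when each of its two full names has a close match among that family's names, using `difflib.get_close_matches` from the Python standard library with a cutoff of 0.8:

```python
from difflib import get_close_matches

def family_matches(pair_names, family_names, cutoff=0.8):
    """True if every full name in pair_names has a close match in family_names.

    Hypothetical helper: similarity is difflib's ratio, compared case-insensitively.
    """
    lowered = [n.lower() for n in family_names]
    return all(
        get_close_matches(name.lower(), lowered, n=1, cutoff=cutoff)
        for name in pair_names
    )

def link_families(pairs, families, cutoff=0.8):
    """Map each 2-member family id to the ids of n-member families containing both names."""
    return {
        pid: [fid for fid, members in families.items()
              if family_matches(names, members, cutoff)]
        for pid, names in pairs.items()
    }

# Hypothetical data: one 2-member family and two candidate n-member families
pairs = {"A1": ["John Smith", "Mary Smith"]}
families = {
    "F1": ["Jon Smith", "Mary Smyth", "Peter Smith"],
    "F2": ["John Brown", "Anna Brown"],
}
print(link_families(pairs, families))  # {'A1': ['F1']}
```

Two caveats of this sketch: both names could match the same member of the larger family, and comparing every 2-member family with every n-member family grows quadratically, so with large datasets you may want to restrict comparisons to candidates that share some exact field first.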

In essence, I am trying to tell Stata to:

“Match if 4 words in the 2-member and n-member families are 80% similar”. I say 4 words because each 2-member family in the first dataset has 2 first names and 2 last names, and 80% to allow for spelling mistakes.
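The stated rule can be sketched word by word, again in Python for use through Stata's Python integration. This is one possible reading, not a standard definition: "80% similar" is assumed to mean a `difflib.SequenceMatcher` ratio of at least 0.8 (a Levenshtein-based score would be another reasonable choice), and the helper names `similarity`, `words_of`, and `four_word_match` are hypothetical. Each of the 4 words from the 2-member family must reach the threshold against at least one word in the n-member family:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # ratio() is in [0, 1]; 1.0 means the (lowercased) strings are identical
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def words_of(names):
    # Split full names such as "Mary Smith" into individual words
    return [w for name in names for w in name.split()]

def four_word_match(pair_names, family_names, threshold=0.8):
    """True if every word of the 2-member family (2 first + 2 last names)
    is at least `threshold` similar to some word in the n-member family."""
    family_words = words_of(family_names)
    return all(
        max((similarity(w, fw) for fw in family_words), default=0.0) >= threshold
        for w in words_of(pair_names)
    )

# Hypothetical data with spelling mistakes on both sides
print(four_word_match(["John Smith", "Marry Smith"],
                      ["Jon Smith", "Mary Smith", "Peter Smith"]))  # True
print(four_word_match(["John Smith", "Marry Smith"],
                      ["John Brown", "Anna Brown"]))                # False
print(similarity("Mary", "Mery"))                                   # 0.75
```

With short names the 80% threshold is strict under this measure: one wrong letter in a 4-letter name already gives 0.75, as the last line shows, so the threshold or the similarity measure may need adjusting for your data.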

Is there a way to do this?