Hello,

I have a large dataset, with approximately 2,800,000 observations and 10 variables, and I would like to check whether the values of one string variable appear among the values of a string variable in another dataset.

The idea is to check whether the first names in a list of class participants are actually first names and not last names, by comparing them with a list of actual first names (120,000 observations).

This is an abbreviated version of the list of participants in the class:

Dataset: list of participants.
obs  first_name  last_name
1    john        cohen
2    arthur      williams
3    fox         rachel
4    robert      foster

This is an abbreviated version of the list of names:

Dataset: list of names.
obs  first_name
1    lane
2    david
3    arthur
4    robert
5    rachel
6    john
7    lucy

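And this is an example of the result I would like to obtain, where first_name_control is 1 if the participant's first_name appears in the list of names and 0 otherwise: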
obs  first_name  last_name  first_name_control
1    john        cohen      1
2    arthur      williams   1
3    fox         rachel     0
4    robert      foster     1

I do not have any "wrong results" to show, since I do not know how to proceed, but I would really appreciate your help.

In case this information is important, I am currently using Stata 14.
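
For concreteness, the check described above could be expressed as a lookup of each participant's first_name in the list of names, for example with merge. What follows is only a minimal, unverified sketch under assumptions, not a confirmed solution: the file names participants.dta and names.dta are hypothetical, and both files are assumed to store the name in a string variable called first_name.

```stata
* Sketch only: participants.dta and names.dta are hypothetical file names.

* Step 1: reduce the list of names to one row per distinct first_name,
* so that first_name uniquely identifies rows in the lookup file.
use names, clear
keep first_name
duplicates drop
tempfile names_unique
save `names_unique'

* Step 2: look up each participant's first_name in the list of names.
* keep(master match) keeps every participant and discards names that
* no participant has.
use participants, clear
merge m:1 first_name using `names_unique', keep(master match)

* _merge == 3 means the first_name was found in the list of names.
generate byte first_name_control = (_merge == 3)
drop _merge
sort obs
```

The comparison in merge is an exact string match, so differences in capitalization or leading and trailing spaces between the two files would count as non-matches. Because it uses a temporary file, the sketch would need to be run in one go, for example from a do-file.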

I hope I have followed all the Statalist forum posting recommendations. Thanks in advance,
Isidora.