I have a large dataset, with approximately 2,800,000 observations and 10 variables, and I would like to check whether the values of one string variable also appear among the values of a string variable in another dataset.
The idea is to check whether the first names in a list of class participants are actually first names and not last names, by comparing them with a list of actual first names (120,000 observations).
This is an abbreviated version of the list of class participants:
Dataset: list of participants.
obs | first_name | last_name |
1 | john | cohen |
2 | arthur | williams |
3 | fox | rachel |
4 | robert | foster |
This is an abbreviated version of the list of names:
Dataset: list of names.
obs | first_name |
1 | lane |
2 | david |
3 | arthur |
4 | robert |
5 | rachel |
6 | john |
7 | lucy |
And this is an example of the result I would like to obtain:
obs | first_name | last_name | first_name_control |
1 | john | cohen | 1 |
2 | arthur | williams | 1 |
3 | fox | rachel | 0 |
4 | robert | foster | 1 |
I do not have any "wrong results" since I do not know how to proceed, but I would really appreciate your help.
In case this information is important, I am currently using Stata 14.
I hope I have followed all the Statalist posting recommendations. Thanks in advance,
Isidora.
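One way the check described above might be approached in Stata 14 is a many-to-one merge on first_name, followed by turning the merge result into a 0/1 flag. The following is only a minimal sketch under stated assumptions: the file names participants.dta and names.dta and the helper variable order are hypothetical, and both files are assumed to store first_name as a string variable with consistent case and spacing.

```stata
* Sketch only: participants.dta and names.dta are hypothetical file names.

* Step 1: reduce the reference list to one row per distinct first name,
* so that first_name uniquely identifies rows in the using data.
use names, clear
keep first_name
duplicates drop
tempfile names_unique
save `names_unique'

* Step 2: load the participants and remember their original order,
* because merge may re-sort the data.
use participants, clear
generate long order = _n

* Step 3: keep every participant, whether or not the name is matched.
merge m:1 first_name using `names_unique', keep(master match)

* Step 4: flag is 1 when the first name was found in the reference list.
generate byte first_name_control = (_merge == 3)

* Step 5: clean up and restore the original order.
drop _merge
sort order
drop order
```

With the abbreviated example data, this would set first_name_control to 0 for "fox" and to 1 for the other three participants. Running tabulate first_name_control afterwards gives a quick count of matched and unmatched names. The comparison is an exact string match, so differences in capitalization or stray spaces would be treated as non-matches.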