Matching one variable with another variable from different dataset

Hello,

I have a large dataset, with approximately 2,800,000 observations and 10 variables, and I would like to check whether the values of one string variable also appear among the values of a string variable in another dataset.

The idea is to check whether the first names in a list of class participants are actually first names and not last names, by comparing them against a list of actual first names (120,000 observations).

This is an abbreviated version of the list of class participants:

Dataset: list of participants.

obs	first_name	last_name
1	john	cohen
2	arthur	williams
3	fox	rachel
4	robert	foster

This is an abbreviated version of the list of names:

Dataset: list of names.

obs	first_name
1	lane
2	david
3	arthur
4	robert
5	rachel
6	john
7	lucy

And this is an example of the result I would like to obtain:

obs	first_name	last_name	first_name_control
1	john	cohen	1
2	arthur	williams	1
3	fox	rachel	0
4	robert	foster	1

I do not have any "wrong results" since I do not know how to proceed, but I would really appreciate your help.

In case this information is important, I am currently using Stata 14.

I hope I have fulfilled all the Statalist forum discussion recommendations, thanks in advance,
Isidora.
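
One possible way to get the result described above is a many-to-one merge on first_name, flagging the participants whose first name is found in the list of names. The following is only a sketch under assumptions: it rebuilds the two abbreviated example datasets with input, so the tempfile name names is hypothetical, and it assumes both variables are already spelled consistently (same case, no stray spaces). It is meant to be run as a single do-file, since the tempfile only exists during that run.

```stata
* Sketch only: recreates the abbreviated example data from the post.
* The tempfile name "names" is an assumption made for illustration.

* List of actual first names (the lookup dataset)
clear
input str10 first_name
"lane"
"david"
"arthur"
"robert"
"rachel"
"john"
"lucy"
end
* merge m:1 needs first_name to be unique in the lookup dataset
duplicates drop first_name, force
tempfile names
save `names'

* List of participants (the master dataset)
clear
input obs str10 first_name str10 last_name
1 "john"   "cohen"
2 "arthur" "williams"
3 "fox"    "rachel"
4 "robert" "foster"
end

* Keep every participant; names that appear only in the lookup are dropped
merge m:1 first_name using `names', keep(master match)

* _merge == 3 means the first name was found in the list of names
generate byte first_name_control = (_merge == 3)
drop _merge

* merge may change the sort order, so restore it before listing
sort obs
list
```

With the real data, the two input blocks would be replaced by use commands pointing at the actual files. If the spelling is not consistent across the two files, something like replace first_name = lower(trim(first_name)) in both datasets before the merge may help, although whether that is appropriate depends on how the names were recorded.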

BJ Data Tech Solution
