BJ Data Tech Solution

Specializing in data processing, data management implementation plans, data collection tools (electronic and paper-based), data cleaning specifications, data extraction, transformation and loading, analytical datasets, and data analysis. BJ Data Tech Solutions teaches the design and development of electronic data collection tools using CSPro, and Stata commands for data manipulation. It also sets up data management systems using modern data technologies such as relational databases, C#, PHP and Android.

Generating new ID variable taking into account duplicates across 2 other variables

First time poster, so I’m sorry for any errors…

I have two ID variables (ID1 and ID2). I want to create a new ID variable taking into account duplicates in both. Where there are duplicates in EITHER ID1 OR ID2, I want to treat this as the same person. Essentially, I want to generate a new variable which looks like NewID below. I have tried using something like:

by ID1 ID2, sort: gen NewID=1 if _n==1
replace NewID = sum(NewID)

but this only picks up duplicates across BOTH ID1 and ID2 together. I guess something like the below would be ideal, but Stata doesn't allow the | symbol in the by varlist:

by ID1 | ID2, sort: gen NewID=1 if _n==1
replace NewID = sum(NewID)

I should also add that ID1 and ID2 are not ordered consistently, so I can’t just use _n-1

ID1  ID2  NewID
1    a    1
1    b    1
2    b    1
3    c    2
1    g    1
4    c    2
5    d    3
5    e    3
6    f    4

Any help would be very much appreciated!! Thank you!
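
One possible approach, offered as a sketch rather than a definitive answer: treat rows that share an ID1 or an ID2 as linked, give every row its own starting label, and keep copying the smallest label within each ID1 group and each ID2 group until nothing changes. The sketch assumes that ID1 and ID2 have no missing values (missing values would be grouped together) and that NewID does not already exist in the dataset. The names obs, minlabel and grp are hypothetical helper variables introduced only for illustration.

```stata
* Sketch only: assumes no missing values in ID1/ID2 and no existing NewID.
* obs, minlabel and grp are illustrative helper names.
gen long obs = _n                 // remember the original row order
gen long NewID = obs              // every row starts with its own label

local changed 1
while `changed' {
    local changed 0
    foreach v of varlist ID1 ID2 {
        * smallest label currently held by any row sharing this value of `v'
        bysort `v' (NewID): gen long minlabel = NewID[1]
        quietly count if minlabel != NewID
        if r(N) > 0 local changed 1
        quietly replace NewID = minlabel
        drop minlabel
    }
}

* relabel the surviving labels as 1, 2, 3, ... in order of first appearance
egen long grp = group(NewID)
quietly replace NewID = grp
drop grp
sort obs
```

The loop is needed because links can chain: in the example, ID1 = 2 joins the first person only through ID2 = b, so a single pass over one variable is not enough. Each pass sorts the data, so on very large datasets with long chains this may be slow.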
