First-time poster, so I’m sorry for any errors…
I have two ID variables (ID1 and ID2), and I want to create a new ID variable that takes duplicates in both into account. Wherever two observations share a value of EITHER ID1 OR ID2, I want to treat them as the same person, and this should carry through chains: 1 a and 1 b share ID1, 1 b and 2 b share ID2, so all three are the same person. Essentially, I want to generate a variable that looks like NewID in the table below. I have tried something like:
by ID1 ID2, sort: gen NewID=1 if _n==1
replace NewID = sum(NewID)
but this only groups observations that share BOTH ID1 and ID2. Something like the following is what I have in mind, but Stata does not accept the | symbol in a by prefix:
by ID1 | ID2, sort: gen NewID=1 if _n==1
replace NewID = sum(NewID)
I should also add that ID1 and ID2 are not ordered consistently, so I can’t simply compare each observation with the previous one (_n-1).
ID1 ID2 NewID
1 a 1
1 b 1
2 b 1
3 c 2
1 g 1
4 c 2
5 d 3
5 e 3
6 f 4
Any help would be very much appreciated!! Thank you!
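One possible approach is to treat each row as a node, link any two rows that share ID1 or ID2, and number the connected groups. Below is a minimal sketch in plain Stata (no user-written commands). It assumes ID1 is numeric and ID2 is a string, as in the example, and the helper variables obs, lab and lab_old are hypothetical names. Every row starts with its own label, and the loop then repeatedly gives each row the smallest label found among rows with the same ID1, then among rows with the same ID2, until no label changes. Because it repeats, it follows chains of any length rather than just neighbouring rows, so the unsorted order does not matter.

```stata
* Example data from the post
clear
input byte ID1 str1 ID2
1 a
1 b
2 b
3 c
1 g
4 c
5 d
5 e
6 f
end

* Remember the original row order
gen long obs = _n

* Start with every row as its own group, labelled by its row number
gen long lab = _n

* Spread the smallest label within each ID1 group, then within each
* ID2 group, and repeat until a full pass changes nothing
local changed = 1
while `changed' > 0 {
    quietly {
        gen long lab_old = lab
        bysort ID1 (lab): replace lab = lab[1]
        bysort ID2 (lab): replace lab = lab[1]
        count if lab != lab_old
        local changed = r(N)
        drop lab_old
    }
}

* Renumber the groups 1, 2, 3, ... and restore the original order
egen long NewID = group(lab)
sort obs
list ID1 ID2 NewID, noobs
```

On the example data this should reproduce the NewID column above. The number of passes grows with the length of the longest chain of linked rows, so a large file may need several iterations before the loop stops.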