First time poster, so I’m sorry for any errors…
I have two ID variables (ID1 and ID2). I want to create a new ID variable taking into account duplicates in both. Where there are duplicates in EITHER ID1 OR ID2, I want to treat this as the same person. Essentially, I want to generate a new variable which looks like NewID below. I have tried using something like:
by ID1 ID2, sort: gen NewID=1 if _n==1
replace NewID = sum(NewID)
but this only groups observations that are duplicates on BOTH ID1 and ID2. I guess something like the below would be ideal, but Stata doesn't accept the | symbol here:
by ID1 | ID2, sort: gen NewID=1 if _n==1
replace NewID = sum(NewID)
I should also add that ID1 and ID2 are not ordered consistently, so I can't just compare each observation with the previous one using _n-1.
ID1 ID2 NewID
1 a 1
1 b 1
2 b 1
3 c 2
1 g 1
4 c 2
5 d 3
5 e 3
6 f 4
Any help would be very much appreciated!! Thank you!
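One possible approach, offered as a minimal sketch rather than a definitive answer: treat this as finding connected groups, and repeatedly spread the smallest group label across observations that share an ID2 and then across observations that share an ID1, until nothing changes. The variable names grp, m1 and m2 are placeholders introduced here for illustration. The sketch assumes that NewID does not already exist in the data and that ID1 and ID2 have no missing values.

```stata
* Start with one candidate group per distinct ID1 value
egen long grp = group(ID1)

* Repeat until no observation changes its label
local changed = 1
while `changed' {
    * Smallest label among observations sharing an ID2 ...
    bysort ID2: egen long m2 = min(grp)
    * ... then the smallest of those among observations sharing an ID1
    bysort ID1: egen long m1 = min(m2)

    * Count how many labels would change in this pass
    count if m1 != grp
    local changed = r(N)

    replace grp = m1
    drop m1 m2
}

* Renumber the surviving labels as 1, 2, 3, ...
egen long NewID = group(grp)
drop grp
```

On the example data this should give NewID = 1 for every observation linked through ID1 = 1 or ID2 = b, 2 for the ID2 = c pair, 3 for ID1 = 5, and 4 for ID1 = 6. Note that bysort changes the sort order of the data, so keep an original row-order variable if the order matters.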