Hello everyone,
I have a large dataset with many duplicates across different variables, and I am trying to find out what the differences are within each group of duplicates. I came across the obsdiff command (by Eric Booth) and tried using it, but I am not sure how to make it compare within groups of duplicates rather than across a range of rows.
I have created a sample table similar to my data. I need to find the differences in DOB, nationality, gender and result within duplicates of ID.
For example: what are the differences in DOB, nationality, gender and result within the duplicates of ID 1?
ID  DOB   Nationality  age  gender  result
1   1996  Jordan       25   F       P
1   1996  Jordan       25   F       P
1   1996  Egypt        25   F       P
1   1997  Egypt        25   F       N
1   1997  Jordan       25   F       N
1   1996  Qatar        24   F       P
2   1995  Lebanon      12   M       N
2   1995  Lebanon      12   M       N
2   1995  Lebanon      14   M       P
2   1995  Lebanon      11   M       P
2   1995  Lebanon      12   M       P
3   1998  Syria        21   F       N
4   1996  Syria        22   F       P
5   2000  Qatar        23   F       N
The code I have been using is:
obsdiff DOB Nationality age gender result , row (1/15)

I want to run the same comparison within the duplicates of each ID, without having to list the row range for every group of ID duplicates by hand (my original dataset has millions of duplicates).
Is this possible with this command?
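
To make the goal concrete, here is a rough sketch of the kind of per-ID result I am after, written without obsdiff. It assumes the variable names from the sample table above; diff_* and anydiff are names I made up for illustration. It only flags which variables vary within an ID, it does not show which rows differ from which, so it is not a full substitute for what I am asking.

```stata
* Sketch only: flag, per ID, each variable that takes more than one value.
* diff_* and anydiff are hypothetical variable names used for illustration.
foreach v of varlist DOB Nationality age gender result {
    * Sort within ID by the variable, then compare its first and last values;
    * they differ exactly when the variable is not constant within that ID
    bysort ID (`v'): gen byte diff_`v' = `v'[1] != `v'[_N]
}

* 1 if at least one of the variables differs within the ID, 0 otherwise
egen byte anydiff = rowmax(diff_*)

* Show only the ID groups that contain some difference
list ID DOB Nationality age gender result if anydiff, sepby(ID)
```

With the sample data, IDs 3, 4 and 5 have a single row each, so they would not be listed.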

Thank you!
Heba