Hi, I need help replacing duplicate entries of a variable with missing values (a dot) instead of dropping those observations altogether. I am using Stata 14. The data set looks as follows:
Code:
ID variable1 variable2
001 . .
001 . .
001 . .
002 500 .
002 500 .
002 500 .
003 453 148
003 453 .
003 453 .
004 529 .
004 529 156
004 529 .
005 514 .
005 514 .
005 514 163
006 453 .
006 453 148
006 453 .
In this data set, variable1 repeats the same value on every observation within an ID, whereas variable2 has no duplicates within an ID. Because these values were generated with a mean command, the same value can also appear under different IDs (for example, 453 appears under both 003 and 006). Is it possible to set variable1 to missing on every observation except the first within each ID, instead of dropping those observations? Dropping all but the first observation per ID would risk dropping non-missing values of variable2. I would like my data to look like this:
Code:
ID variable1 variable2
001 . .
001 . .
001 . .
002 500 .
002 . .
002 . .
003 453 148
003 . .
003 . .
004 529 .
004 . 156
004 . .
005 514 .
005 . .
005 . 163
006 453 .
006 . 148
006 . .
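A minimal sketch in Stata, assuming the data are exactly as listed above (ID stored as a three-character string) and that "first" means the first row in the current sort order. The helper variables obsno and dup are hypothetical names introduced for illustration. Because the flag is built within ID and value of variable1, only later repeats of the same value are blanked; a different value of variable1 within the same ID would be left alone.

```stata
clear
input str3 ID variable1 variable2
001   .   .
001   .   .
001   .   .
002 500   .
002 500   .
002 500   .
003 453 148
003 453   .
003 453   .
004 529   .
004 529 156
004 529   .
005 514   .
005 514   .
005 514 163
006 453   .
006 453 148
006 453   .
end

* Record the current row order so that "first observation" is well defined
generate long obsno = _n

* Within each ID and value of variable1, flag every row after the first
bysort ID variable1 (obsno): generate byte dup = _n > 1

* Blank out the repeated values instead of dropping the rows
replace variable1 = . if dup

* Restore the original order and remove the helper variables
sort obsno
drop obsno dup
list, sepby(ID)
```

No observations are dropped, so every non-missing value of variable2 stays on its original row.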
Ideally, though this is not a strict requirement, the replacement would be conditional on variable2: instead of keeping variable1 on the first observation of each ID, it would keep variable1 on the observation where variable2 is non-missing and set variable1 to missing on the other observations of that ID. When an ID has no non-missing variable2 (such as 002), the first observation keeps variable1:
Code:
ID variable1 variable2
001 . .
001 . .
001 . .
002 500 .
002 . .
002 . .
003 453 148
003 . .
003 . .
004 . .
004 529 156
004 . .
005 . .
005 . .
005 514 163
006 . .
006 453 148
006 . .
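A possible sketch of the conditional version, again assuming the data listed above and that variable1 is constant within each ID (as in the example), since it keeps variable1 on exactly one row per ID regardless of value. Rows are ordered within ID so that the row with a non-missing variable2 comes first, falling back to the original order; if an ID had several such rows, the earliest of them would keep variable1. The helper variables obsno and no_v2 are hypothetical names.

```stata
clear
input str3 ID variable1 variable2
001   .   .
001   .   .
001   .   .
002 500   .
002 500   .
002 500   .
003 453 148
003 453   .
003 453   .
004 529   .
004 529 156
004 529   .
005 514   .
005 514   .
005 514 163
006 453   .
006 453 148
006 453   .
end

* Record the current row order
generate long obsno = _n

* 0 if variable2 is non-missing, 1 if missing, so such rows sort first
generate byte no_v2 = missing(variable2)

* Within each ID, keep variable1 only on the first row of that ordering
bysort ID (no_v2 obsno): replace variable1 = . if _n > 1

* Restore the original order and remove the helper variables
sort obsno
drop obsno no_v2
list, sepby(ID)
```

Sorting on an ascending 0/1 key avoids gsort with a descending variable, so the subsequent by-group on ID works without further sorting.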