I am cleaning up a dataset and am not sure how to handle duplicates. The dataset has approximately 400,000 observations, and some ID numbers appear more than once. For each ID number, I want to keep only the observation with the highest outcome. So, for the following, I would want to keep only the row where Outcome is 4:
ID Number | Date | Outcome |
3 | 2/2/22 | 4 |
3 | 2/2/22 | 3 |
3 | 2/2/22 | 3 |
3 | 2/2/22 | 2 |
I have tried to Google this, but got very confused by duplicates and dups. I'd really appreciate any suggestions.
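For reference, here is a minimal sketch of the logic I am after, written in plain Python. This is only an illustration: the post does not depend on any particular software, and the tuple layout and the function name `keep_highest_outcome` are made up for the example. It walks through the rows once and, for each ID, remembers the row with the highest outcome seen so far. When two rows tie, the first one encountered is kept.

```python
# Hypothetical rows mirroring the example table: (ID Number, Date, Outcome).
rows = [
    (3, "2/2/22", 4),
    (3, "2/2/22", 3),
    (3, "2/2/22", 3),
    (3, "2/2/22", 2),
]


def keep_highest_outcome(rows):
    """Return one row per ID: the row with the highest Outcome."""
    best = {}
    for row in rows:
        id_number, _date, outcome = row
        # Keep this row if the ID is new or its outcome beats the stored one.
        if id_number not in best or outcome > best[id_number][2]:
            best[id_number] = row
    return list(best.values())


deduplicated = keep_highest_outcome(rows)
print(deduplicated)  # [(3, '2/2/22', 4)]
```

If the data were loaded into a pandas DataFrame instead, sorting by Outcome in descending order and then calling `drop_duplicates` on the ID column should give the same result, though I have not tested that on the full dataset.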