I am cleaning up a dataset and am not sure how to go about it. The dataset has approximately 400,000 observations, some of which are duplicates that share the same ID number. For each ID, I want to keep only the observation with the highest outcome. So, for the following example, I would want to keep only the first row (Outcome 4):
| ID Number | Date | Outcome |
| --- | --- | --- |
| 3 | 2/2/22 | 4 |
| 3 | 2/2/22 | 3 |
| 3 | 2/2/22 | 3 |
| 3 | 2/2/22 | 2 |
I have tried to Google this, but got very confused by the results about `duplicates` and `dups`. I'd really appreciate any suggestions anyone has.
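To make the goal concrete, here is a minimal sketch in plain Python of the result I am after. It assumes each observation is a dictionary with the hypothetical keys `id`, `date`, and `outcome` (these names are mine, not from the real dataset), and that when several rows tie for the highest outcome, keeping the first one is acceptable.

```python
def keep_highest_outcome(rows):
    """Return one row per ID: the row with the highest outcome.

    Ties are resolved by keeping the first row seen (an assumption).
    """
    best = {}
    for row in rows:
        current = best.get(row["id"])
        # Replace the stored row only if this one has a strictly higher outcome.
        if current is None or row["outcome"] > current["outcome"]:
            best[row["id"]] = row
    return list(best.values())


# The example rows from the table above.
rows = [
    {"id": 3, "date": "2/2/22", "outcome": 4},
    {"id": 3, "date": "2/2/22", "outcome": 3},
    {"id": 3, "date": "2/2/22", "outcome": 3},
    {"id": 3, "date": "2/2/22", "outcome": 2},
]

deduped = keep_highest_outcome(rows)
print(deduped)  # [{'id': 3, 'date': '2/2/22', 'outcome': 4}]
```

This makes a single pass over the data, so 400,000 observations should not be a problem. If the data were loaded into pandas instead, the same idea could probably be expressed by sorting on the outcome column in descending order and then calling `drop_duplicates` on the ID column.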