BJ Data Tech Solution: How to Keep Duplicated Variables based on the Value of another Column

Saturday, May 7, 2022

Hello all!

I am cleaning up a dataset of approximately 400,000 observations and am not sure how to proceed. The data contain duplicates on an ID number, and for each ID I want to keep only the observation with the highest outcome. Here is the relevant part of the data for one ID:

ID Number	Date	Outcome
3	2/2/22	4
3	2/2/22	3
3	2/2/22	3
3	2/2/22	2

Here I want to keep only the first row, because it has the highest outcome code for that ID. The number of rows per ID varies: some IDs have 5 outcome values, some have 2, and I think one even has 10.

I have tried googling this, but got very confused by duplicates and dups. I'd really appreciate any suggestions.
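
For illustration, the rule described above (one row per ID, the one with the highest outcome) can be sketched in a few lines of Python. This is only a sketch under assumptions, not the method of any particular statistics package: the function name keep_highest_outcome is hypothetical, the rows for ID 7 are invented for the example, and outcomes are assumed to be numeric with no missing values.

```python
# Rows mirror the example table: (id_number, date, outcome).
rows = [
    (3, "2/2/22", 4),
    (3, "2/2/22", 3),
    (3, "2/2/22", 3),
    (3, "2/2/22", 2),
    (7, "3/1/22", 1),  # invented second ID, for illustration only
    (7, "3/1/22", 5),
]


def keep_highest_outcome(rows):
    """Return one row per ID: the row with the highest outcome.

    Hypothetical helper; assumes outcomes are numeric and never missing.
    """
    best = {}
    for row in rows:
        id_number, _date, outcome = row
        # Strict ">" means the first row seen wins when outcomes tie.
        if id_number not in best or outcome > best[id_number][2]:
            best[id_number] = row
    # Dicts preserve insertion order, so IDs come out in first-seen order.
    return list(best.values())


deduplicated = keep_highest_outcome(rows)
print(deduplicated)
```

The same idea carries over to other tools: sort by ID and outcome, then keep the last row within each ID. If two rows for an ID tie on the highest outcome, a tie-breaking rule (such as the earliest row, as assumed here) has to be chosen.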
