I have a dataset with almost 3 million patient admissions spanning several years. Every ID has several observations, because a patient may have had several admissions or registrations at various departments. In some department registrations the patient has the diagnosis I am interested in, and in others they do not. Some patients do not have any registered diagnosis at all, but I still need them to contribute to the analysis as not having the outcome. Because a patient can be registered at several departments, the same ID can appear many times, and the same ID with the same diagnosis can appear two or more times.

I therefore need to keep one record per ID: for IDs that have a diagnosis code, the earliest registered record with a diagnosis code, and at the same time, for IDs that never have a diagnosis code, the earliest registered record.

Example:
ID: 0000012354 Diagnosis Code: D862 Admissiondate: 04 mar 13
ID: 0000012354 Diagnosis Code: . Admissiondate: 07 jun 14
ID: 0000012354 Diagnosis Code: . Admissiondate: 08 aug 14
ID: 0000012354 Diagnosis Code: C425 Admissiondate: 21 dec 14
ID: 0000043567 Diagnosis Code: . Admissiondate: 03 jan 13
ID: 0000043567 Diagnosis Code: G700 Admissiondate: 16 dec 14
ID: 0000043567 Diagnosis Code: G700 Admissiondate: 21 dec 14
ID: 0000093231 Diagnosis Code: C243 Admissiondate: 01 may 16
ID: 0000074333 Diagnosis Code: . Admissiondate: 01 may 16
ID: 0000074333 Diagnosis Code: . Admissiondate: 01 may 17
ID: 0000074333 Diagnosis Code: . Admissiondate: 01 may 18
ID: 0000074333 Diagnosis Code: . Admissiondate: 07 may 18
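The selection rule can be expressed as "sort each ID's records so that records with a diagnosis come first, then by admission date, and keep the first one". Below is a minimal, tool-neutral sketch of that logic in plain Python, assuming `.` marks a missing diagnosis code and dates look like `04 mar 13` (day, abbreviated month, two-digit year). The names `ROWS`, `MISSING`, `parse_date` and `one_row_per_id` are hypothetical, chosen for illustration. In the actual statistics package the same idea would be: create a has-diagnosis flag, sort by ID, flag (descending) and date, then keep the first record per ID.

```python
from datetime import datetime

# Example rows from the post: (ID, diagnosis code, admission date).
ROWS = [
    ("0000012354", "D862", "04 mar 13"),
    ("0000012354", ".", "07 jun 14"),
    ("0000012354", ".", "08 aug 14"),
    ("0000012354", "C425", "21 dec 14"),
    ("0000043567", ".", "03 jan 13"),
    ("0000043567", "G700", "16 dec 14"),
    ("0000043567", "G700", "21 dec 14"),
    ("0000093231", "C243", "01 may 16"),
    ("0000074333", ".", "01 may 16"),
    ("0000074333", ".", "01 may 17"),
    ("0000074333", ".", "01 may 18"),
    ("0000074333", ".", "07 may 18"),
]

MISSING = "."  # assumption: "." means no diagnosis code was registered


def parse_date(text):
    # Assumes "dd mon yy", e.g. "04 mar 13"; month matching is case-insensitive.
    return datetime.strptime(text, "%d %b %y").date()


def one_row_per_id(rows):
    """Return one record per ID as (ID, code, date, outcome).

    IDs with any diagnosis keep their earliest diagnosed record (outcome 1);
    IDs with no diagnosis keep their earliest record overall (outcome 0).
    """
    best = {}
    for pid, code, date_text in rows:
        has_dx = code != MISSING
        date = parse_date(date_text)
        # Diagnosed records sort first (False < True), then earlier dates.
        key = (not has_dx, date)
        if pid not in best or key < best[pid][0]:
            best[pid] = (key, (pid, code, date, int(has_dx)))
    return [record for _, record in best.values()]


if __name__ == "__main__":
    for pid, code, date, outcome in one_row_per_id(ROWS):
        print(pid, code, date.isoformat(), outcome)
```

Applied to the example, this keeps D862 (04 mar 13) for 0000012354, G700 (16 dec 14) for 0000043567, C243 (01 may 16) for 0000093231, and the 01 may 16 record with outcome 0 for 0000074333. It makes a single pass over the data, so it scales linearly with the number of rows.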

Hope you can help again. Thank you!