Hello Everyone,

I hope you can help me with the following problem.

I have a cross-sectional dataset of 343 observations of students' test scores. The students come from different schools and grades. However, some students took the test more than once, which produces duplicate records.

I would like Stata to drop specific duplicates according to two rules:
1. Within a set of duplicates, drop any record with a missing "Score".
2. If none of the duplicates has a missing score, drop the one with the earliest recorded "StartDate".

A snippet of my data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(StartDate id) float(schoolname2 gender2 grade2) byte(Score tag) float dup
     1928621099000 505711 0 0 0  . 1 1
     1928621827000 505711 0 0 0  9 1 2
1928624421000.0002 505713 0 0 0 19 1 1
     1928624452000 505713 0 0 0 15 1 2
     1928623906000 505715 0 0 0 20 0 0
     1928621142000 505716 0 0 0 14 0 0
1928621051000.0002 505718 0 0 0 18 0 0
     1928623971000 505724 0 0 0 13 0 0
     1928614160000 505726 0 0 0 15 1 1
1928627513000.0002 505726 0 0 0 16 1 2
end
format %tcnn/dd/ccYY_hh:MM StartDate

To identify the duplicates, I first ran the following commands:
Code:
duplicates report id schoolname2
duplicates list id schoolname2, sepby(id)
duplicates tag id schoolname2, gen(tag)
duplicates list id schoolname2 if tag >= 1, sepby(id)
sort schoolname2 id
* missing() works for both numeric and string variables
quietly by schoolname2 id: gen dup = cond(_N==1, 0, _n) if !missing(schoolname2) | !missing(id)
sort schoolname2 id StartDate

However, I could not work out the code to implement the two conditions above.
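
For concreteness, the two rules might be sketched roughly as below. This is only an untested outline under some assumptions: duplicates are defined by schoolname2 and id, exactly one record per student should remain, and when a student has more than two attempts only the latest attempt with a non-missing Score is kept. The variable has_score is a hypothetical helper that is not in my data, and the code expects the dataset (for example, the dataex snippet above) to be in memory.

```stata
* has_score is a hypothetical helper: 1 if Score is recorded, 0 if missing
gen byte has_score = !missing(Score)

* Within each student, sort records with missing Score first, then by date,
* so the last record is the latest attempt that has a Score (if any exists)
bysort schoolname2 id (has_score StartDate): keep if _n == _N

drop has_score
```

Students with a single record are left untouched, even if their Score is missing, because for them _n == _N holds trivially. If every attempt of a student has a missing Score, this sketch keeps the latest one, which may or may not be the desired behavior.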

Thank you. I look forward to your replies.