I hope you can help me with the following.
I have a cross-sectional dataset of 343 observations of students' scores on a test. The students come from different schools and grades. However, some students took the test more than once, which results in duplicate observations.
I would like Stata to drop specific duplicates according to the following conditions:
1. If one of the duplicates has a missing Score, drop that duplicate.
2. If none of the duplicates has a missing Score, drop the one with the earliest StartDate.
A snippet of my data:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(StartDate id) float(schoolname2 gender2 grade2) byte(Score tag) float dup
1928621099000 505711 0 0 0 . 1 1
1928621827000 505711 0 0 0 9 1 2
1928624421000.0002 505713 0 0 0 19 1 1
1928624452000 505713 0 0 0 15 1 2
1928623906000 505715 0 0 0 20 0 0
1928621142000 505716 0 0 0 14 0 0
1928621051000.0002 505718 0 0 0 18 0 0
1928623971000 505724 0 0 0 13 0 0
1928614160000 505726 0 0 0 15 1 1
1928627513000.0002 505726 0 0 0 16 1 2
end
format %tcnn/dd/ccYY_hh:MM StartDate
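These are the commands I used to inspect and tag the duplicates:
Code:
* Inspect duplicates on id and school
duplicates report id schoolname2
duplicates list id schoolname2, sepby(id)
* tag = number of other observations with the same id and school (0 if unique)
duplicates tag id schoolname2, gen(tag)
duplicates list id schoolname2 if tag >= 1, sepby(id)
* dup = 0 for unique observations, otherwise 1, 2, ... within each id/school group
* (schoolname2 is numeric, so it is tested with missing() rather than against "")
sort schoolname2 id
quietly by schoolname2 id: gen dup = cond(_N==1, 0, _n) if !missing(schoolname2) | !missing(id)
sort schoolname2 id StartDate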
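A minimal sketch of one way to apply both rules at once, assuming duplicates are defined by id and schoolname2 (as in the commands above) and that the goal is to keep exactly one observation per student and school. The helper variable hasscore is hypothetical. Sorting each group by hasscore and then StartDate puts the observation to keep last: observations with a Score sort after those with a missing Score, and later dates sort after earlier ones. Keeping only `_n == _N` therefore drops a missing-score duplicate first (rule 1) and otherwise the earlier date (rule 2). The data are the dataex snippet from the question without tag and dup.

```stata
* Sketch: keep one observation per id/school group, preferring a non-missing
* Score and, among those, the latest StartDate. Data: snippet from the question.
clear
input double(StartDate id) float(schoolname2 gender2 grade2) byte(Score)
1928621099000 505711 0 0 0 .
1928621827000 505711 0 0 0 9
1928624421000.0002 505713 0 0 0 19
1928624452000 505713 0 0 0 15
1928623906000 505715 0 0 0 20
1928621142000 505716 0 0 0 14
1928621051000.0002 505718 0 0 0 18
1928623971000 505724 0 0 0 13
1928614160000 505726 0 0 0 15
1928627513000.0002 505726 0 0 0 16
end
format %tcnn/dd/ccYY_hh:MM StartDate

* hasscore (hypothetical helper): 1 if Score is observed, 0 if missing
gen byte hasscore = !missing(Score)

* Within each id/school group, sort by hasscore then StartDate, so the last
* observation has a Score if any does and, among those, the latest date
bysort id schoolname2 (hasscore StartDate): keep if _n == _N
drop hasscore

* Stops with an error if any id/school pair is still duplicated
isid id schoolname2
list id Score StartDate, sepby(id)
```

In groups of three or more observations this keeps only one, which goes beyond the two stated rules, so it is worth checking whether that is the intended behavior before running it on the full dataset.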
Thank you. I look forward to your replies.