Need help! My goal is to keep a single HbA1c value per participant per year: when more than one HbA1c test was done in a given year, I want the one closest to the participant’s birthday. I am working with a longitudinal (time series) data set in long format. My variables are ID_number, HbA1c_test_date, HbA1c_mmolmol, y_status_date (the year the HbA1c test was done) and diff_days (the number of days between the participant’s birthday and each of the 1, 2 or 3 HbA1c tests done in a given year, from 2009 to 2020). I am stuck on my last command, which is supposed to keep only the HbA1c value with the minimum number of days between the birthday and the test date when there are 2 or more HbA1c observations in a given year.

*browse by columns of ID number, year of the HbA1c test and # of days difference between the HbA1c test date and the participant’s birthday.
br ID_number y_status_date new_diff_days

by ID_number y_status_date: generate measure_incl = _n
summarize measure_incl
summarize measure_incl if measure_incl==3
summarize measure_incl if measure_incl==2
summarize measure_incl if measure_incl==1
*Here, I just noted the # of observations when a study participant had 1, 2 or 3 HbA1c values done per year

sort ID_number new_diff_days

*This command below is not working.
keep if measure_incl == 1 | measure_incl > 1, min(new_diff_days)
*invalid syntax
*r(198)

Any sage advice for me? Again, my goal is to only keep the HbA1c values where there is just 1 per year OR the HbA1c value with the lowest number of days (new_diff_days) between the date of the HbA1c test and the participant’s birthday.
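
For reference, here is a minimal sketch of the result I am after. It is not a tested solution. It assumes the data set with the variables above is already in memory, and that new_diff_days can be negative when the test falls before the birthday, so closeness is measured with its absolute value. The names abs_diff and keep_obs are hypothetical helper variables I made up for illustration.

```stata
* Sketch only. abs_diff and keep_obs are hypothetical helper variables.
* Assumption: new_diff_days may be negative (test before the birthday),
* so the absolute value is used to measure closeness.
generate abs_diff = abs(new_diff_days)

* Within each participant-year, order the tests by closeness to the birthday.
* HbA1c_test_date is a tie-breaker so the ordering is reproducible.
* keep_obs is 1 for the first (closest) test in each group and 0 otherwise.
bysort ID_number y_status_date (abs_diff HbA1c_test_date): generate keep_obs = (_n == 1)

* Keep the single closest test per participant per year.
* Years with only one test are kept automatically, because that row is _n == 1.
keep if keep_obs
drop abs_diff keep_obs
```

The idea is that `keep if` only takes a logical condition evaluated row by row, so the group minimum has to be identified first (here through the sort order inside `bysort`) and then used as the condition. A missing new_diff_days sorts last within its group, so it would only be kept if it is the sole record for that participant-year.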