Hi Stata Forum,

I have a long dataset of hospitalization data. I am looking at number of lower limb amputations, by year.
For each year, I want to select only the highest level of amputation per person.

For example, if one individual (user ID=3) has four amputations in total between 2000 and 2015 (one in 2000, one in 2003, and then two in 2005) I want to include the amputation in 2000, the one in 2003, and then only the highest amputation in 2005. So, if in 2005, they had a toe AND a foot amputation, i only want to include the foot amputation.

I can sort the data (code below), so that i can take the FIRST amputation for each year (taking simply dup==1). But i cannot work out how to take the HIGHEST amputation for each year.

sort id date _year
gen dup=1
replace dup = dup[_n-1]+1 if id==id[_n-1] & _year==_year[_n-1]


*each amputation is coded as no(=0) or year(=1)

I apologize for the table below (but i am having difficulty installing the ado file (directory issues) to do it any other way).

id date year toe foot ankle leg dup
3 03feb2000 2000 1 0 0 0 1
3 06jun2003 2003 1 0 0 0 1
3 02jan2005 2005 1 0 0 0 1
3 21feb2005 2005 0 1 0 0 2
4 30nov2008 2008 0 1 0 0 1
5 12aug2000 2000 1 0 0 0 1
5 11sept2010 2010 1 0 0 0 1
5 19sept2010 2010 0 1 0 0 2
5 8oct2010 2010 0 0 0 1 3

Any suggestions?

Many thanks
Jess