I have a long dataset of hospitalization data. I am looking at number of lower limb amputations, by year.
For each year, I want to select only the highest level of amputation per person.
For example, if one individual (user ID=3) has four amputations in total between 2000 and 2015 (one in 2000, one in 2003, and then two in 2005) I want to include the amputation in 2000, the one in 2003, and then only the highest amputation in 2005. So, if in 2005, they had a toe AND a foot amputation, i only want to include the foot amputation.
I can sort the data (code below), so that i can take the FIRST amputation for each year (taking simply dup==1). But i cannot work out how to take the HIGHEST amputation for each year.
sort id date _year
gen dup=1
replace dup = dup[_n-1]+1 if id==id[_n-1] & _year==_year[_n-1]
*each amputation is coded as no(=0) or year(=1)
I apologize for the table below (but i am having difficulty installing the ado file (directory issues) to do it any other way).
id | date | year | toe | foot | ankle | leg | dup |
3 | 03feb2000 | 2000 | 1 | 0 | 0 | 0 | 1 |
3 | 06jun2003 | 2003 | 1 | 0 | 0 | 0 | 1 |
3 | 02jan2005 | 2005 | 1 | 0 | 0 | 0 | 1 |
3 | 21feb2005 | 2005 | 0 | 1 | 0 | 0 | 2 |
4 | 30nov2008 | 2008 | 0 | 1 | 0 | 0 | 1 |
5 | 12aug2000 | 2000 | 1 | 0 | 0 | 0 | 1 |
5 | 11sept2010 | 2010 | 1 | 0 | 0 | 0 | 1 |
5 | 19sept2010 | 2010 | 0 | 1 | 0 | 0 | 2 |
5 | 8oct2010 | 2010 | 0 | 0 | 0 | 1 | 3 |
Any suggestions?
Many thanks
Jess
0 Response to sorting data
Post a Comment