Hi everyone!

I am currently cleaning a large dataset (52 variables, 82,284 observations) for longitudinal analysis. The dataset combines information returned from 6 different surveys. I have converted it to long format, so there are currently about 6 observations (one per survey year) for each ID, and approximately 13,000 unique IDs. The dataset is confidential, so I have created a fake example dataset to use for this question (hopefully inserted correctly below).

Here is my issue. I have created a "death after this wave" variable, which equals 1 on the last wave of data a person provided before dying and is missing otherwise. I now need to delete the waves that come after that point, since the person did not participate in them. For example, someone who participated in three waves and then died should have only 3 rows of data, whereas someone who was alive for all waves should have 6 rows. However, I am struggling to find code that will achieve this. Does anyone have any ideas? Apologies, I am quite a novice!

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double idalias int year float(wave_sg Death_After_This_Wave)
1 1901 0 .
1 1904 1 .
1 1907 2 1
1 1910 3 .
1 1913 4 .
1 1916 5 .
2 1901 0 .
2 1904 1 .
2 1907 2 .
2 1910 3 .
2 1913 4 1
2 1916 5 .
3 1901 0 .
3 1904 1 .
3 1907 2 1
3 1910 3 .
3 1913 4 .
3 1916 5 .
4 1901 0 .
4 1904 1 .
4 1907 2 .
4 1910 3 .
4 1913 4 .
4 1916 5 1
5 1901 0 .
5 1904 1 .
5 1907 2 1
5 1910 3 .
5 1913 4 .
5 1916 5 .
6 1901 0 .
6 1904 1 .
6 1907 2 .
6 1910 3 .
6 1913 4 1
6 1916 5 .
7 1901 0 .
7 1904 1 1
7 1907 2 .
7 1910 3 .
7 1913 4 .
7 1916 5 .
end

I was thinking of something like this: by idalias, sort: drop in 2/6 if _n=1 for Death_After_This_Wave. To me this means: for each ID, drop the years 1904, 1907, 1910, 1913 and 1916 (i.e. observations 2 to 6) if the person died just after the first observation (1901). I could then edit this code and repeat it for the remaining years, but I know this is not valid syntax as written.
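To make the goal concrete, here is a minimal sketch of one way this might be done in a single pass rather than year by year. It assumes that Death_After_This_Wave equals 1 on at most one wave per idalias and is missing on every other wave, as in the example data. The variable name died_before is made up for illustration.

```stata
* Sketch only: assumes at most one flagged wave per idalias (see the check below).
* died_before is a hypothetical helper variable, not part of the original data.

* Optional check of the assumption: count flagged waves per person.
bysort idalias: egen n_flags = total(Death_After_This_Wave == 1)
assert n_flags <= 1
drop n_flags

* Within each person, ordered by wave, keep a running count of whether any
* EARLIER wave was flagged as the last one before death.
* For the first wave, Death_After_This_Wave[_n-1] is missing, so the test is false (0).
bysort idalias (wave_sg): gen byte died_before = sum(Death_After_This_Wave[_n-1] == 1)

* Drop every wave that falls after the flagged wave; the flagged wave itself is kept.
drop if died_before > 0
drop died_before
```

Run on the example data above, this should leave 27 observations: 3 rows each for IDs 1, 3 and 5, 5 rows each for IDs 2 and 6, 6 rows for ID 4, and 2 rows for ID 7. Using a running sum within each ID avoids writing a separate drop command for each survey year.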


Thanks for taking the time to read my query.
Warm regards,
Sarah