I am working with a large longitudinal data set consisting of anonymous individuals with diabetes between 2010-2019. I want to calculate the annual prevalence of kidney failure in these individuals with diabetes. Then eventually graph the annual trend over time (2010-2019).

My understanding of calculating period prevalence = # new & pre-existing cases of renal failure and diabetes in year x / # of individuals with diabetes in year x

I have sample dataset below:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float id byte gender int(startdate exitdate) byte outcome int outcome_date
 1 0 16888 20832 0     .
 2 0 17601 21796 0     .
 3 0 17231 21817 0     .
 4 0 14959 19435 1 19435
 5 1 15008 21790 0     .
 6 0 15908 17081 0     .
 7 1 17804 20579 0     .
 8 0 16026 19235 1 19235
 9 1 20261 21816 0     .
10 1 15139 16902 1 16902
11 1 14830 17665 1 17665
12 1 14675 15113 1 15113
13 1 18763 21796 0     .
14 0 15988 21816 0     .
15 1 14643 15536 1 15536
16 0 16323 20362 1 20362
17 0 14739 17398 0     .
18 1 18169 18920 0     .
19 1 17720 20215 0     .
20 1 14802 15257 1 15257
end
format %td startdate
format %td exitdate
format %td outcome_date
with:
'gender': 1=male, 0=female
'startdate' = date of diabetes diagnosis
'exitdate' = date exited study
'outcome': renal failure=1, no renal failure=0
'outcome_date' = date of renal failure diagnosis

I have been able to clean my large dataset down to the above variables, but have not been able to identify suitable syntax to calculate annual prevalence.

I would very grateful for any help.

Thank you