Dear all,

I am trying to find the outcomes of the most recent and second most recent events (in the past) for each observation within a group and record those outcomes in two separate columns. The data set is an unbalanced yearly panel dataset (with gaps for some groups). Below is a simple example showing just one group.

group_id year inspection violation
1 2010 0 0
1 2011 1 1
1 2012 0 0
1 2013 1 0
1 2014 0 0
1 2015 1 1
1 2016 0 0
1 2017 0 0
1 2018 0 0
1 2019 0 0
1 2020 1 0

Here, an event corresponds to an "inspection" and the outcome of an event corresponds to "violation." Both the inspection and violation variables are binary - that is, "inspection" is 1 if there was an inspection in a given year and 0 otherwise; "violation" is 1 if the inspection detected any violation and 0 otherwise. So, when there is no inspection in a given year (i.e., when the value of "inspection" is 0), the value of "violation" is always 0, and when there is an inspection in a given year (i.e., when the value of "inspection" is 1), the value of "violation" could be either 0 or 1.

Now, what I would like to achieve is shown in the following table where two new columns are added.

group_id year inspection violation most_recent_insp_viol second_most_recent_insp_viol
1 2010 0 0 . .
1 2011 1 1 . .
1 2012 0 0 1 .
1 2013 1 0 1 .
1 2014 0 0 0 1
1 2015 1 1 0 1
1 2016 0 0 1 0
1 2017 0 0 1 0
1 2018 0 0 1 0
1 2019 0 0 1 0
1 2020 1 0 1 0

For each observation, "most_recent_insp_viol" shows whether there was a violation in the most recent inspection in the past, and "second_most_recent_insp_viol" shows whether there was a violation in the second most recent inspection in the past.

For 2010 and 2011, there was no inspection in the past, so the values of "most_recent_insp_viol" and "second_most_recent_insp_viol" are missing. For 2012 and 2013, there is one inspection in the past (which occurred in 2011) and the inspection found a violation, so the value of "most_recent_insp_viol" is 1. But the 2012 and 2013 instances do not have the second most recent inspection, so the value of "second_most_recent_insp_viol" is missing. For 2014 and 2015, the most recent inspection occurred in 2013 which did not find a violation, so the value of "most_recent_insp_viol" is 0; the second most recent inspection occurred in 2011 which found a violation, so the value of "second_most_recent_insp_viol" is 1. The logic is the same for 2016 ~ 2020.

I think I was able to create the "most_recent_insp_viol" column, using the code below:

Code:
gen current_viol = violation if inspection == 1
bysort group_id: gen lagged_viol = current_viol[_n-1] // lag by 1 year
bysort group_id: carryforward lagged_viol, gen(most_recent_insp_viol)

But I cannot figure out how to create the "second_most_recent_insp_viol" column. Your help would be greatly appreciated. Thank you so much in advance.

Best regards,
IJ