I am trying to find the outcomes of the most recent and second most recent events (in the past) for each observation within a group and record those outcomes in two separate columns. The data set is an unbalanced yearly panel dataset (with gaps for some groups). Below is a simple example showing just one group.
group_id | year | inspection | violation |
1 | 2010 | 0 | 0 |
1 | 2011 | 1 | 1 |
1 | 2012 | 0 | 0 |
1 | 2013 | 1 | 0 |
1 | 2014 | 0 | 0 |
1 | 2015 | 1 | 1 |
1 | 2016 | 0 | 0 |
1 | 2017 | 0 | 0 |
1 | 2018 | 0 | 0 |
1 | 2019 | 0 | 0 |
1 | 2020 | 1 | 0 |
Here, an event corresponds to an "inspection" and the outcome of an event corresponds to "violation." Both the inspection and violation variables are binary - that is, "inspection" is 1 if there was an inspection in a given year and 0 otherwise; "violation" is 1 if the inspection detected any violation and 0 otherwise. So, when there is no inspection in a given year (i.e., when the value of "inspection" is 0), the value of "violation" is always 0, and when there is an inspection in a given year (i.e., when the value of "inspection" is 1), the value of "violation" could be either 0 or 1.
Now, what I would like to achieve is shown in the following table where two new columns are added.
group_id | year | inspection | violation | most_recent_insp_viol | second_most_recent_insp_viol |
1 | 2010 | 0 | 0 | . | . |
1 | 2011 | 1 | 1 | . | . |
1 | 2012 | 0 | 0 | 1 | . |
1 | 2013 | 1 | 0 | 1 | . |
1 | 2014 | 0 | 0 | 0 | 1 |
1 | 2015 | 1 | 1 | 0 | 1 |
1 | 2016 | 0 | 0 | 1 | 0 |
1 | 2017 | 0 | 0 | 1 | 0 |
1 | 2018 | 0 | 0 | 1 | 0 |
1 | 2019 | 0 | 0 | 1 | 0 |
1 | 2020 | 1 | 0 | 1 | 0 |
For each observation, "most_recent_insp_viol" shows whether there was a violation in the most recent inspection in the past, and "second_most_recent_insp_viol" shows whether there was a violation in the second most recent inspection in the past.
For 2010 and 2011, there was no inspection in the past, so the values of "most_recent_insp_viol" and "second_most_recent_insp_viol" are missing. For 2012 and 2013, there is one inspection in the past (which occurred in 2011) and the inspection found a violation, so the value of "most_recent_insp_viol" is 1. But the 2012 and 2013 instances do not have the second most recent inspection, so the value of "second_most_recent_insp_viol" is missing. For 2014 and 2015, the most recent inspection occurred in 2013 which did not find a violation, so the value of "most_recent_insp_viol" is 0; the second most recent inspection occurred in 2011 which found a violation, so the value of "second_most_recent_insp_viol" is 1. The logic is the same for 2016 ~ 2020.
I think I was able to create the "most_recent_insp_viol" column, using the code below:
Code:
gen current_viol = violation if inspection == 1 bysort group_id: gen lagged_viol = current_viol[_n-1] // lag by 1 year bysort group_id: carryforward lagged_viol, gen(most_recent_insp_viol)
But I cannot figure out how to create the "second_most_recent_insp_viol" column. Your help would be greatly appreciated. Thank you so much in advance.
Best regards,
IJ
0 Response to Find outcome of the most recent and second most recent event within group
Post a Comment