Dear Statalist community,

I am working with a panel data set on armed conflict (UCDP/PRIO Armed Conflict Dataset V21.1). I am trying to generate two new variables marking the start and end date of all included conflicts. The data already contain two variables that come close to what I need: start_date2 records the first day of each conflict, and ep_end_date records the last day (see the data example below). If a conflict continues into the next year, ep_end_date is left empty.

Code:
clear
input conflictID year str10(start_date2 ep_end_date)
5 2011 "2011-08-20" ""          
5 2012 "2011-08-20" ""          
5 2013 "2011-08-20" ""          
5 2014 "2011-08-20" ""          
5 2015 "2011-08-20" ""          
5 2016 "2011-08-20" ""          
5 2017 "2011-08-20" ""          
5 2018 "2011-08-20" ""          
5 2019 "2011-08-20" ""          
5 2020 "2011-08-20" "2020-12-31"
end

Instead of repeating the conflict's start day for every year, I need a variable that keeps the initial start date in the starting year (here 2011-08-20 in 2011) and takes the first day of the year in each subsequent observation. In the example above, the new start variable should equal start_date2 in the first year, 2012-01-01 in the second, 2013-01-01 in the third, etc. Accordingly, the new end date variable should take the last day of each year (e.g. 2012-12-31, 2013-12-31, etc.), except for observations that already have an end date, which should keep it. I want to use the new start/end date variables in a Cox proportional hazards model with time-varying covariates that are available on a yearly basis.
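
To make the target concrete, here is a minimal sketch of the logic described above. It is one possible approach, not a tested solution. It assumes start_date2 and ep_end_date are strings in year-month-day order, as in the data example; the names start_d, end_d, new_start and new_end are hypothetical and chosen only for illustration.

```stata
* Convert the string dates to numeric Stata daily dates
* (assumes "YYYY-MM-DD" strings; an empty string becomes missing)
generate long start_d = daily(start_date2, "YMD")
generate long end_d   = daily(ep_end_date, "YMD")

* New start date: the original start date in the conflict's first year,
* otherwise 1 January of the observation's year
generate long new_start = cond(year == year(start_d), start_d, mdy(1, 1, year))

* New end date: the recorded end date where one exists,
* otherwise 31 December of the observation's year
generate long new_end = cond(missing(end_d), mdy(12, 31, year), end_d)

* Display all four variables as daily dates
format start_d end_d new_start new_end %td
```

For the example data, this would give 2011-08-20 to 2011-12-31 for the 2011 observation and 2020-01-01 to 2020-12-31 for the 2020 observation.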

I am grateful for any advice!

Best,
Carlo
Stata Version 17.0