Hello,

I have been provided with a dataset that contains a string variable 'term' which details the date range of a term duration.

Within the string is a start date (i.e ..period from DD/MM/YYYY) and end date (i.e. ..to DD/MM/YYYY).

I would like to create two variables "date_start" and "date_end" from the original variable provided. Is there a solution to extract each of these two dates noting that three dates are provided within the one string? I have attempted using regexs(1) but have not been able to successfully achieve the outcome.

A small sample of the data is provided here:


Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long id str100 term float(date_start date_end)
 75 "The term of this agreement is for the period from 01-01-2020 to 31-12-2020 commencing on 01-01-2020." . .
 96 "The term of this agreement is for the period from 20-04-2015 to 29-04-2016 commencing on 20-04-2015"  . .
117 "The term of this agreement is for the period from 01-01-2020 to 31-12-2020 commencing on 01-01-2020." . .
117 "The term of this agreement is for the period from 01-01-2018 to 31-12-2018 commencing on 01-01-2018"  . .
117 "The term of this agreement is for the period from 01-01-2017 to 31-12-2017 commencing on 01-01-2017"  . .
138 "The term of this agreement is for the period from 30-04-2020 to 30-04-2021 commencing on 30-04-2020." . .
138 "The term of this agreement is for the period from 30-04-2020 to 30-04-2021 commencing on 30-04-2020." . .
159 "The term of this agreement is for the period from 01-01-2018 to 31-12-2018 commencing on 01-01-2018"  . .
180 "The term of this agreement is for the period from 30-04-2020 to 30-04-2021 commencing on 30-04-2020." . .
201 "The term of this agreement is for the period from 01-01-2020 to 31-12-2020 commencing on 01-01-2020." . .
end
label values id id

I am using Stata 16.

Thanks in advance,
Liam