Hello everyone,

I have a dataset reporting diesel price changes for different petrol stations (station_uuid) during one day with the following structure:

Code:
clear
input str36 station_uuid str22 raw_date double formatted_date float diesel
"2471ee14-8beb-455f-942d-73733d462c01" "2021-01-01 07:49:08+01" 1925106548000 1.329
"21a92daf-dec2-4448-b128-7f764b234dbc" "2021-01-01 10:27:17+01" 1925116037000 1.189
"b21f117f-305e-44ee-87cc-7f21fe4f3f58" "2021-01-01 10:42:17+01" 1925116937000 1.319
"096eb876-1888-4ec0-b64d-8d2369a319eb" "2021-01-01 11:04:14+01" 1925118254000 1.199
"25d5e86b-fc8b-479a-434e-3e852f2aefe2" "2021-01-01 11:22:16+01" 1925119336000 1.189
"5e40a39a-f679-480a-aaac-81754a28e003" "2021-01-01 13:17:14+01" 1925126234000 1.169
"65d95b09-e1cb-454f-b3da-49876ad84561" "2021-01-01 14:05:06+01" 1925129106000 1.249
"51d4b70e-a095-1aa0-e100-80009459e03a" "2021-01-01 15:53:07+01" 1925135587000 1.179
"c08111e0-37bb-4c2c-94bb-1e6e1c4bb1f5" "2021-01-01 16:23:07+01" 1925137387000 1.219
"0c469754-2608-4e58-8ab7-6cd924edd5a5" "2021-01-01 16:27:13+01" 1925137633000 1.199
"c1b456c8-b782-41d8-a960-466c4088a463" "2021-01-01 16:38:15+01" 1925138295000 1.199
"ade023de-dad0-40e7-bcc3-0a25e9a85c77" "2021-01-01 16:55:06+01" 1925139306000 1.219
"b457a782-e3d4-4513-a258-c4d3e199d79a" "2021-01-01 17:24:16+01" 1925141056000 1.239
"826b3acc-d800-41ba-9ead-76bc0d1dba20" "2021-01-01 18:22:16+01" 1925144536000 1.259
"812ecef1-650c-4930-a88b-71b9793607e6" "2021-01-01 18:31:07+01" 1925145067000 1.219
"16941d49-7ec6-45fc-aea9-93901f8f2dff" "2021-01-01 20:17:13+01" 1925151433000 1.179
"e1a15081-254f-9107-e040-0b0a3dfe563c" "2021-01-01 20:18:14+01" 1925151494000 1.179
"0cc777e4-13bc-48d4-b161-0f23d56afea6" "2021-01-01 21:12:20+01" 1925154740000 1.109
"e95bfbba-f829-45f7-ac4b-9bcab2fb48ee" "2021-01-01 21:17:13+01" 1925155033000 1.209
"51d4b5ee-a095-1aa0-e100-80009459e03a" "2021-01-01 21:42:15+01" 1925156535000 1.169
end
format %tcDD_Mon_CCYY_HH:MM:SS formatted_date
The variable formatted_date was generated, after studying chapter 25 of the User's Guide, with the following command:

Code:
gen double formatted_date = clock(raw_date, "YMDhms#")   // the "#" skips the trailing "+01" timezone offset
Since I want an hourly panel, I thought of collapsing the data to an hourly level, taking the mean whenever a station reports more than one price change within the same hour. In a second step, I would then fill/expand the observations with the price of the previous hour for stations that have not reported a price change in a given hour.
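For the first step, this is roughly what I have in mind, although I am not sure it is the right way to build the hourly time variable (final_date is just the name I am using for it; I floor the clock value to the start of the hour with msofhours()):

Code:
* round the clock value down to the start of the hour
gen double final_date = floor(formatted_date / msofhours(1)) * msofhours(1)
format %tcDD_Mon_CCYY_HH:MM final_date
* one observation per station and hour, averaging multiple price changes
collapse (mean) diesel, by(station_uuid final_date)
The second step I imagine to look something like this: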

Code:
egen long station_id = group(station_uuid)   // tsset needs a numeric panel id, not a string
tsset station_id final_date, delta(1 hour)
tsfill
bysort station_id (final_date): carryforward diesel, gen(diesel_complete)   // carryforward is from SSC
My questions are:
1) How do I collapse/aggregate on an hourly level?
2) Is the procedure I have in mind correct, or is there a better way to do it?

Thank you so much in advance.

Best regards,
Benedikt Franz