Dear all,

I am working with nursing home data, and would greatly appreciate some advice on how to efficiently convert data on resident stays into data on facility occupancy.
To elaborate, I currently have data at the stay level, so that each observation contains information on when a resident started and ended her stay at a facility, as well as the id of the facility. I would like to convert this data into a (balanced) facility-day panel dataset with information on facility occupancy -- in particular, each observation corresponds to a particular facility and date, and should tell us how many residents are staying in that facility (on that particular date).

Hopefully, this toy example with 3 residents and 2 facilities will help. This is the data I am starting with, where the first column is a resident id, the second is the stay number, the third is the facility id for the stay, and the fourth and fifth columns indicate the start and end dates of stay respectively.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(resident_id stay_number facility_id stay_start stay_end)
1 1 1 22305 22307
1 2 2 22309 22310
2 1 2 22306 22310
3 1 1 22307 22309
end
I want to convert this into a balanced panel with for the date range from 22307 to 22309. So, the desired output is:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(facility_id occupancy my_date)
1 2 22307
1 1 22308
1 1 22309
2 1 22307
2 1 22308
2 2 22309
end
The first observation of this final dataset is saying that there are 2 residents in facility 1 on the date 22307, the second observation says there is 1 resident in facility 1 on the date 22308, and so on.

In practice, I may be working with hundred of facility and many thousands of residents and stays, so if there are additional tips on how to code this efficiently, that would be great.

Thanks so much for your help!

--
Best wishes,
Alden