Hi everyone,

I've been struggling with a project, and would appreciate some help.

In brief, I’m working with a large healthcare database. I have to define “hospital volume” for my analysis. The database provides a "Hospital ID" which is specific for each hospital, the database contains several entries (different subjects) from the same "Hospital ID" ("PUF Facility ID"). I’ve used the "egen" command to create a “volume” variable, based on the frequency of the hospital ID.
The data is from 2004-2015. I also have a “year” (Year of Diagnosis) variable. The issue arises in that not all the hospitals contributed data every year, so I can’t just divide my number of hospital entries by 11 years. I have to be able to create a variable with the number of years the specific hospital contributed, and then average out. I’m sure there is a very simple way to do it, I just don’t know how to.

I've copied my code below, along with what it yields:


egen volume=group(PUF_FACILITY_ID), label

Array

Now I have the "year part". If I tabulate

bysort volume: tab YEAR_OF_DIAGNOSIS

I get the following:

Array

As we can see, some institutions have subjects every year, others do not.

What I need to complete this is to be able create a variable identifying how many years did an institution contribute to the data ( ex. 1, 2, 3 years) so I can then divide the volume for each specific institution by the specific number of years.

I'm hope this is clear enough. I sure there is an easy way to do this, likely utilizing a loop.

I would appreciate any help.