Hi,

I have a question about generating averages.
My dataset has one column with a unique household ID for each household. I have observations for multiple people in each household, and a variable for household size that is the same for every person in a household.

I also know each household's region.

I wish to create a variable that contains, for each region, the average household size.
I am aware of egen with mean(). However, since there are multiple observations for each household, I don't know how to get Stata to pick just one observation per household and calculate the mean by region.

If each household had only one person, I would do: bysort Region: egen HHavg = mean(HhSize)
But since households have different numbers of people, this would count larger households more often, so it wouldn't really represent the average household size in the region.

Household Number   Region   Hh Size   Person Number
1                  1        10        1
1                  1        10        2
1                  1        10        3
2                  1        12        1
2                  1        12        2
3                  2        15        1
3                  2        15        2
3                  2        15        3
4                  3        12        1
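
In case it helps anyone reproduce the problem, here is a minimal sketch that recreates the example data above and runs the naive command. The variable names hhid, region, hhsize and person, and the name hhavg_naive, are placeholders chosen for this post, not necessarily the names in the real dataset.

```stata
* Recreate the example data (placeholder variable names, assumed for this post)
clear
input hhid region hhsize person
1 1 10 1
1 1 10 2
1 1 10 3
2 1 12 1
2 1 12 2
3 2 15 1
3 2 15 2
3 2 15 3
4 3 12 1
end

* Naive approach: the mean is taken over persons, not households,
* so households with more listed members get more weight
bysort region: egen hhavg_naive = mean(hhsize)

* Region 1: naive result is (3*10 + 2*12)/5 = 10.8,
* whereas the household-level average I want is (10 + 12)/2 = 11
list hhid region hhsize hhavg_naive, sepby(region)
```

Regions 2 and 3 each contain a single household in this example, so the two calculations only differ for region 1.
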
Any help would be very much appreciated.

Thanks