I have a question regarding generating averages.
I have one column with a unique Household ID for each household in the dataset. I have observations for multiple people in each household, and a variable called Household Size which is the same for every person in a household.
I also know their region.
I wish to create a variable which contains, for each region, the average household size.
I am aware of egen and its mean() function. However, since there are multiple observations for each household, I don't know how to get Stata to pick just one observation per household and then calculate the mean by region.
If each household had only one person, I would do: bysort Region: egen HHavg = mean(HhSize) (writing Hh Size as HhSize, since Stata variable names cannot contain spaces).
But since households have different numbers of people, larger households would be counted more times, so this wouldn't really represent the average household size in the region.
Household Number | Region | Hh Size | Person Number |
1 | 1 | 10 | 1 |
1 | 1 | 10 | 2 |
1 | 1 | 10 | 3 |
2 | 1 | 12 | 1 |
2 | 1 | 12 | 2 |
3 | 2 | 15 | 1 |
3 | 2 | 15 | 2 |
3 | 2 | 15 | 3 |
4 | 3 | 12 | 1 |
Thanks
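One possible approach, sketched here rather than offered as the definitive answer, is to flag a single observation per household with egen's tag() function and average only the flagged rows within each region. The variable names hhid, region, hhsize and person are assumptions standing in for the columns in the table above, and onehh and hhavg are illustrative names for the new variables.

```stata
* Recreate the example data from the post (variable names are assumed)
clear
input hhid region hhsize person
1 1 10 1
1 1 10 2
1 1 10 3
2 1 12 1
2 1 12 2
3 2 15 1
3 2 15 2
3 2 15 3
4 3 12 1
end

* Flag exactly one observation per household (1 for one row, 0 for the rest)
egen byte onehh = tag(hhid)

* Average hhsize over the flagged rows only; untagged rows contribute
* missing values, which mean() ignores. The result is still written to
* every observation in the region.
bysort region: egen hhavg = mean(cond(onehh, hhsize, .))

list hhid region hhsize person hhavg, sepby(region)
```

On the sample data this should give 11 for region 1, that is (10 + 12) / 2, whereas the plain per-person mean would be (10 + 10 + 10 + 12 + 12) / 5 = 10.8. Regions 2 and 3 each contain a single household, so their averages are 15 and 12. Using cond() inside mean(), rather than an if qualifier on the egen command, keeps the regional average on every person's row instead of only on the flagged one.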