Hello everyone,

Thank you in advance for your help! I am quite desperate at this point, as this should be rather elementary, but I can't seem to figure it out.

I am using Stata 15. I have a clustered dataset with a household - community - street - district hierarchy for a multilevel analysis. There are 5482 households in 837 communities, clustered in 227 districts.
I am trying to do the following before the analysis:
  1. pool communities with fewer than 5 households together, so that each cluster has at least 5 households;
  2. make sure that the households that get pooled together are in the same street (if enough households are found in that street) or, failing that, in the same district.
Here is a glimpse of the dataset:


Code:
bysort district: list street community hid
-> district = 荆州市沙市区

+-----------------------------------+
| street community hid |
|-----------------------------------|
1. | 中山路街道 梅台社区 6470518 |
2. | 中山路街道 梅台社区 6470517 |
3. | 中山路街道 江汉社区 6470214 |
4. | 中山路街道 健康社区 6470118 |
5. | 中山路街道 江汉社区 6470215 |
|-----------------------------------|
6. | 解放路 九曲桥社区 6480515 |
7. | 中山路街道 梅台社区 6470515 |
8. | 解放路 武德社区 6480315 |
9. | 中山路街道 文化坊社区 6470416 |
10. | 中山路 黄家塘社区 6470316 |
|-----------------------------------|
11. | 中山路街道 江汉社区 6470217 |
12. | 中山路街道 江汉社区 6470219 |
13. | 中山路街道 文化坊社区 6470414 |
14. | 中山路街道 健康社区 6470117 |
15. | 解放路 十方庵社区 6480216 |
|-----------------------------------|
16. | 中山路街道 江汉社区 6470216 |
17. | 解放据街道 十方庵社区 6480214 |
18. | 解放路 九曲桥社区 6480514 |
19. | 中山路街道 梅台社区 6470516 |
20. | 中山路街道 健康社区 6470115 |
|-----------------------------------|
21. | 中山路街道 文化坊社区 6470415 |
22. | 中山路街道 健康社区 6470116 |
+-----------------------------------+

-------------------------------------------------------------------------------------
-> district = 荆州市沙市县

+-----------------------------------+
| street community hid |
|-----------------------------------|
1. | 解放路 九曲桥社区 6480516 |
2. | 解放路 北湖社区 6480116 |
3. | 解放路街道 武德社区 6480317 |
4. | 解放路 武德社区 6480316 |
5. | 中山路 黄家塘社区 6470314 |
|-----------------------------------|
6. | 解放路 九曲桥社区 6480517 |
7. | 解放路 武德社区 6480314 |
8. | 中山路 黄家塘社区 6470317 |
9. | 中山路 江汉社区 6470218 |
10. | 中山路 健康社区 6470119 |
+-----------------------------------+

-------------------------------------------------------------------------------------
-> district = 荆州市监利

+------------------------------+
| street commun~y hid |
|------------------------------|
1. | 容城 团结 6611106 |
2. | 客城镇 茶庵社区 66110514 |
+------------------------------+


I tried the following:
Code:
egen tag = tag(community hid)
bysort community:  egen N_comm = total(tag)  //number of Households in each community

/*egen tag2 = tag(street hid)
bysort street:  egen N_street = total(tag2)  //number of observations in each street
egen aggcomm = group(street community) if N_comm < 5  //group communities by street when N_comm<5
*/

egen tag2 = tag(street community)
bysort street:  egen N_street = total(tag2)  //number of communities in each street
egen aggcomm = group(street community) if N_comm < 5  //group communities by street when N_comm<5

gen neighborhood = .     //neighborhood ID
replace neighborhood = community if N_comm>=5
replace neighborhood = aggcomm if N_comm<5  
fre neighborhood  

egen tag4 = tag(neighborhood hid)
bysort neighborhood:  egen N = total(tag4)
But this doesn't seem right. The group() function I used here creates an ID for each unique street-community combination, not the pooled clusters I wanted.
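To make the target clearer, here is a rough sketch of the logic I have in mind, run on a small made-up dataset rather than my real one. It assumes one row per household and that a community is identified by district + street + community together. The helper variables small, N_pool and key are names I invented for illustration, and the toy values (D1, S1, A, ...) are not from my data. I am not sure this is the right way to do it; in particular, a district-level pool could still end up with fewer than 5 households, and this sketch does not handle that case.

```stata
* Toy data (made up for illustration): one row per household
clear
input str2 district str2 street str1 community long hid
D1 S1 A 1
D1 S1 A 2
D1 S1 A 3
D1 S1 A 4
D1 S1 A 5
D1 S1 B 6
D1 S1 B 7
D1 S1 B 8
D1 S1 C 9
D1 S1 C 10
D1 S2 E 11
D1 S2 E 12
D1 S3 F 13
D1 S3 F 14
D1 S3 F 15
end

* Number of households in each community (assumes one row per household)
bysort district street community: gen long N_comm = _N

* Flag communities with fewer than 5 households
gen byte small = N_comm < 5

* Households available if all small communities in a street are pooled
bysort district street small: gen long N_pool = _N

* Build a text key for the target cluster (hypothetical helper variable):
*   large community      -> stays on its own
*   small, street pool>=5 -> pooled within the street
*   small, street pool<5  -> pooled within the district
gen str40 key = district + "|" + street + "|" + community if !small
replace key = district + "|" + street + "|pooled" if small & N_pool >= 5
replace key = district + "|pooled" if small & N_pool < 5

* Numeric neighborhood ID and its size
egen long neighborhood = group(key)
bysort neighborhood: gen long N = _N

list district street community hid neighborhood N, sepby(neighborhood)
```

In this toy example, community A stays on its own, B and C are pooled within street S1, and E and F (alone in their streets) are pooled at the district level.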

I hope the explanation makes sense. Let me know if I can clarify anything!

Thank you again!!