Thank you in advance for your help! I am quite desperate at the moment, as this should be fairly elementary, but I can't seem to figure it out...
I am using Stata 15. For a multilevel analysis, I have a clustered dataset in which households are nested in communities, streets, and districts. There are 5482 households in 837 communities, clustered in 227 districts.
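Before regrouping, it may help to confirm how those totals are counted. In the listing below, the same community name (e.g. 江汉社区) appears under two different districts, so counting communities by name alone could undercount them. A minimal sketch on hypothetical toy data (the variable names follow the post; the data and the `tag_*` variables are made up for illustration), using `egen tag()` to mark one observation per distinct unit:

```stata
* Hypothetical toy data: one row per household; the community name "C1"
* occurs in two different districts
clear
input str10 district str10 street str10 community long hid
"D1" "S1" "C1" 101
"D1" "S1" "C1" 102
"D1" "S1" "C2" 103
"D2" "S2" "C1" 104
"D2" "S2" "C3" 105
end

* tag() marks the first observation of each distinct combination with 1
egen tag_hh   = tag(hid)                  // distinct households
egen tag_name = tag(community)            // communities by name alone
egen tag_comm = tag(district community)   // communities within district
egen tag_dist = tag(district)             // distinct districts

count if tag_hh
count if tag_name
count if tag_comm
count if tag_dist
```

Here the name-only count treats the two "C1" communities as one; qualifying by district keeps them apart.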
I am trying to do the following before the analysis:
- group communities with fewer than 5 households together, so that each cluster has at least 5 households;
- households that are grouped together should be in the same street (if that street has enough households) or, failing that, in the same district.
Code:
bysort district: list street community hid
+-----------------------------------+
| street community hid |
|-----------------------------------|
1. | 中山路街道 梅台社区 6470518 |
2. | 中山路街道 梅台社区 6470517 |
3. | 中山路街道 江汉社区 6470214 |
4. | 中山路街道 健康社区 6470118 |
5. | 中山路街道 江汉社区 6470215 |
|-----------------------------------|
6. | 解放路 九曲桥社区 6480515 |
7. | 中山路街道 梅台社区 6470515 |
8. | 解放路 武德社区 6480315 |
9. | 中山路街道 文化坊社区 6470416 |
10. | 中山路 黄家塘社区 6470316 |
|-----------------------------------|
11. | 中山路街道 江汉社区 6470217 |
12. | 中山路街道 江汉社区 6470219 |
13. | 中山路街道 文化坊社区 6470414 |
14. | 中山路街道 健康社区 6470117 |
15. | 解放路 十方庵社区 6480216 |
|-----------------------------------|
16. | 中山路街道 江汉社区 6470216 |
17. | 解放据街道 十方庵社区 6480214 |
18. | 解放路 九曲桥社区 6480514 |
19. | 中山路街道 梅台社区 6470516 |
20. | 中山路街道 健康社区 6470115 |
|-----------------------------------|
21. | 中山路街道 文化坊社区 6470415 |
22. | 中山路街道 健康社区 6470116 |
+-----------------------------------+
-------------------------------------------------------------------------------------
-> district = 荆州市沙市县
+-----------------------------------+
| street community hid |
|-----------------------------------|
1. | 解放路 九曲桥社区 6480516 |
2. | 解放路 北湖社区 6480116 |
3. | 解放路街道 武德社区 6480317 |
4. | 解放路 武德社区 6480316 |
5. | 中山路 黄家塘社区 6470314 |
|-----------------------------------|
6. | 解放路 九曲桥社区 6480517 |
7. | 解放路 武德社区 6480314 |
8. | 中山路 黄家塘社区 6470317 |
9. | 中山路 江汉社区 6470218 |
10. | 中山路 健康社区 6470119 |
+-----------------------------------+
-------------------------------------------------------------------------------------
-> district = 荆州市监利
+------------------------------+
| street commun~y hid |
|------------------------------|
1. | 容城 团结 6611106 |
2. | 客城镇 茶庵社区 66110514 |
+------------------------------+
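The listing also shows the same street spelled differently within a district: 中山路 next to 中山路街道 (街道 means "subdistrict"), 解放路 next to 解放路街道, and 解放据街道, which may be a typo for 解放路街道; 容城 and 客城镇 in 荆州市监利 may be another such pair. Any grouping on `street` treats each spelling as a separate street. A sketch of a cleanup step, assuming (this needs checking against the full data) that a trailing 街道 is the only difference between matching names; `street_clean` is a hypothetical new variable, and `ustrregexra()` is Stata's Unicode regex replace function:

```stata
* Hypothetical toy data reusing street names and hids from the listing
clear
input str30 street long hid
"中山路街道" 6470518
"中山路"     6470316
"解放路街道" 6480317
"解放路"     6480316
"解放据街道" 6480214
end

* Drop a trailing 街道 so both spellings map to one name
* (assumption: the suffix is the only difference between matching names)
gen street_clean = ustrregexra(street, "街道$", "")

* Assumption: 解放据 is a typo for 解放路; verify before applying
replace street_clean = "解放路" if street == "解放据街道"

* Compare raw and cleaned spellings before grouping on street
tab street street_clean
```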
I tried the following:
Code:
egen tag = tag(community hid)
bysort community: egen N_comm = total(tag) //number of Households in each community
/*egen tag2 = tag(street hid)
bysort street: egen N_street = total(tag2) //number of observations in each street
egen aggcomm = group(street community) if N_comm < 5 //group communities by street when N_comm<5 */
egen tag2 = tag(street community)
bysort street: egen N_street = total(tag2) //number of communities in each street
egen aggcomm = group(street community) if N_comm < 5 //group communities by street when N_comm<5
gen neighborhood = . //neighborhood ID
replace neighborhood = community if N_comm>=5
replace neighborhood = aggcomm if N_comm<5
fre neighborhood
egen tag4 = tag(neighborhood hid)
bysort neighborhood: egen N = total(tag4)
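In this attempt, `group(street community)` keeps `community` in the key, so each small community still receives its own ID and nothing is pooled; pooling by street needs a key of street (within district) alone. `replace neighborhood = community` also requires a numeric `community`: if it is a string, as the listing suggests (a value-labelled numeric would display the same way), Stata stops with a type mismatch, and numeric community codes could collide with the 1, 2, 3, ... produced by `aggcomm`. There is also no fallback to the district when a street's small communities still add up to fewer than 5 households. A minimal sketch of a street-then-district pooling on hypothetical toy data, assuming one row per household and that a community is identified by its name within its street and district (`comm_id`, `street_id`, `dist_id`, `nbhd_key`, and `min_hh` are illustrative names, not from the post):

```stata
* Hypothetical toy data: one row per household
clear
input str4 district str4 street str4 community long hid
"D1" "S1" "C1" 1
"D1" "S1" "C1" 2
"D1" "S1" "C1" 3
"D1" "S1" "C1" 4
"D1" "S1" "C1" 5
"D1" "S1" "C2" 6
"D1" "S1" "C2" 7
"D1" "S1" "C3" 8
"D1" "S1" "C3" 9
"D1" "S1" "C3" 10
"D1" "S2" "C4" 11
"D1" "S2" "C4" 12
"D1" "S3" "C5" 13
"D1" "S3" "C5" 14
"D1" "S3" "C5" 15
"D2" "S4" "C6" 16
"D2" "S4" "C6" 17
end

local min_hh = 5   // minimum households per neighborhood

* Numeric IDs (assumption: community keyed by district + street + name)
egen comm_id   = group(district street community)
egen street_id = group(district street)
egen dist_id   = group(district)

* Households per community, tagging each hid once in case of duplicate rows
egen hh_tag = tag(comm_id hid)
bysort comm_id: egen N_comm = total(hh_tag)
gen byte small = N_comm < `min_hh'

* Households living in small communities, pooled by street
bysort street_id: egen N_street_small = total(hh_tag * small)

* One string key per neighborhood: own community, street pool, or district pool
gen nbhd_key = ""
replace nbhd_key = "comm "   + string(comm_id)   if !small
replace nbhd_key = "street " + string(street_id) if small & N_street_small >= `min_hh'
replace nbhd_key = "dist "   + string(dist_id)   if nbhd_key == ""
egen neighborhood = group(nbhd_key)

* Final size check; anything still below the threshold needs manual handling
egen nb_tag = tag(neighborhood hid)
bysort neighborhood: egen N_nbhd = total(nb_tag)
list district street community N_comm nbhd_key N_nbhd if N_nbhd < `min_hh', sepby(neighborhood)
```

In the toy data, C2 and C3 share street S1 and together reach 5 households, so they form a street-level neighborhood; C4 and C5 are too small even after pooling by street, so they are pooled at the district level; C6 is the only small community in D2 and is listed as still below the threshold. This sketch only flags such leftovers; merging them into, for example, the smallest adequate neighborhood in the same district would be a further design choice. Building a string key and numbering it with `group()` avoids ID collisions between the three kinds of neighborhood. Street names should be made consistent before this step, since each spelling would otherwise count as a separate street.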
I hope the explanation makes sense. Let me know if I can clarify anything!
Thank you again!