I want to create a DHS-style wealth index using principal component analysis (PCA). I have 27 binary variables capturing housing characteristics (for example, house roof material). Following guidance from various YouTube videos and articles, I used the code below. The PCA is carried out separately for rural and urban areas (variable: region). Any advice on whether my method is correct would be appreciated.



Code:
global xlist dwelling_type surrounding ventilation yard_size separate_kitchen house_size house_floor house_walls house_roof electricity drinking_water water_source toilet drainage garbage refrigerator cooking_source television farmland plot_numbers land poultry livestock vehicles appliance savings jewelry
describe $xlist
sum $xlist
corr $xlist

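* Fit the PCA separately for each area and predict the scores right after
* each fit, because rotate and predict use only the most recent pca results
pca $xlist if region == 1, mineigen(1) // rural
rotate
predict rural_wi if region == 1

pca $xlist if region == 0, mineigen(1) // urban
rotate
predict urban_wi if region == 0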
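* Drop observations that have no score from either PCA
drop if missing(rural_wi) & missing(urban_wi)

* Combine the two area-specific scores into one variable
gen wealth_index_score = rural_wi
replace wealth_index_score = urban_wi if missing(wealth_index_score)
xtile wealth_index = wealth_index_score, nq(5) // quintiles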
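
* Optional comparison, a sketch only and not part of the DHS procedure:
* quintiles computed within each area rather than on the pooled scores.
* The names wealth_index_rural and wealth_index_urban are my own placeholders.
xtile wealth_index_rural = rural_wi if region == 1, nq(5)
xtile wealth_index_urban = urban_wi if region == 0, nq(5)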

This gives the following distribution of the wealth index:

Code:
tab wealth_index
5 quantiles |
         of |
wealth_inde |
    x_score |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      3,030       20.00       20.00
          2 |      3,030       20.00       40.01
          3 |      3,029       20.00       60.00
          4 |      3,030       20.00       80.00
          5 |      3,029       20.00      100.00
------------+-----------------------------------
      Total |     15,148      100.00
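
The even 20% split is what xtile produces by construction, so on its own it says little about whether the index is sensible. Since rural_wi and urban_wi come from separate PCAs, the pooled scores are not necessarily on a common scale. A check I could run, as a minimal sketch that assumes region is coded 1 = rural and 0 = urban as in my code above, is to look at how the pooled quintiles split across the two areas:

```stata
* Column percentages: share of each area's households in each pooled quintile
tabulate wealth_index region, column

* Compare the distribution of the combined score in the two areas
bysort region: summarize wealth_index_score, detail
```

If one area is concentrated in the top or bottom quintiles, that may reflect the separate scaling of the two scores rather than real differences in wealth.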