I'm working with individual-level pooled cross-sectional data from the National Health Interview Survey (NHIS) for the years 1981-2014. Since many of the survey questions fall under the same broader categories (e.g. physical health, mental health, health care/insurance), I'd like to group them into indices. Following Thompson (2018), for each index I standardize the components to have a mean of zero and a standard deviation of one. I then create weights equal to the inverse of the sample covariance and use them to weight the mean of the standardized components. Here's a data example of what the standardized components look like:
Code:
input float(aeffortzscore ahopelesszscore anervouszscore arestlesszscore asadzscore aworthlesszscore feelings_interferedzscore)
 -.5429977   .3381291   .6184267  -.4203467  -.6699646 .29255468         .
  .4963797   .3381291   .6184267 -2.4615376  .51620674 .29255468 1.0767654
 -.5429977   .3381291   .6184267   .6002487  .51620674 .29255468         .
  -3.66113   .3381291  -3.675016  -3.482133  .51620674 .29255468  .1148366
 -1.582375   .3381291   .6184267   .6002487  -.6699646 .29255468  .1148366
  .4963797   .3381291   .6184267   .6002487  .51620674 .29255468         .
  .4963797   .3381291   .6184267  -.4203467  -.6699646 .29255468         .
  .4963797   .3381291   .6184267   .6002487  .51620674 .29255468         .
  .4963797   .3381291 -1.5282946  -1.440942  -.6699646 .29255468 1.0767654
 -1.582375  -2.590604 -1.5282946  -1.440942  -.6699646 .29255468  .1148366
  .4963797   .3381291   .6184267   .6002487  .51620674 .29255468         .
 -.5429977 -1.1262374  -.4549339  -1.440942  -1.856136 .29255468 -.8470922
 -1.582375  -2.590604   .6184267   .6002487  -.6699646 .29255468 -.8470922
end
Code:
*create mental health index
*standardize components
#delimit ;
foreach var in aeffort ahopeless anervous arestless asad aworthless feelings_interfered { ;
    egen `var'zscore=std(`var') ;
} ;
matrix drop _all ;
*calculate weights ;
local mental_health_vars "aeffortzscore ahopelesszscore anervouszscore arestlesszscore asadzscore aworthlesszscore feelings_interferedzscore" ;
corr `mental_health_vars' ;
#delimit cr
mat sigma=r(C)
foreach n of numlist 1/7 {
    mat c_`n' = sigma[`n',1..7]
    mat XX = c_`n'
    svmat XX
    scalar w`n' = XX1+ XX2 + XX3 + XX4 + XX5 + XX6 + XX7
    drop XX*
}
*weight outcomes
local num = 1
foreach var in `mental_health_vars' {
    g tmp`num' = `var'*w`num'
    local num = `num' + 1
}
*take mean of weighted outcomes
egen tmpcomp = rowtotal(tmp1 tmp2 tmp3 tmp4 tmp5 tmp6 tmp7), mis
gen W=w1+w2+w3+w4+w5+w6+w7
replace tmpcomp=tmpcomp/W
*restandardize
egen mental_health_index = std(tmpcomp)
replace mental_health_index = round(mental_health_index,.02)
label var mental_health_index "mental health index"
capture drop tmp* W
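Written out, and as far as I can tell from reading the code above, the quantity stored in tmpcomp before the final restandardization is the following. The notation is my own shorthand and not taken from Thompson (2018): $z_{ij}$ is standardized component $j$ for person $i$, $r_{jk}$ is the $(j,k)$ entry of the matrix returned in r(C), and $O_i$ is the set of components that are non-missing for person $i$.

```latex
w_j = \sum_{k=1}^{7} r_{jk},
\qquad
\mathrm{tmpcomp}_i = \frac{\sum_{j \in O_i} w_j \, z_{ij}}{\sum_{j=1}^{7} w_j}
```

Two things stand out in this form. The numerator skips missing components (that is what rowtotal() with the missing option does), while the denominator W sums all seven weights for every person. Also, r(C) after corr is the correlation matrix itself rather than its inverse, so the weights here are row sums of the correlation matrix.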
I would appreciate any advice on how to verify whether the above code appropriately adjusts for missing values in the denominator of the weighted average index. I'm using Stata 15.1.
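One possible check, as a minimal sketch rather than a definitive fix: build the weighted sum and a row-specific denominator side by side, then compare the result with the fixed-denominator version. This assumes the scalars w1 to w7 and the local mental_health_vars from the code above are still defined (i.e. it runs in the same do-file). The variable names num_adj, den_adj, n_nonmiss, index_fixed and index_adj are hypothetical and only used for this illustration.

```stata
* Sketch only: assumes scalars w1-w7 and local mental_health_vars already exist
gen double num_adj = 0
gen double den_adj = 0
local num = 1
foreach var of local mental_health_vars {
    * accumulate each weight only where its component is observed
    replace num_adj = num_adj + scalar(w`num') * `var' if !missing(`var')
    replace den_adj = den_adj + scalar(w`num')         if !missing(`var')
    local num = `num' + 1
}
* number of observed components per person
egen n_nonmiss = rownonmiss(`mental_health_vars')

* fixed denominator (sum of all seven weights) vs. row-specific denominator
gen double index_fixed = num_adj / (scalar(w1) + scalar(w2) + scalar(w3) + scalar(w4) + scalar(w5) + scalar(w6) + scalar(w7)) if n_nonmiss > 0
gen double index_adj   = num_adj / den_adj if n_nonmiss > 0

* the two versions should agree on complete rows
assert reldif(index_fixed, index_adj) < 1e-8 if n_nonmiss == 7

* count partially observed rows where the two versions differ
count if n_nonmiss > 0 & n_nonmiss < 7 & !missing(index_adj) & reldif(index_fixed, index_adj) > 1e-8

* compare the two versions by number of observed components
tabstat index_fixed index_adj, by(n_nonmiss) statistics(mean sd n)
```

If the count is nonzero, the fixed denominator is not adjusting for missing components in those rows, and index_adj shows what a row-specific denominator would give instead.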
Thanks,
Keanan