I'm working with individual-level, pooled cross-sectional data from the National Health Interview Survey (NHIS) for 1981-2014. Because many of the survey questions fall under the same broad categories (e.g., physical health, mental health, health care/insurance), I'd like to group them into indices. Following Thompson (2018), for each index I standardize the components to have a mean of zero and a standard deviation of one. I then create weights equal to the inverse of the sample covariance matrix and use them to weight the mean of the standardized components. Here's a data example of what the standardized components look like:
input float(aeffortzscore ahopelesszscore anervouszscore arestlesszscore asadzscore aworthlesszscore feelings_interferedzscore)
 -.5429977   .3381291   .6184267  -.4203467  -.6699646  .29255468          .
  .4963797   .3381291   .6184267 -2.4615376  .51620674  .29255468  1.0767654
 -.5429977   .3381291   .6184267   .6002487  .51620674  .29255468          .
  -3.66113   .3381291  -3.675016  -3.482133  .51620674  .29255468   .1148366
 -1.582375   .3381291   .6184267   .6002487  -.6699646  .29255468   .1148366
  .4963797   .3381291   .6184267   .6002487  .51620674  .29255468          .
  .4963797   .3381291   .6184267  -.4203467  -.6699646  .29255468          .
  .4963797   .3381291   .6184267   .6002487  .51620674  .29255468          .
  .4963797   .3381291 -1.5282946  -1.440942  -.6699646  .29255468  1.0767654
 -1.582375  -2.590604 -1.5282946  -1.440942  -.6699646  .29255468   .1148366
  .4963797   .3381291   .6184267   .6002487  .51620674  .29255468          .
 -.5429977 -1.1262374  -.4549339  -1.440942  -1.856136  .29255468  -.8470922
 -1.582375  -2.590604   .6184267   .6002487  -.6699646  .29255468  -.8470922
end
*create mental health index
*standardize components
#delimit ;
foreach var in aeffort ahopeless anervous arestless asad aworthless feelings_interfered { ;
    egen `var'zscore=std(`var') ;
} ;
matrix drop _all ;
*calculate weights ;
local mental_health_vars "aeffortzscore ahopelesszscore anervouszscore arestlesszscore asadzscore aworthlesszscore feelings_interferedzscore" ;
corr `mental_health_vars' ;
#delimit cr
mat sigma=r(C)
foreach n of numlist 1/7 {
    mat c_`n' = sigma[`n',1..7]
    mat XX = c_`n'
    svmat XX
    scalar w`n' = XX1+ XX2 + XX3 + XX4 + XX5 + XX6 + XX7
    drop XX*
}
*weight outcomes
local num = 1
foreach var in `mental_health_vars' {
    g tmp`num' = `var'*w`num'
    local num = `num' + 1
}
*take mean of weighted outcomes
egen tmpcomp = rowtotal(tmp1 tmp2 tmp3 tmp4 tmp5 tmp6 tmp7), mis
gen W=w1+w2+w3+w4+w5+w6+w7
replace tmpcomp=tmpcomp/W
*restandardize
egen mental_health_index = std(tmpcomp)
replace mental_health_index = round(mental_health_index,.02)
label var mental_health_index "mental health index"
capture drop tmp* W
I would appreciate any advice on how to verify whether the above code appropriately adjusts for missing values in the denominator of the weighted average index. I'm using Stata 15.1.
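One way to check the denominator is to compare two versions of the index on a small toy dataset. This is a minimal sketch, not a definitive fix. The variables z1-z3, the scalars w1-w3, and the names num, den, idx_fixed, and idx_adj are all hypothetical and chosen only for illustration. The sketch relies on the fact that egen's rowtotal() treats a missing component as zero (with the missing option it returns missing only when every component is missing), so dividing by the sum of all weights leaves the denominator unchanged for rows with missing components.

```stata
* Toy data: three standardized components, some missing (hypothetical values)
clear
input float(z1 z2 z3)
1 2 3
1 . 3
. . .
end

* Hypothetical weights standing in for w1-w7 in the real code
scalar w1 = 1
scalar w2 = 2
scalar w3 = 1

* Accumulate the weighted sum and the sum of weights over
* non-missing components only, row by row
gen double num = 0
gen double den = 0
forvalues k = 1/3 {
    replace num = num + z`k' * scalar(w`k') if !missing(z`k')
    replace den = den + scalar(w`k')        if !missing(z`k')
}

* Fixed denominator: all weights, regardless of missingness
gen double idx_fixed = num / (scalar(w1) + scalar(w2) + scalar(w3)) if den > 0

* Adjusted denominator: only the weights of the components observed in that row
gen double idx_adj = num / den if den > 0

list z1 z2 z3 den idx_fixed idx_adj
```

In this toy data the first row is complete, so both versions equal 2. In the second row z2 is missing, so the fixed-denominator version gives 1 while the adjusted version gives 2. The third row has no components and is missing in both. If the two columns differ on the real data for rows with missing components, the fixed denominator is not adjusting for them.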