Hi there
I am hoping someone can advise me on a complex dataset that comes from a dual-frame complex sample design, with data provided by the IRS, and is conducted by the Federal Reserve. The link can be found at https://www.federalreserve.gov/econres/scfindex.htm. Because of the large amount of missing data, the Fed imputed replacement values beforehand and released five replicate datasets containing these multiply-imputed values. Hence, the apparent sample size of approximately 28885 is actually only 5777. They also provided a replicate weight dataset, from which I computed an average weight to normalize the population weight so that it reflects the actual sample size. The variable x42001 was given as the population weight (proportions representing the actual population).
I first generated a new weight variable, nwgt, by dividing x42001 by five times the average weight:
* nwgt is the population-normalized version of x42001 - these figures are population weighted
* 22268.03 is the average weight; 5 is the number of replicate datasets
gen nwgt = x42001/(22268.03*5)
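For reference, the same normalization can be written without hard-coding the mean. This is only a sketch: it assumes 22268.03 is the mean of x42001 over all stacked rows of the five replicate datasets, and nwgt2 is a hypothetical name used here so that nwgt is not overwritten.

```stata
* Sketch: derive the mean of x42001 instead of typing 22268.03
* (assumes the mean is taken over all stacked rows of the five replicates)
quietly summarize x42001, meanonly
generate double nwgt2 = x42001 / (r(mean) * 5)

* Check: the normalized weights sum to (number of nonmissing rows) / 5
quietly summarize nwgt2
display r(sum)
```

With 28885 rows, that sum should come out at about 5777, which is the N that tab reports when weighted by nwgt.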
However, I am having three issues:
1. While I am able to apply the weight in descriptive statistics for categorical variables, I am not able to do so for continuous variables. For example, for home ownership, if I run tab townhome [iweight=nwgt], I get N=5777, which is what I wanted. But when I run tabstat age [iweight=nwgt], stat(count mean sd p50 min max), I get the error message: iweights not allowed
2. There is also another weight variable in the dataset, wgt (used as aweight=wgt), because the survey oversampled the wealthy and white population. I want to show weighted versus unweighted results, so how can I combine both iweight and aweight in the same command?
3. When I run the analysis, in this case a multinomial logit, I can't seem to use iweights either: mlogit risktol age i.gender i.townhome [iweight=nwgt]
I would be grateful for any advice.
Thank you.
Yours truly
LG