Dear Statalisters,
my first post is a question about confidence intervals that differ across commands under the same specified survey design. My original question was how I could test the proportion of an illness in different subgroups against the proportion in the whole population.
Since the proportion of a dummy variable coded with zeros and ones is the same as its mean, I thought I could use the same approach I use to test subgroup means against the population mean.
In other words I wanted to do this:
Code:
use https://www.stata-press.com/data/r17/nhefs
svyset psu2 [pw=swgt2], strata(strata2)
svy: mean rural
svy: mean rural, over(region)
svy: reg rural i.region
contrast gw.region, effects mcompare(bonferroni)
All looked good to me. However, when I compared the confidence intervals with those from other commands that report proportions, I noticed differences.
Code:
svy: mean rural
svy: tab rural, ci
quietly: svy: reg rural i.region
margins
Code:
. svy: mean rural
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =  35          Number of obs   =      14,407
Number of PSUs   = 105          Population size = 212,619,074
                                Design df       =          70

--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
       rural |   .3181015   .0185216      .2811612    .3550417
--------------------------------------------------------------

. svy: tab rural, ci
(running tabulate on estimation sample)

Number of strata =  35                           Number of obs   =      14,407
Number of PSUs   = 105                           Population size = 212,619,074
                                                 Design df       =          70

----------------------------------------------
rural     |
residence |  proportion          lb          ub
----------+-----------------------------------
        0 |       .6819       .6439       .7176
        1 |       .3181       .2824       .3561
          |
    Total |           1
----------------------------------------------
Key: proportion = Cell proportion
     lb         = Lower 95% confidence bound for cell proportion
     ub         = Upper 95% confidence bound for cell proportion

. quietly: svy: reg rural i.region

. margins

Predictive margins

Number of strata =  35                           Number of obs   =      14,407
Number of PSUs   = 105                           Population size = 212,619,074
Model VCE: Linearized                            Design df       =          70

Expression: Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   .3181015   .0188586    16.87   0.000     .2804892    .3557137
------------------------------------------------------------------------------
So now I have two questions:
1. Why do the confidence intervals differ? In R it is common practice to calculate a proportion as the mean of a dummy variable. Is one method the "right one"?
2. Given that these methods produce different confidence intervals, can I still use the method above to test the proportions in the subgroups against the proportion in the population? (Maybe I should open a separate thread for this one.)
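A possible lead on question 1, offered as a sketch rather than a definitive answer: the svy: mean interval is symmetric around the estimate, while the svy: tab interval is not, which would be consistent with svy: tab building its interval on the logit scale and transforming back. The lines below try to reproduce both intervals by hand from the point estimate, standard error, and design df reported in the output above. The local macro names are only illustrative, and the logit-scale standard error (se divided by p(1-p)) is the usual delta-method approximation, assumed here rather than confirmed against the manual.
Code:
* values copied from the svy: mean output above
local p  = .3181015
local se = .0185216
* t critical value for a 95% interval with 70 design df
local t  = invttail(70, 0.025)
* symmetric interval on the proportion scale (what svy: mean appears to report)
display %9.4f (`p' - `t'*`se') "  " %9.4f (`p' + `t'*`se')
* interval built on the logit scale and transformed back (assumed svy: tab method)
local l  = logit(`p')
local sl = `se'/(`p'*(1 - `p'))
display %9.4f invlogit(`l' - `t'*`sl') "  " %9.4f invlogit(`l' + `t'*`sl')
If the assumption holds, the first line should come out close to the svy: mean bounds and the second close to the svy: tab bounds. This would not explain the margins interval, which uses a slightly different standard error (.0188586 instead of .0185216).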
Thank you very much for your help!
Stephan