Hi All,
Background:
I am working with a client who provides phone-based information services and would like to understand the experiences of platform users through phone interviews. The platform has a population of 41,103 users. The sample was stratified by user type (non-active, active, hyper-active) and by the geographical area in which the user is registered. After the sample was drawn, I discovered that the average activity level among the sampled active users was lower than in the population, so I constructed post-stratification weights in addition to the sampling weights. The post-stratification weight adjusts both for the distribution of users in the population and for the non-response we experienced during interviewing. The post-stratification adjustments were constructed within the original stratification categories.
Post-stratification weights:
Post_stratification_adjustment = (number of people in the population within user type + province) / (number of people successfully reached during phone interviews within user type + province). Summed over the respondents, the post-stratification adjustment adds up to the population count.
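For concreteness, here is a minimal sketch of how an adjustment like this can be built by hand. It assumes that active_tot_q_province_response identifies the user type + province cell and that active_tot_q_province_N holds the population count for that cell (these are the variables passed to svyset further down). The names n_resp and ps_adj are hypothetical and used only for illustration.

```stata
* Number of respondents successfully reached in each user type + province cell
bysort active_tot_q_province_response: gen n_resp = _N

* Adjustment = population count in the cell / respondents in the cell
gen double ps_adj = active_tot_q_province_N / n_resp

* Check: summed over respondents, the adjustment should return the
* population total for the cells that contain at least one respondent
quietly summarize ps_adj
display r(sum)
```

As far as I understand, when poststrata() and postweight() are given to svyset, Stata computes its own adjustment from the sampling weights (sampling weight times the population count in the poststratum, divided by the sum of sampling weights in that poststratum), so the hand-built ps_adj above is only a cross-check and not the weight that svy uses.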
Estimating means:
Some of the variables for which I am estimating means are conditional on binary variables and thus have missing values by definition (e.g. the total amount of seed bought only applies to those who bought seed). I am interested in estimating conditional means, and subpop seems to be the right option, since the mean is estimated for the sub-population to which a particular variable applies. I am estimating the mean in the following 4 ways and expected the results to be the same. Contrary to expectation, the mean obtained from conditioning (estimations 1 and 2 below) and from over (3) is 74.72, while the mean obtained with subpop (4) is 65.13. However, the means are the same across the different estimations in the absence of post-stratification weights, so the differences seem to be driven by the post-stratification.
Questions:
1: Should the "over", "subpop" and "if" options produce different means by design? I was under the impression that only the standard errors are affected.
2: Which approach is appropriate for estimating means and standard errors in the presence of post-stratification weights?
3: How can I estimate the standard deviation of the variable in the presence of post-stratification weights? Stata reports: "estat sd is not appropriate with estimation results that used direct standardization or poststratification."
Thank you very much for your help and please let me know if anything should be clarified.
Code + Output:
svyset id [pweight=prob_selection_inv], strata(strata) poststrata(active_tot_q_province_response) postweight(active_tot_q_province_N)
replace seed_total=. if seed_total==0
tabmiss seed_total
    Variable |   Obs   Missings   Feq.Missings   NonMiss   Feq.NonMiss
-------------+---------------------------------------------------------------
  seed_total |   449        168          37.42       281         62.58
gen dum=!missing(seed_total)
(1) svy: mean seed_total if dum==1
(running mean on estimation sample)
Survey: Mean estimation
Number of strata =      10       Number of obs     =        281
Number of PSUs   =     281       Population size   =     39,477
N. of poststrata =      67       Design df         =        271
--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
  seed_total |   74.71668   14.54314      46.08477    103.3486
(2) svy: mean seed_total
(running mean on estimation sample)
Survey: Mean estimation
Number of strata =      10       Number of obs     =        281
Number of PSUs   =     281       Population size   =     39,477
N. of poststrata =      67       Design df         =        271
--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
  seed_total |   74.71668   14.54314      46.08477    103.3486
--------------------------------------------------------------
(3) svy: mean seed_total, over(dum)
(running mean on estimation sample)
Survey: Mean estimation
Number of strata =      10       Number of obs     =        281
Number of PSUs   =     281       Population size   =     39,477
N. of poststrata =      67       Design df         =        271
            1: dum = 1
--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
seed_total   |
           1 |   74.71668   14.54314      46.08477    103.3486
--------------------------------------------------------------
(4) svy, subpop(dum): mean seed_total
(running mean on estimation sample)
Survey: Mean estimation
Number of strata =      10       Number of obs     =        449
Number of PSUs   =     449       Population size   =     41,103
N. of poststrata =      73       Subpop. no. obs   =        281
                                 Subpop. size      =   21,523.9
                                 Design df         =        439
--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
  seed_total |   65.12752   12.31063      40.93242    89.32262
--------------------------------------------------------------
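One possible reading of the output is that (1) to (3) rescale the weights using only the 281 observations with non-missing seed_total (67 poststrata, population size 39,477), whereas (4) rescales them using all 449 observations (73 poststrata, population size 41,103) and only then restricts to the sub-population. The sketch below is a hand check of that reading, not a confirmed description of what svy does internally. It assumes the adjusted weight is the sampling weight times the population count in the poststratum, divided by the sum of sampling weights in that poststratum. The names what_full, what_sub, w_full and w_sub are hypothetical.

```stata
* (a) Sum of sampling weights per poststratum over all 449 respondents,
*     which is my assumption for what subpop() uses
bysort active_tot_q_province_response: egen double what_full = total(prob_selection_inv)
gen double w_full = prob_selection_inv * active_tot_q_province_N / what_full

* (b) Sum over the 281 seed buyers only, which is my assumption for what
*     happens when "if" or missing values shrink the estimation sample
bysort active_tot_q_province_response: egen double what_sub = total(prob_selection_inv * dum)
gen double w_sub = prob_selection_inv * active_tot_q_province_N / what_sub if dum == 1

* Point estimates only; the standard errors from these calls ignore the design
mean seed_total [pweight = w_full] if dum == 1
mean seed_total [pweight = w_sub]  if dum == 1
```

If this reading is right, I would expect the first mean to be close to the subpop estimate in (4) and the second to be close to the estimates in (1) to (3).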