Hi, I've been trying to set up a simple bootstrap around a small program I wrote, and I noticed something odd: values that should not vary across random samples were coming out with standard errors. Puzzled, I wrote a mock program to get to the bottom of this and found that the cluster() option of bsample seems to cause something strange in the output.
The raw data have about 400 observations in 4 groups ("forms"), with 5 observations per caseid; all 5 observations for a given caseid are assigned to the same form. My test program looks like this:
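program sim1
    preserve
    // draw 10 caseid clusters within each form; s_id identifies each drawn cluster
    bsample 10, cluster(caseid) strata(form) idcluster(s_id)
    // forms A vs. B: keep the group sizes and the two-sided p-value
    ttest correct if form=="A"|form=="B", by(form)
    scalar n1 = r(N_1)
    scalar n2 = r(N_2)
    scalar p1 = r(p)
    // forms C vs. D
    ttest correct if form=="C"|form=="D", by(form)
    scalar n3 = r(N_1)
    scalar n4 = r(N_2)
    scalar p2 = r(p)
    restore
end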
So, this should produce a random sample of 200 observations: 10 clusters for each form, with 5 observations per cluster. If the program is not run as part of the bootstrap command, nothing unexpected happens. I've used forval loops to generate up to 100 random samples with this very program and found that the samples do, in fact, have the appropriate balance: as they should be, n1-n4 are all 50.
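A manual check of this kind can be sketched as follows. This is an illustration, not necessarily the exact loop used; it assumes the data are already in memory and sim1 has been defined as above.

```stata
* Sketch: call sim1 directly (no bootstrap prefix) and show the
* group sizes it leaves behind in the scalars n1-n4.
forvalues i = 1/100 {
    quietly sim1
    display "draw `i': " scalar(n1) " " scalar(n2) " " scalar(n3) " " scalar(n4)
}
```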
But once this is incorporated as part of the bootstrap command, as follows, something odd happens:
bootstrap n1 = n1 n2 = n2 n3 = n3 n4 = n4 p1 = p1 p2 = p2, ///
    rep(1000) saving(testset, replace): sim1
Once the bootstrap is done and I open testset.dta, the simulated n1-n4 are not uniformly 50. For instance:
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          n1 |      1,000       50.56    8.058867         24         80
          n2 |      1,000      50.161    8.800371         25         92
          n3 |      1,000      50.476    8.260973         27         79
          n4 |      1,000      50.095    7.672995         29         81
This seems to happen only when bootstrap is combined with bsample's cluster() option: when I use only the strata() option, no strange sample sizes are reported. As noted above, I don't think the random samples themselves have unusual sizes; when I generated 100 samples manually with a forval loop and this very program, none of them showed anything of the sort. So bootstrap seems to be reporting statistics that are not grounded in the random samples actually being drawn.
Where are these numbers coming from, and why is Stata doing this? What does it mean for the other statistics it reports? And what can I do to get Stata to report proper numbers, short of generating thousands of random samples manually? I suppose I can use simulate to do this as well, but I am curious what exactly the cluster() option does to produce these numbers in this context. Once again, this seems to happen ONLY with the cluster() option specified. (Using Stata 15, in case it is relevant.)
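The simulate route mentioned above could look roughly like this. It is a minimal, untested sketch that assumes the data are in memory and sim1 is defined as above; the seed value is arbitrary. The idea rests on the difference that the bootstrap prefix draws its own resample of the observations before each call to the program, whereas simulate only runs the program repeatedly on the data as they are, so the bsample call inside sim1 would be the only resampling step.

```stata
* Sketch: let sim1 do all the resampling itself (via bsample) and use
* simulate only to repeat the call and collect the scalars it leaves behind.
set seed 12345    // arbitrary seed, for reproducibility only
simulate n1 = scalar(n1) n2 = scalar(n2) n3 = scalar(n3) n4 = scalar(n4) ///
    p1 = scalar(p1) p2 = scalar(p2), ///
    reps(1000) saving(testset, replace): sim1
```

Note that simulate replaces the data in memory with the collected results, so the original dataset should be saved beforehand.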
Thank you so much in advance!