Here's the basic set up. I'm assuming a total sample of 100 clusters (50 in each group), with 25 observations per cluster. The degree of clustering (intraclass correlation) is 0.16. The sample std deviation is 1 and the difference between groups is 0.2.
Here are the results when I use Stata's power twomeans command:
Code:
. power twomeans 0 .2, cluster k1(50) k2(50) m1(25) m2(25) rho(0.16) sd(1) Estimated power for a two-sample means test Cluster randomized design, z test assuming sd1 = sd2 = sd Ho: m2 = m1 versus Ha: m2 != m1 Study parameters: alpha = 0.0500 delta = 0.2000 m1 = 0.0000 m2 = 0.2000 sd = 1.0000 Cluster design: K1 = 50 K2 = 50 M1 = 25 M2 = 25 N1 = 1,250 N2 = 1,250 rho = 0.1600 Estimated power: power = 0.6228
Code:
. program define myprog, rclass 1. clear 2. set obs 100 // number of clusters 3. gen cluster_num = _n 4. expand 25 // number of observations per cluster 5. local rho = 0.16 // intraclass correlation = .16 6. local sd_u = sqrt(`rho') 7. local sd_e = sqrt(1-`rho') 8. by cluster_num, sort: gen trial_num = _n 9. gen x = runiformint(0,1) // assigning clusters to one of two groups 10. by cluster_num (trial_num), sort: gen u = rnormal(0, `sd_u') if _n == 1 11. by cluster_num (trial_num): replace u = u[1] 12. gen e = rnormal(0, `sd_e') 13. gen y = (.2 * x) + u + e // generating outcome for each grou (group1: M = 0) (group2: M = 0.2) 14. sum y 15. return scalar sd = r(sd) // recording SD 16. regress y x, cluster(cluster_num) 17. lincom x 18. return scalar p = r(p) // recording p-value 19. end . simulate sd=r(sd) p=r(p), reps(10000) nodots: myprog // running simulation command: myprog sd: r(sd) p: r(p) . gen power = (p <= .05) // tabulating percentage of times p <= .05 . . // checking that SD of y is approx 1 . sum sd Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- sd | 10,000 1.004122 .0179245 .9414742 1.073392 . . // estimating power (alpha = .05) . sum power Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- power | 10,000 .9984 .03997 0 1
In sum, when I use Stata's power command I get an estimated power of about 62%, but when I try to simulate the same results I get an estimated power north of 99%. I'm trying to figure out where I'm going wrong here.
0 Response to power analysis with clustered data - simulated vs calculated
Post a Comment