Hi,

I want to power a household-randomised, two-arm, post-exposure prophylaxis (PEP) trial where the primary outcome is proportion positive by day 14. We will identify a confirmed positive index-case & enroll all members of the household to either the standard of care (SOC) or intervention arm. Only houses with at least 2 household contacts will be enrolled (excluding the index case); individuals will be randomised at the household level & analysis will be at the individual level.

We estimate a control incidence of 15% & a minimum effect size of 50% (ie, an expected intervention proportion of 7.5%). The ICC between household members/clusters is expected to be 0.2. I use the user-written package, clustersampsi (findit clustersampsi), with the following command (power 0.8 & alpha 0.05):

Code:
clustersampsi, binomial samplesize p1(0.15) p2(0.075) m(2) rho(0.2) alpha(0.05) beta(0.8)
This suggests I need a minimum of 166 households (332 individuals) per arm.

Is there a way to account for fact that some households/clusters will have more than 2 households? I try using the "size_cv" option, but can't find a way to set that the minimum will be 2 & the variation will only be greater. Alternatively, I can just run the command again replacing "m(3)" to determine the number of clusters required at higher levels.

I have tried writing a code to simulate the results (using previous examples on statalist), but have only managed to get as far as individual randomisation - I'm not sure how to incorporate the clusters & ICC - particularly with variable cluster sizes (say for example we expect 75% to have 2 members, 20% to have 3, & 5% to have 4). Does anyone have any advice on how to do this?

Code:
version 14.2
clear
set seed `=strreverse("123456789")'
cap program drop rctpower
program define rctpower, rclass
    
    version 14.2
    syntax , n1(integer)
    drop _all
    
    // simulate proportions
    quietly set obs `n1'
    generate byte trt = mod(_n, 2)
    generate double event = uniform() <0.15 if trt==0
    replace event = uniform() <0.075 if trt==1
    
    // test 
    tab trt event, row chi
        return scalar pval = r(p) < 0.05
       
end

local n1 = 556
quietly simulate pval = r(pval), reps(2000) nodots: rctpower , ///
    n1(`n1')    
summarize pval, meanonly    
display in smcl as text "Power = " as result %04.2f r(mean)

Thank you for any help!