Hi everyone,

I am really struggling with this issue and hope someone can help.

I have a generalized estimating equation (GEE) model fitted to 1:3 matched sets, with each matched set treated as a cluster. The model simply relates cost to disease_status (a binary variable), and I would like an estimate of the difference in cost by disease status. Each cluster is identified by a pair ID (pairid). My goal is to bootstrap the model by resampling whole clusters and to use the margins results to build a 95% CI with the percentile method. My code is as follows:

(I have many cost outcomes, so I wrapped the model in a program; this is just a demonstration on a single cost.)

Code:
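capture program drop dydx_margins
program define dydx_margins, eclass
    // GEE of cost on disease status; panels are the bootstrap copy IDs (pairid2)
    xtgee cost i.disease_status, fam(gamma) link(log) corr(independent) i(pairid2)
    // post the average marginal effect of disease_status as e(b)
    margins, dydx(disease_status) post
end
xtset, clear
bootstrap _b, reps(1000) seed(1234) cluster(pairid) idcluster(pairid2): dydx_margins
// report the percentile confidence interval
estat bootstrap, percentile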
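The percentile interval can be read off with estat bootstrap, percentile, or computed by hand from saved replicates. Below is a minimal sketch on simulated data; the data-generating step, the program name ame_boot, the file name bs_ame and reps(200) are hypothetical choices for illustration. It sets the panel variable inside the program with xtset instead of the i() option, which is an assumption about the cleaner syntax in current Stata rather than something from the original post.

```stata
* Hypothetical simulated data standing in for the real matched sets
clear
set seed 1234
set obs 200                          // 200 hypothetical matched sets
generate long pairid = _n
expand 4                             // 1 case + 3 controls per set
bysort pairid: generate byte disease_status = (_n == 1)
* hypothetical gamma-distributed cost, higher on average for cases
generate double cost = rgamma(2, exp(7 + 0.4*disease_status)/2)

capture program drop ame_boot
program define ame_boot, eclass
    xtset pairid2                    // panel = bootstrap copy ID created by idcluster()
    xtgee cost i.disease_status, family(gamma) link(log) corr(independent)
    margins, dydx(disease_status) post
end

xtset, clear
bootstrap ame = _b[1.disease_status], reps(200) seed(1234) ///
    cluster(pairid) idcluster(pairid2) saving(bs_ame, replace): ame_boot
estat bootstrap, percentile

* Percentile interval computed by hand from the saved replicates
use bs_ame, clear
_pctile ame, percentiles(2.5 97.5)
display "2.5th: " r(r1) "   97.5th: " r(r2)
```

The hand-computed limits may differ slightly from the estat bootstrap output because the two may use different percentile definitions.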

Someone recently asked me to make sure that each copy of a cluster drawn more than once in a bootstrap sample gets its own cluster ID, so that I am not overweighting the case-control differences from clusters that are selected more than once. I thought I had handled this by specifying the idcluster() option, clearing the panel settings with xtset, clear, and passing the new ID variable (pairid2) to xtgee.
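One way to see what idcluster() does is to draw a single cluster bootstrap sample with bsample and count distinct IDs. The sketch below is hypothetical: it simulates 50 matched sets of four rows in place of the real data, and newid is an illustrative variable name. Clusters drawn more than once share a pairid but should receive different newid values, so the newid count should equal the number of sets while the pairid count is usually smaller.

```stata
* Hypothetical simulated data standing in for the real matched sets
clear
set seed 1234
set obs 50                           // 50 hypothetical matched sets
generate long pairid = _n
expand 4                             // 1 case + 3 controls per set (1:3)
bysort pairid: generate byte disease_status = (_n == 1)

* Draw one cluster bootstrap sample; give each drawn copy its own ID
preserve
bsample, cluster(pairid) idcluster(newid)
egen byte tag_old = tag(pairid)      // flags one row per original pair ID
egen byte tag_new = tag(newid)       // flags one row per resampled copy
quietly count if tag_old
local n_old = r(N)
quietly count if tag_new
local n_new = r(N)
display "distinct pairid values: `n_old'"
display "distinct newid values:  `n_new'"
restore
```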

To check, I ran the code below, which drops idcluster() and uses the original pairid in xtgee:

Code:
capture program drop dydx_margins2
program define dydx_margins2, eclass
    // same model, but panels are the original pair IDs
    xtgee cost i.disease_status, fam(gamma) link(log) corr(independent) i(pairid)
    margins, dydx(disease_status) post
end
xtset, clear
bootstrap _b, reps(1000) seed(1234) cluster(pairid): dydx_margins2
estat bootstrap, percentile

This gave exactly the same results, which makes me nervous that the clusters are not being resampled correctly. Does anyone know where I went wrong? (Apologies, I am typing this code from memory.)
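One possible explanation for the identical output: with corr(independent), the xtgee point estimates solve the same estimating equations as an ordinary GLM, so they do not depend on which variable defines the panels, and neither do the margins estimates built from them. The panel variable would change only each replicate's standard errors, which the percentile method does not use, and both runs should draw the same clusters because they share cluster(pairid) and seed(1234). The sketch below checks the first point on hypothetical simulated data: it fits the same model once with pairid and once with a made-up row-level ID (rowid). If the reasoning holds, the coefficients should agree up to convergence tolerance; idcluster() would matter for a working correlation such as exchangeable.

```stata
* Hypothetical simulated data standing in for the real matched sets
clear
set seed 1234
set obs 200
generate long pairid = _n
expand 4                             // 1 case + 3 controls per set
bysort pairid: generate byte disease_status = (_n == 1)
generate double cost = rgamma(2, exp(7 + 0.4*disease_status)/2)

* Model with the real pair ID as the panel variable
xtset pairid
quietly xtgee cost i.disease_status, family(gamma) link(log) corr(independent)
scalar b_pairid = _b[1.disease_status]

* Same model with a hypothetical ID that makes every row its own panel
generate long rowid = _n
xtset rowid
quietly xtgee cost i.disease_status, family(gamma) link(log) corr(independent)
scalar b_rowid = _b[1.disease_status]

display "difference in coefficients: " b_pairid - b_rowid
```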

Thanks!