I'm trying to implement propensity score matching (PSM) with clustered data. In brief, an intervention was implemented in socio-economically different study sites. For each of these study sites, I have a (fairly large) pool of individuals from which I would like to select appropriate control individuals using PSM.
The issues I'm facing are touched on in an older entry on here but weren't resolved (https://www.statalist.org/forums/for...-with-psmatch2).
I'm implementing PSM with psmatch2 in a way referred to as the 'within approach' by Bruno Arpino in this presentation and related work: https://www.stata.com/meeting/spain1...n18_Arpino.pdf. Essentially it loops through the study sites to get perfect balance on study site-level characteristics. The (simplified) code looks like this:
Code:
* Obtain propensity scores
logistic treatment x1 x2 x3
predict pscore, pr

* Set caliper to 0.25 x SD of the propensity score
* (summarize is needed so that r(sd) is available)
summarize pscore
scalar cal = r(sd)*0.25

* Execute PSM per site
gen weight = .
gen att = .
levelsof site, local(slist)
foreach s of local slist {
    psmatch2 treatment if site == `s', pscore(pscore) caliper(`=scalar(cal)') outcome(outcome)
    * Keep the matching weights and the site-specific ATT
    replace weight = _weight if site == `s'
    replace att = r(att) if site == `s'
}
Code:
logistic outcome i.treatment [fweight=weight] if !mi(weight), cluster(site)
Code:
margins i.treatment, pwcompare(effect)
sum att [fweight=weight] if !mi(weight) & treatment == 1
I am able to obtain good balance between treatment and control groups with this approach, and similarly if I simply include study site as a covariate when estimating the propensity scores instead of using the 'within approach'. However, it seems to me that I can either account for the fact that the propensity scores are estimated (say, by using teffects) or account for clustering in the data (in a separate regression with cluster-robust SEs), but not both. Now my questions:
1) Is there a way to adjust SEs for the estimation of propensity scores AND clustering in the data? What are your recommendations in this situation?
2) If there is no solution for this, what do you think the consequences are? It appears to me that, in my approach with the regression, SEs tend to be overestimated compared to teffects. Can one argue that this makes the regression results conservative?
3) Unrelated to this problem, something else I was wondering about for this 'within approach': Do you think one should estimate the propensity scores for each site, i.e. have the estimation within the loop, instead of estimating propensity scores for the sample as a whole before the PSM in each site? It probably doesn't matter much if you include study site as a variable when estimating propensity scores and still do the PSM per site.
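To make question 3 concrete, here is a minimal sketch of what estimating the propensity scores inside the loop could look like. It is untested on my data. The names pscore_s and weight_s and the site-specific caliper are assumptions for illustration, not what I currently run; the other variable names are those from the simplified code above.

```stata
* Sketch only: estimate the propensity score separately within each site
gen pscore_s = .
gen weight_s = .
levelsof site, local(slist)
foreach s of local slist {
    * Site-specific propensity model (may fail or be unstable in small sites)
    quietly logistic treatment x1 x2 x3 if site == `s'
    tempvar p
    quietly predict `p' if e(sample), pr
    quietly replace pscore_s = `p' if site == `s'
    drop `p'

    * Assumption: caliper is 0.25 x SD of the site-specific score
    quietly summarize pscore_s if site == `s'
    local cal = 0.25 * r(sd)

    * Match within the site on the site-specific score
    psmatch2 treatment if site == `s', pscore(pscore_s) caliper(`cal') outcome(outcome)
    replace weight_s = _weight if site == `s'
}
```

The only differences from my current code are that the logistic model and the caliper are computed per site rather than once for the pooled sample.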
I would be very grateful for any insight you can offer! Happy to provide more information on the data or my code but my questions are more about the general approach to this problem.
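For reference, the teffects comparison I mention above is roughly the following. This is a sketch under assumptions: it uses the variant with study site entered as a covariate in the propensity model, one nearest neighbour, and no caliper. As far as I can tell, the standard errors it reports account for the estimated propensity score but not for clustering by site.

```stata
* Sketch only: ATT via teffects psmatch, with site as a covariate in the
* treatment (logit) model instead of matching within site
teffects psmatch (outcome) (treatment x1 x2 x3 i.site, logit), atet nneighbor(1)
```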
Thank you!
Robin