I'm conducting a difference-in-differences (DID) analysis with a small number of groups (N=6, T=24). Four units received treatment and two are controls. Inference in DID with few clusters is much debated, and permutation tests are among the recommended solutions. However, practical guidance on how to implement them is hard to find (at least for me).

Some authors use a fixed number of permutations; Ryan et al. (2015), for example, conduct 49. The Stata documentation, by contrast, says that you can increase the number of permutations to improve your inference (StataCorp, 2017, p. 1942).

Below I outline my approach, and I would appreciate input on whether it seems correct or needs changes.

I set the test statistic to the treatment effect, which in DID is the coefficient on the interaction of two dummies, one for treatment and one for post-intervention. I permute the dependent variable, specify the interaction coefficient as the test statistic, set unit (each unit is assigned to either control or treatment) as strata, and run 1,000 replications:
Code:
* ## adds the main effects, so _b[1.treatment#1.post] is the DID estimate
permute Y _b[1.treatment#1.post], ///
    strata(unit) reps(1000): xtreg Y i.treatment##i.post
However, I wonder whether I should instead permute unit, keeping the interaction coefficient as the test statistic and using treatment status as strata, and whether I should run some specific number of replications or simply stick to a large number, i.e.:
Code:
* ## adds the main effects, so _b[1.treatment#1.post] is the DID estimate
permute unit _b[1.treatment#1.post], ///
    strata(treatment) reps(1000): xtreg Y i.treatment##i.post
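To make clear what I have in mind with unit-level permutation, here is a minimal sketch of the logic, written in Python rather than Stata only because it is easier to spell out step by step. It is not what `permute` does internally. It assumes a balanced panel, so that the DID estimate equals the difference between treated and control units in their mean post-minus-pre change in Y. The data and all names (`change_by_unit`, `did_estimate`, `exact_permutation_test`) are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def did_estimate(change_by_unit, treated_units):
    """DID estimate from unit-level post-minus-pre changes (balanced panel assumed)."""
    treated = [d for u, d in change_by_unit.items() if u in treated_units]
    control = [d for u, d in change_by_unit.items() if u not in treated_units]
    return mean(treated) - mean(control)


def exact_permutation_test(change_by_unit, treated_units):
    """Re-estimate the DID under every way of labelling the same number of units as treated."""
    units = sorted(change_by_unit)
    observed = did_estimate(change_by_unit, set(treated_units))
    placebo = [
        did_estimate(change_by_unit, set(combo))
        for combo in combinations(units, len(treated_units))
    ]
    # Two-sided p-value: share of assignments whose estimate is at least as
    # extreme as the observed one (the actual assignment is among them).
    extreme = sum(abs(p) >= abs(observed) - 1e-12 for p in placebo)
    return observed, len(placebo), extreme / len(placebo)


# Hypothetical data: mean(Y | post) - mean(Y | pre) for each of the six units.
change_by_unit = {1: 2.1, 2: 1.8, 3: 2.5, 4: 1.9, 5: 0.3, 6: 0.1}
observed, n_assignments, p_value = exact_permutation_test(change_by_unit, [1, 2, 3, 4])
print(observed, n_assignments, p_value)
```

If this is the right way to think about it, then with four treated units out of six there are only 15 distinct assignments, so the smallest attainable p-value is 1/15 (about 0.067) however many replications are requested. That is part of why I am unsure how to choose the number of reps.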
I have worked with the literature for some time now and can't find specific guidance for this setting, so I hope some of you have some input.

References:

Ryan et al. Why We Should Not Be Indifferent to Specification Choices for Difference-in-Differences. Health Serv Res. 2015 Aug;50(4):1211-35. doi: 10.1111/1475-6773.12270.

StataCorp. Stata Base Reference Manual: Release 15.

(Also posted on Stackexchange)