I have a conceptual question. I have a standard difference in differences (DiD) model

Code:
 Y_{i,t} = \beta_0 + \beta_1 treat_i + \beta_2 time_t + \beta_3 (treat_i \cdot time_t) + \varepsilon_{i,t}
where $treat_i$ is a dummy variable for the group membership (treatment vs. control group), and $time_t$ is a dummy variable for the period (before vs. after). Individuals (in my case: firms) and time are indexed by $i$ and $t$, respectively.

The parameter of interest is the DiD estimator $\beta_3$.

How do I estimate $\beta_3$ for subgroups of firms determined by time-invariant characteristics? Example: I want to determine $\beta_3$ for both small and large firms separately, and see if (and by how much) $\beta_3$ differs across those subgroups. Notably, both small and large firms are affected by the treatment $treat_i$, which is why a triple DDD seems not suitable.

How can I do this without subsetting the data and estimating the DiD model for small and large firms, separately?

How can I extend this to subgroup characteristics that are not binary (small vs. large firms), but multinomial (e.g. regions of firms)?