Hi all,

I am working with two appended datasets of observational data, collected at different time points.
Dataset 0 is set in region 0 with approx 5000 participants - baseline region
Dataset 1 is set in region 1 with approx 2000 participants - comparator region
Outcome is binary 0/1
Factors are all binary 0/1 or categorical

I am trying to standardize the prevalence of a binary outcome in dataset 1 so that I can compare it with dataset 0 without the effect of the covariates. To do this, I am trying to predict a standardized outcome for region 1 (the comparator) that removes the changes in the factors between the regions by basing the factor distribution on the baseline region 0.

For this example I am using the example dataset margex and assuming the binary "treatment" variable is equivalent to my regions: (I am new to these examples and couldn't find another one with 2 groups and multiple covariates)

use http://www.stata-press.com/data/r15/margex.dta

logistic outcome i.treatment i.arm i.yc i.agegroup
preserve
keep if treatment == 0
margins, noesample at(treatment=(1))
restore

Alternatively, I have the code:
logistic outcome i.treatment i.arm i.yc i.agegroup
margins if treatment == 0, noesample at(treatment=(1))
I'm not sure how this differs from the margins call above, but it gives me the same results in fewer lines of code.
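
To make the question concrete, here is a by-hand calculation that I believe (but have not confirmed) does the same thing as the margins call: keep only the treatment 0 observations, set treatment to 1 for all of them, predict the probability from the fitted model, and average. The variable name p_std is just one I made up for illustration.

```stata
use http://www.stata-press.com/data/r15/margex.dta, clear

* Fit the model on the full (appended) data
logistic outcome i.treatment i.arm i.yc i.agegroup

* margins version, for comparison
margins if treatment == 0, noesample at(treatment=(1))

* By-hand version (my assumption of what margins is doing)
preserve
keep if treatment == 0          // covariate distribution of the baseline group
replace treatment = 1           // pretend everyone is in the comparator group
predict double p_std, pr        // p_std is a hypothetical name
summarize p_std                 // the mean should match the margins estimate
restore
```

If the mean of p_std matches the margins estimate, then I would read the number as the average predicted probability for the treatment 0 participants had they been in treatment 1, but I would appreciate confirmation that this reading is right.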

I am not sure:
  • if this code is the correct way to standardize. I find it counter-intuitive and thought it should be:
    • margins if treatment == 1, noesample at(treatment=(0)) // with the 0/1 switched
    • What does this command line mean when the 0/1 are switched?
  • if it is, how do I interpret the output of this code? Is the output the expected prevalence for treatment 0 or for treatment 1? The logistic model is fit on the whole dataset, so I feel that the estimates then used by margins come from the whole dataset, not from the baseline of treatment 0.
Also, if I want to add an interaction term, do I have to specify the term separately in the at() option, or is it enough to have it in the regression, like so:
logistic outcome i.treatment##i.arm i.yc i.agegroup
preserve
keep if treatment == 0
margins, noesample at(treatment=(1))
restore

Essentially, I am in graduate school and have been told to run these commands without enough explanation. After spending weeks scouring the internet, the margins code book, and Statalist, I have not been able to find an answer about why this command should be run the way I have described above, nor how to interpret the output. Other people talk about using dydx()...

Thank you in advance for your time.

Kind regards,
Victoria