To illustrate what I mean, below is an example in which I collapse individual-level data into grouped data, fit the same model to both the individual and the grouped data, and then compare the predictive margins from the two models. In this example, the grouped data's margins and their standard errors are larger than the individual data's by a constant factor: each individual-data value is .03338 times the corresponding grouped-data value (roughly a 30-fold difference).
Code:
webuse nhanes2f, clear
keep if !missing(diabetes, female, black, age, age2)

// Create denominator variable to set up grouped data
egen ycovpatt = group(diabetes female black age age2)
egen d = count(ycovpatt), by(ycovpatt)

// Create outcome variable for grouped data
gen diabetesg = diabetes * d

// Identify duplicate outcome-covariate patterns (so group data is dup == 1)
bysort ycovpatt: gen dup = _n

// Run model with full (ie individual) data and save results and predicted margins
qui glm diabetes i.female i.black c.age c.age2, family(binomial) link(logit)
eststo i
mat lli = `e(ll)'
qui margins female, at(age=(20 40 60))
mat mtabi = r(table)

// Repeat with grouped data
qui glm diabetesg i.female i.black c.age c.age2 if dup == 1, family(binomial d) link(logit)
eststo g
mat llg = `e(ll)'
qui margins female, at(age=(20 40 60))
mat mtabg = r(table)
Code:
. // Compare results and statistics
. esttab i g
--------------------------------------------
                      (1)             (2)
                 diabetes       diabetesg
--------------------------------------------
main
0.female                0               0
                       (.)             (.)
1.female            0.157           0.157
                    (1.66)          (1.66)
0.black                 0               0
                       (.)             (.)
1.black             0.721***        0.721***
                    (5.69)          (5.69)
age                 0.132***        0.132***
                    (4.55)          (4.55)
age2            -0.000703*      -0.000703*
                   (-2.55)         (-2.55)
_cons              -8.150***       -8.150***
                  (-10.93)        (-10.93)
--------------------------------------------
N                   10335             345
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
. ** The coefficients and standard errors are the same
. mat li lli
symmetric lli[1,1]
c1
r1 -1808.5522
. mat li llg
symmetric llg[1,1]
c1
r1 -1808.5522
. ** The log likelihoods are the same
. mat li mtabi
mtabi[9,6]
           1._at#     1._at#     2._at#     2._at#     3._at#     3._at#
         0.female   1.female   0.female   1.female   0.female   1.female
     b  .00129158  .00151003  .01769071  .02057441  .17840235  .19903016
    se  .00063654  .00074138  .00260439  .00293701  .07697392  .08082237
     z  2.0290627  2.0367778  6.7926462  7.0052315  2.3176986  2.4625627
pvalue   .0424519  .04167232  1.101e-11  2.466e-12  .02046571   .0137948
    ll  .00004398  .00005695   .0125862  .01481798  .02753624  .04062122
    ul  .00253918  .00296311  .02279522  .02633084  .32926845   .3574391
    df          .          .          .          .          .          .
  crit   1.959964   1.959964   1.959964   1.959964   1.959964   1.959964
 eform          0          0          0          0          0          0
. mat li mtabg
mtabg[9,6]
           1._at#     1._at#     2._at#     2._at#     3._at#     3._at#
         0.female   1.female   0.female   1.female   0.female   1.female
     b  .03869134  .04523528   .5299521   .6163378  5.3443137  5.9622514
    se  .01906858  .02220923  .07801851   .0879825  2.3058709  2.4211572
     z  2.0290627  2.0367778  6.7926462  7.0052315  2.3176986  2.4625627
pvalue   .0424519  .04167232  1.101e-11  2.466e-12  .02046571   .0137948
    ll  .00131761  .00170598  .37703864  .44389526  .82488989  1.2168706
    ul  .07606507  .08876458  .68286556  .78878034  9.8637376  10.707632
    df          .          .          .          .          .          .
  crit   1.959964   1.959964   1.959964   1.959964   1.959964   1.959964
 eform          0          0          0          0          0          0
. ** The predictive margins are different but the z values are the same
.
. // Divide the elements of the individual data's margins results matrix by those of the group data's
. mata: A = st_matrix("mtabi")
. mata: B = st_matrix("mtabg")
. mata: A:/B
                  1             2             3             4             5             6
    +-------------------------------------------------------------------------------------+
  1 |   .0333817126   .0333817126   .0333817126   .0333817126   .0333817126   .0333817126 |
  2 |   .0333817126   .0333817126   .0333817127   .0333817126   .0333817126   .0333817126 |
  3 |             1             1   .9999999992             1   .9999999997   .9999999997 |
  4 |   .9999999996   .9999999998   1.000000036   .9999999967   1.000000002   1.000000002 |
  5 |   .0333817127   .0333817127   .0333817126   .0333817126   .0333817126   .0333817126 |
  6 |   .0333817126   .0333817126   .0333817126   .0333817126   .0333817126   .0333817126 |
  7 |             .             .             .             .             .             . |
  8 |             1             1             1             1             1             1 |
  9 |             .             .             .             .             .             . |
    +-------------------------------------------------------------------------------------+
. ** As can be seen in rows 1 and 2 (and in rows 5 and 6, the confidence limits), the individual data's predictive margins and standard errors equal the grouped data's multiplied by a constant factor of .03338

As an aside, note that following Clyde Schechter's comment on this list from Sept. 20, 2019, specifying -expression(predict(mu)/d)- in the -expression()- option when running -margins- with the grouped data gives predictive margins and delta-method standard errors that are very close to those obtained with the individual data.
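
The constant factor and the aside can both be read off from how the two sets of margins are averaged. What follows is only a sketch of the algebra, not a statement of Stata's internals: it assumes that -margins- averages the default -glm- prediction (mu, which for family(binomial d) is the expected count, d times the fitted probability) over the 345 estimation-sample rows without weights. The symbols n_g, N, d_g, and p_g are notation introduced here, not taken from the post.

```latex
% Notation (introduced here, not from the post):
%   g    indexes the n_g = 345 grouped rows (dup == 1)
%   d_g  is the denominator of row g, with N = \sum_g d_g = 10335 individuals
%   p_g  is the fitted probability for row g at the at() values;
%        every individual i in row g has p_i = p_g
\begin{align*}
m_{\text{ind}} &= \frac{1}{N}\sum_{i=1}^{N} p_i
                = \frac{1}{N}\sum_{g=1}^{n_g} d_g\,p_g
  && \text{(individual data)}\\
m_{\text{grp}} &= \frac{1}{n_g}\sum_{g=1}^{n_g} d_g\,p_g
  && \text{(grouped data, default prediction } \mu_g = d_g\,p_g\text{)}\\
\frac{m_{\text{ind}}}{m_{\text{grp}}} &= \frac{n_g}{N}
                = \frac{345}{10335} \approx 0.0333817\\
m_{\text{expr}} &= \frac{1}{n_g}\sum_{g=1}^{n_g}\frac{\mu_g}{d_g}
                = \frac{1}{n_g}\sum_{g=1}^{n_g} p_g
  && \text{(grouped data, \texttt{expression(predict(mu)/d)})}
\end{align*}
```

Under these assumptions the ratio 345/10335 matches the .0333817 in rows 1 and 2 of the Mata output. The standard errors would scale by the same constant if the two fits share the same coefficient covariance matrix (as the identical t statistics suggest), because the delta-method gradient is rescaled by n_g/N. The last line averages p_g over rows with equal weight rather than weight d_g, which would explain why the -expression()- approach comes very close to, but does not exactly reproduce, the individual-data margins.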