To illustrate what I mean, the example below takes individual data, which I group; I then fit the same model to the individual and the grouped data and compare the predictive margins from the two models. In this example, the individual data's margins and their standard errors equal the grouped data's multiplied by a constant factor of .03338; put the other way round, the grouped data's margins and standard errors are larger by a factor of about 30.
Code:
webuse nhanes2f, clear
keep if !missing(diabetes, female, black, age, age2)

// Create denominator variable to set up grouped data
egen ycovpatt = group(diabetes female black age age2)
egen d = count(ycovpatt), by(ycovpatt)

// Create outcome variable for grouped data
gen diabetesg = diabetes * d

// Identify duplicate outcome-covariate patterns (so group data is dup == 1)
bysort ycovpatt: gen dup = _n

// Run model with full (ie individual) data and save results and predicted margins
qui glm diabetes i.female i.black c.age c.age2, family(binomial) link(logit)
eststo i
mat lli = `e(ll)'
qui margins female, at(age=(20 40 60))
mat mtabi = r(table)

// Repeat with grouped data
qui glm diabetesg i.female i.black c.age c.age2 if dup == 1, family(binomial d) link(logit)
eststo g
mat llg = `e(ll)'
qui margins female, at(age=(20 40 60))
mat mtabg = r(table)
Code:
. // Compare results and statistics
. esttab i g

--------------------------------------------
                      (1)             (2)
                 diabetes       diabetesg
--------------------------------------------
main
0.female                0               0
                      (.)             (.)

1.female            0.157           0.157
                   (1.66)          (1.66)

0.black                 0               0
                      (.)             (.)

1.black          0.721***        0.721***
                   (5.69)          (5.69)

age              0.132***        0.132***
                   (4.55)          (4.55)

age2           -0.000703*      -0.000703*
                  (-2.55)         (-2.55)

_cons           -8.150***       -8.150***
                 (-10.93)        (-10.93)
--------------------------------------------
N                   10335             345
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

. ** The coefficients and standard errors are the same

. mat li lli

symmetric lli[1,1]
            c1
r1  -1808.5522

. mat li llg

symmetric llg[1,1]
            c1
r1  -1808.5522

. ** The log likelihoods are the same

. mat li mtabi

mtabi[9,6]
            1._at#     1._at#     2._at#     2._at#     3._at#     3._at#
          0.female   1.female   0.female   1.female   0.female   1.female
     b   .00129158  .00151003  .01769071  .02057441  .17840235  .19903016
    se   .00063654  .00074138  .00260439  .00293701  .07697392  .08082237
     z   2.0290627  2.0367778  6.7926462  7.0052315  2.3176986  2.4625627
pvalue    .0424519  .04167232  1.101e-11  2.466e-12  .02046571   .0137948
    ll   .00004398  .00005695   .0125862  .01481798  .02753624  .04062122
    ul   .00253918  .00296311  .02279522  .02633084  .32926845   .3574391
    df           .          .          .          .          .          .
  crit    1.959964   1.959964   1.959964   1.959964   1.959964   1.959964
 eform           0          0          0          0          0          0

. mat li mtabg

mtabg[9,6]
            1._at#     1._at#     2._at#     2._at#     3._at#     3._at#
          0.female   1.female   0.female   1.female   0.female   1.female
     b   .03869134  .04523528   .5299521   .6163378  5.3443137  5.9622514
    se   .01906858  .02220923  .07801851   .0879825  2.3058709  2.4211572
     z   2.0290627  2.0367778  6.7926462  7.0052315  2.3176986  2.4625627
pvalue    .0424519  .04167232  1.101e-11  2.466e-12  .02046571   .0137948
    ll   .00131761  .00170598  .37703864  .44389526  .82488989  1.2168706
    ul   .07606507  .08876458  .68286556  .78878034  9.8637376  10.707632
    df           .          .          .          .          .          .
  crit    1.959964   1.959964   1.959964   1.959964   1.959964   1.959964
 eform           0          0          0          0          0          0

. ** The predictive margins are different but the z values are the same
.
. // Divide the elements of the individual data's margins results matrix by those of the group data's
. mata: A = st_matrix("mtabi")
. mata: B = st_matrix("mtabg")
. mata: A:/B
                 1             2             3             4             5             6
    +-------------------------------------------------------------------------------------+
  1 |  .0333817126   .0333817126   .0333817126   .0333817126   .0333817126   .0333817126  |
  2 |  .0333817126   .0333817126   .0333817127   .0333817126   .0333817126   .0333817126  |
  3 |            1             1   .9999999992             1   .9999999997   .9999999997  |
  4 |  .9999999996   .9999999998   1.000000036   .9999999967   1.000000002   1.000000002  |
  5 |  .0333817127   .0333817127   .0333817126   .0333817126   .0333817126   .0333817126  |
  6 |  .0333817126   .0333817126   .0333817126   .0333817126   .0333817126   .0333817126  |
  7 |            .             .             .             .             .             .  |
  8 |            1             1             1             1             1             1  |
  9 |            .             .             .             .             .             .  |
    +-------------------------------------------------------------------------------------+

. ** As can be seen in rows 1 and 2, the difference between the models' predictive margins and their standard errors is a factor of .03338
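One possible reading of the constant factor, offered as a sketch rather than a definitive account: it assumes that after -glm- with family(binomial d) the default -margins- prediction is the expected number of successes, d_i times the predicted probability. That assumption is consistent with the grouped margins above exceeding 1. The symbols are my own notation, not from the output: N = 10335 individual observations, G = 345 grouped records, d_i the denominator of record i, and \hat p_i the predicted probability for record i with female and age set to the -at()- values. Because everyone in the same covariate pattern shares the same \hat p_i, the two averages differ only in their divisor:

```latex
\begin{align*}
\text{individual margin} &= \frac{1}{N}\sum_{j=1}^{N}\hat p_j
                          = \frac{1}{N}\sum_{i=1}^{G} d_i\,\hat p_i, \\
\text{grouped margin}    &= \frac{1}{G}\sum_{i=1}^{G} d_i\,\hat p_i, \\
\frac{\text{individual margin}}{\text{grouped margin}}
                         &= \frac{G}{N} = \frac{345}{10335} \approx .0333817 .
\end{align*}
```

Under this reading the same ratio carries over to the delta-method standard errors, since the grouped margin is the individual margin multiplied by the constant N/G. That would leave the z values and p-values unchanged, as rows 3 and 4 of the Mata output show.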
As an aside, note that following Clyde Schechter's comment on this list from Sept. 20, 2019, specifying the -expression()- option as -expression(predict(mu)/d)- when running -margins- with the grouped data provides predictive margins and delta method standard errors that are very close to those obtained with individual data.
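A minimal sketch of that suggestion, repeating the data setup from the example above so it runs on its own. The comments describing what each -margins- call returns are my interpretation of the results shown earlier, not documented behaviour I have verified.

```stata
* Rebuild the grouped data as in the example above
webuse nhanes2f, clear
keep if !missing(diabetes, female, black, age, age2)
egen ycovpatt = group(diabetes female black age age2)
egen d = count(ycovpatt), by(ycovpatt)
gen diabetesg = diabetes * d
bysort ycovpatt: gen dup = _n

* Fit the grouped-data model
glm diabetesg i.female i.black c.age c.age2 if dup == 1, family(binomial d) link(logit)

* Default prediction: appears to be on the scale of expected successes (can exceed 1)
margins female, at(age=(20 40 60))

* Divide the predicted mean by the denominator d to return to the probability scale
margins female, at(age=(20 40 60)) expression(predict(mu)/d)
```

One plausible reason the rescaled margins are very close to, rather than identical to, the individual-data margins: this call averages the predicted probabilities over the 345 grouped records with equal weight, whereas the individual-data margins in effect weight each record by d.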