The help file for
mimrgns states that while "[i]n principle,
marginsplot works after
mimrgns […], the plotted confidence intervals are based on inappropriate degrees of freedom". However, it also suggests that "the differences should be small for large sample sizes". In this post, I demonstrate how to calculate these differences.
I borrow an example from
https://www3.nd.edu/~rwilliam/stats3/MD02.pdf”]Richard Williams’ excellent paper on Multiple Imputation & Maximum Likelihood[/url]. We start with an example dataset where we impute missing values in one variable. Then, we run a logistic regression model and obtain predictive margins.*
Code:
version 12.1
webuse mheart0, clear
mi set mlong
mi register imputed bmi
mi register regular attack smokes age hsgrad female
mi impute regress bmi attack smokes age hsgrad female, add(20) rseed(2232)
mi estimate, dots: logit attack i.smokes age bmi i.hsgrad i.female
mimrgns smokes, at(bmi = (20(5)35)) predict(pr) cmdmargins
Omitting most of the outout, the results for the
mimrgns command are
Code:
(output omitted)
. mimrgns smokes, at(bmi = (20(5)35)) predict(pr) cmdmargins
Multiple-imputation estimates Imputations = 20
Predictive margins Number of obs = 154
Average RVI = 0.0419
Largest FMI = 0.1500
DF adjustment: Large sample DF: min = 866.55
avg = 41276.27
Within VCE type: Delta-method max = 191492.08
Expression : Pr(attack), predict(pr)
1._at : bmi = 20
2._at : bmi = 25
3._at : bmi = 30
4._at : bmi = 35
------------------------------------------------------------------------------
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at#smokes |
1 0 | .2132581 .0594192 3.59 0.000 .0967207 .3297954
1 1 | .4716674 .0832583 5.67 0.000 .3084512 .6348835
2 0 | .3239331 .0498932 6.49 0.000 .2261437 .4217226
2 1 | .6118189 .0617438 9.91 0.000 .4908022 .7328357
3 0 | .4585168 .0750399 6.11 0.000 .3113833 .6056503
3 1 | .7356222 .071637 10.27 0.000 .5951187 .8761257
4 0 | .5985056 .1179053 5.08 0.000 .3671509 .8298602
4 1 | .830356 .0798759 10.40 0.000 .673583 .9871289
------------------------------------------------------------------------------
Now, if we were to call
marginsplot, the plotted confidence intervals would not match what
mimrgns reports. How large would the differences be? To find out, we first store
mimrgns' results in a matrix. More precisely, we store the coefficients, standard errors, and confidence limits. Those results are stored in rows 1, 2, 5, and 6 of
r(table). Since we will later store these results as variables, we transpose the matrix.
Code:
matrix rtable = r(table)
matrix rtable = (rtable[1..2, 1...]\ rtable[5..6, 1...])'
Now, that we have stored
mimrgns' results, we will replicate the file that
marginsplot would use. To do so, we need to dig a little deeper into
marginsplot's internals.
marginsplot internally calls
margins with its undocumented
saving() option. Undocumented means that there is an online help file but no corresponding entry in the manual. More importantly, it means that this option might behave differently in the future, even under version control. Anyway,
margins, with the
saving() option, calls a utility routine,
_marg_save, which is not documented. Not documented means that there is neither an online help file nor an entry in the manual. The code is implemented as an ado-file, so we can learn how it works. Obviously, it might work differently in the future.
Code:
_marg_save , saving(mimrgns_results , double)
We can now load the created file and have a look at some of the variables
Code:
use mimrgns_results, clear
list _margin _se _ci_lb _ci_ub , noobs separator(0)
The output is
Code:
. list _margin _se _ci_lb _ci_ub , noobs separator(0)
+-----------------------------------------------+
| _margin _se _ci_lb _ci_ub |
|-----------------------------------------------|
| .21325808 .05941922 .09679856 .3297176 |
| .47166739 .08325828 .30848416 .63485062 |
| .32393311 .04989317 .22614429 .42172193 |
| .61181892 .06174378 .49080334 .7328345 |
| .45851679 .07503986 .31144138 .60559221 |
| .73562219 .07163704 .59521618 .8760282 |
| .59850555 .11790533 .36741535 .82959576 |
| .83035595 .07987593 .67380201 .98690989 |
+-----------------------------------------------+
From a first glance, it appears as if the results match those from
mimrgns. To be sure, we will save the
mimrgns results in the same dataset and calculate the respective differences.
Code:
svmat double rtable , names(col)
local vars _margin b _se se _ci_lb ll _ci_ub ul
forvalues i = 1(2)8 {
local var1 : word `i' of `vars'
local var2 : word `++i' of `vars'
generate double diff_`var2' = abs(`var1'-`var2')
}
Finally, we can look at the differences.
Code:
list _margin b diff_b _se se diff_se _ci_lb ll diff_ll _ci_ub ul diff_ul , noobs separator(0)
which shows
Code:
. list _margin b diff_b _se se diff_se _ci_lb ll diff_ll _ci_ub ul diff_ul , noobs separator(0)
+------------------------------------------------------------------------------------------------------------------------------------------+
| _margin b diff_b _se se diff_se _ci_lb ll diff_ll _ci_ub ul diff_ul |
|------------------------------------------------------------------------------------------------------------------------------------------|
| .21325808 .21325808 0 .05941922 .05941922 0 .09679856 .09672075 .00007781 .3297176 .32979541 .00007781 |
| .47166739 .47166739 0 .08325828 .08325828 0 .30848416 .30845124 .00003292 .63485062 .63488354 .00003292 |
| .32393311 .32393311 0 .04989317 .04989317 0 .22614429 .22614367 6.181e-07 .42172193 .42172255 6.181e-07 |
| .61181892 .61181892 0 .06174378 .06174378 0 .49080334 .49080216 1.180e-06 .7328345 .73283568 1.180e-06 |
| .45851679 .45851679 0 .07503986 .07503986 0 .31144138 .3113833 .00005808 .60559221 .60565029 .00005808 |
| .73562219 .73562219 0 .07163704 .07163704 0 .59521618 .59511871 .00009746 .8760282 .87612566 .00009746 |
| .59850555 .59850555 0 .11790533 .11790533 0 .36741535 .36715088 .00026447 .82959576 .82986022 .00026447 |
| .83035595 .83035595 0 .07987593 .07987593 0 .67380201 .67358305 .00021897 .98690989 .98712886 .00021897 |
+------------------------------------------------------------------------------------------------------------------------------------------+
We find that the point estimates and the standard errors match exactly; the difference is flat 0. As stated in the help file, the confidence intervals do not match exactly. In this example, we find that the differences are in the 4th decimal place or later. We will not be able to spot those differences in a graph.
Note that there is no guarantee that the differences will always be that small. However, perhaps the waring in the help file overstates the problem. Anyway, you now know how to check when in doubt.
Best
Daniel
* I have modified the
mimrgns call to include imputed variables in the
at() option. The observed differences for confidence intervals are even smaller (virtually 0) in the original example.