I am using Paul Lambert's stpm2 to regress relative survival on a set of covariates.
Before running stpm2, I looked for the model with the better fit, which turned out to be the PO1, equally considering AIC and BIC
Scale = hazards
df = 1, AIC = 1718.84 BIC = 1732.86
df = 2, AIC = 1715.60 BIC = 1733.13
df = 3, AIC = 1716.78 BIC = 1737.81
df = 4, AIC = 1715.98 BIC = 1740.52
df = 5, AIC = 1717.58 BIC = 1745.62
Scale = odds
df = 1, AIC = 1716.48 BIC = 1730.50
df = 2, AIC = 1715.82 BIC = 1733.34
df = 3, AIC = 1716.54 BIC = 1737.57
df = 4, AIC = 1716.06 BIC = 1740.59
df = 5, AIC = 1717.70 BIC = 1745.74
Scale = normal
df = 1, AIC = 1783.05 BIC = 1797.07
df = 2, AIC = 1784.63 BIC = 1802.16
df = 3, AIC = 1785.71 BIC = 1806.74
df = 4, AIC = 1785.55 BIC = 1810.09
df = 5, AIC = 1787.16 BIC = 1815.21
As a result of the multiple regression, I get extremely high values for exp(b) and for CIs of the variable comCV, which is a 0/1 indicator of the presence of cardiovascular comorbidities:
Code:
. stpm2 females age ib2.BMI_class1 CKD_EPIv2 nph1 nph2 nph3 nph7 com_CV diabete2 hb2 K2 slope_GFR dur_
> dial, ///
> bhazard(rate) df(1) scale(odds) nolog eform baselevels
Log likelihood = -742.92797 Number of obs = 835
------------------------------------------------------------------------------
| exp(b) Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb |
females | .8044153 .2003005 -0.87 0.382 .4937759 1.310481
age | 1.03572 .0268045 1.36 0.175 .9844942 1.089611
|
BMI_class1 |
<24.99 | 1.461836 .5295387 1.05 0.295 .7187164 2.973307
25-29.99 | .8533482 .2756105 -0.49 0.623 .4531157 1.607102
30-34.99 | 1 (base)
≥35 | 1.492883 .5579428 1.07 0.284 .7176324 3.10563
|
CKD_EPIv2 | 1.04822 .0165166 2.99 0.003 1.016343 1.081097
nph1 | 3.097226 1.63313 2.14 0.032 1.101907 8.705649
nph2 | 1.681237 1.414288 0.62 0.537 .3232824 8.743305
nph3 | 1.612484 .8053305 0.96 0.339 .6058649 4.291558
nph7 | 3.399568 1.993952 2.09 0.037 1.076881 10.73198
com_CV | 59.39088 146.0529 1.66 0.097 .4791397 7361.688
diabete2 | 2.201047 .6498246 2.67 0.008 1.234029 3.925845
hb2 | .8611074 .0779225 -1.65 0.098 .7211593 1.028214
K2 | .8274406 .1500999 -1.04 0.296 .5798647 1.18072
slope_GFR | .6025818 .0285791 -10.68 0.000 .5490924 .6612817
dur_dial | 1.000125 .0002395 0.52 0.601 .999656 1.000595
_rcs1 | 7.125679 .9170236 15.26 0.000 5.537104 9.170009
_cons | .0001439 .0005008 -2.54 0.011 1.56e-07 .1322831
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
.
I tried running the multiple regression switching to PH1 and PH2 models and still get very high coefficients. Instead if I use the normal scale, results are much more acceptable
Code:
. stpm2 females age ib2.BMI_class1 CKD_EPIv2 nph1 nph2 nph3 nph7 com_CV diabete2 hb2 K2 slope_GFR dur_
> dial, ///
> bhazard(rate) df(1) scale(normal) nolog eform baselevels
Log likelihood = -810.10208 Number of obs = 835
------------------------------------------------------------------------------
| exp(b) Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb |
females | .9484795 .1314391 -0.38 0.703 .7228854 1.244476
age | 1.000844 .0145023 0.06 0.954 .9728194 1.029675
|
BMI_class1 |
<24.99 | 1.129578 .2418444 0.57 0.569 .7424597 1.718541
25-29.99 | .9432495 .1671652 -0.33 0.742 .6664604 1.334992
30-34.99 | 1 (base)
≥35 | 1.072004 .2201216 0.34 0.735 .7168262 1.603169
|
CKD_EPIv2 | 1.023801 .0087958 2.74 0.006 1.006706 1.041186
nph1 | 2.137752 .6897906 2.35 0.019 1.135793 4.023605
nph2 | 1.694423 .7632828 1.17 0.242 .7007825 4.09695
nph3 | 1.514186 .467945 1.34 0.179 .8262725 2.774821
nph7 | 2.124199 .7612974 2.10 0.036 1.052279 4.288048
com_CV | 4.6894 3.607231 2.01 0.045 1.038358 21.17813
diabete2 | 1.327321 .2200226 1.71 0.088 .9591289 1.836855
hb2 | .9517406 .0484481 -0.97 0.331 .8613673 1.051596
K2 | .8995876 .092316 -1.03 0.302 .7356867 1.100004
slope_GFR | .792338 .0213394 -8.64 0.000 .7515983 .8352859
dur_dial | 1.000105 .0001313 0.80 0.423 .9998479 1.000363
_rcs1 | 2.742701 .2441501 11.33 0.000 2.303595 3.265508
_cons | .027567 .0484567 -2.04 0.041 .0008794 .8642
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
com_CV is certainly highly associated with relative survival, but PO and PH estimates are clearly unrealistic. Maybe the problem lies in the unbalanced distribution of comorbidities and deaths, as you see below. Given these results, I am inclined to use the regression in the normal scale, although it is far by being the fittest. Can anyone recommend a better remedy?
Code:
. tab died com_CV, row chi
| (max) com_CV
Died | 0 1 | Total
-----------+----------------------+----------
0 | 207 383 | 590
| 35.08 64.92 | 100.00
-----------+----------------------+----------
1 | 17 229 | 246
| 6.91 93.09 | 100.00
-----------+----------------------+----------
Total | 224 612 | 836
| 26.79 73.21 | 100.00
thank you very much for your attention
0 Response to bloated estimates and CIs in regression of relative survival
Post a Comment