I am using Paul Lambert's stpm2 to regress relative survival on a set of covariates.
Before running stpm2, I looked for the model with the better fit, which turned out to be the PO1, equally considering AIC and BIC
Scale = hazards
df = 1, AIC = 1718.84 BIC = 1732.86
df = 2, AIC = 1715.60 BIC = 1733.13
df = 3, AIC = 1716.78 BIC = 1737.81
df = 4, AIC = 1715.98 BIC = 1740.52
df = 5, AIC = 1717.58 BIC = 1745.62
Scale = odds
df = 1, AIC = 1716.48 BIC = 1730.50
df = 2, AIC = 1715.82 BIC = 1733.34
df = 3, AIC = 1716.54 BIC = 1737.57
df = 4, AIC = 1716.06 BIC = 1740.59
df = 5, AIC = 1717.70 BIC = 1745.74
Scale = normal
df = 1, AIC = 1783.05 BIC = 1797.07
df = 2, AIC = 1784.63 BIC = 1802.16
df = 3, AIC = 1785.71 BIC = 1806.74
df = 4, AIC = 1785.55 BIC = 1810.09
df = 5, AIC = 1787.16 BIC = 1815.21
As a result of the multiple regression, I get extremely high values for exp(b) and for CIs of the variable comCV, which is a 0/1 indicator of the presence of cardiovascular comorbidities:
Code:
. stpm2 females age ib2.BMI_class1 CKD_EPIv2 nph1 nph2 nph3 nph7 com_CV diabete2 hb2 K2 slope_GFR dur_ > dial, /// > bhazard(rate) df(1) scale(odds) nolog eform baselevels Log likelihood = -742.92797 Number of obs = 835 ------------------------------------------------------------------------------ | exp(b) Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- xb | females | .8044153 .2003005 -0.87 0.382 .4937759 1.310481 age | 1.03572 .0268045 1.36 0.175 .9844942 1.089611 | BMI_class1 | <24.99 | 1.461836 .5295387 1.05 0.295 .7187164 2.973307 25-29.99 | .8533482 .2756105 -0.49 0.623 .4531157 1.607102 30-34.99 | 1 (base) ≥35 | 1.492883 .5579428 1.07 0.284 .7176324 3.10563 | CKD_EPIv2 | 1.04822 .0165166 2.99 0.003 1.016343 1.081097 nph1 | 3.097226 1.63313 2.14 0.032 1.101907 8.705649 nph2 | 1.681237 1.414288 0.62 0.537 .3232824 8.743305 nph3 | 1.612484 .8053305 0.96 0.339 .6058649 4.291558 nph7 | 3.399568 1.993952 2.09 0.037 1.076881 10.73198 com_CV | 59.39088 146.0529 1.66 0.097 .4791397 7361.688 diabete2 | 2.201047 .6498246 2.67 0.008 1.234029 3.925845 hb2 | .8611074 .0779225 -1.65 0.098 .7211593 1.028214 K2 | .8274406 .1500999 -1.04 0.296 .5798647 1.18072 slope_GFR | .6025818 .0285791 -10.68 0.000 .5490924 .6612817 dur_dial | 1.000125 .0002395 0.52 0.601 .999656 1.000595 _rcs1 | 7.125679 .9170236 15.26 0.000 5.537104 9.170009 _cons | .0001439 .0005008 -2.54 0.011 1.56e-07 .1322831 ------------------------------------------------------------------------------ Note: Estimates are transformed only in the first equation. .
I tried running the multiple regression switching to PH1 and PH2 models and still get very high coefficients. Instead if I use the normal scale, results are much more acceptable
Code:
. stpm2 females age ib2.BMI_class1 CKD_EPIv2 nph1 nph2 nph3 nph7 com_CV diabete2 hb2 K2 slope_GFR dur_ > dial, /// > bhazard(rate) df(1) scale(normal) nolog eform baselevels Log likelihood = -810.10208 Number of obs = 835 ------------------------------------------------------------------------------ | exp(b) Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- xb | females | .9484795 .1314391 -0.38 0.703 .7228854 1.244476 age | 1.000844 .0145023 0.06 0.954 .9728194 1.029675 | BMI_class1 | <24.99 | 1.129578 .2418444 0.57 0.569 .7424597 1.718541 25-29.99 | .9432495 .1671652 -0.33 0.742 .6664604 1.334992 30-34.99 | 1 (base) ≥35 | 1.072004 .2201216 0.34 0.735 .7168262 1.603169 | CKD_EPIv2 | 1.023801 .0087958 2.74 0.006 1.006706 1.041186 nph1 | 2.137752 .6897906 2.35 0.019 1.135793 4.023605 nph2 | 1.694423 .7632828 1.17 0.242 .7007825 4.09695 nph3 | 1.514186 .467945 1.34 0.179 .8262725 2.774821 nph7 | 2.124199 .7612974 2.10 0.036 1.052279 4.288048 com_CV | 4.6894 3.607231 2.01 0.045 1.038358 21.17813 diabete2 | 1.327321 .2200226 1.71 0.088 .9591289 1.836855 hb2 | .9517406 .0484481 -0.97 0.331 .8613673 1.051596 K2 | .8995876 .092316 -1.03 0.302 .7356867 1.100004 slope_GFR | .792338 .0213394 -8.64 0.000 .7515983 .8352859 dur_dial | 1.000105 .0001313 0.80 0.423 .9998479 1.000363 _rcs1 | 2.742701 .2441501 11.33 0.000 2.303595 3.265508 _cons | .027567 .0484567 -2.04 0.041 .0008794 .8642 ------------------------------------------------------------------------------ Note: Estimates are transformed only in the first equation.
com_CV is certainly highly associated with relative survival, but PO and PH estimates are clearly unrealistic. Maybe the problem lies in the unbalanced distribution of comorbidities and deaths, as you see below. Given these results, I am inclined to use the regression in the normal scale, although it is far by being the fittest. Can anyone recommend a better remedy?
Code:
. tab died com_CV, row chi | (max) com_CV Died | 0 1 | Total -----------+----------------------+---------- 0 | 207 383 | 590 | 35.08 64.92 | 100.00 -----------+----------------------+---------- 1 | 17 229 | 246 | 6.91 93.09 | 100.00 -----------+----------------------+---------- Total | 224 612 | 836 | 26.79 73.21 | 100.00
thank you very much for your attention
0 Response to bloated estimates and CIs in regression of relative survival
Post a Comment