Hi everyone,

I am a student using Stata for my project.

This is my first study using survival analysis and Cox regression.

I have run into several problems in my study. I have looked for answers in many sources without success.

Could you please help me?

Thank you.

1. I set up the survival analysis with attained age as the time scale. I ran stcox and then estat phtest, detail. Most of the covariates in the Cox model violate the proportional-hazards (PH) assumption. I then fitted stpm2 and put every covariate that violated the PH assumption into tvc(). Is it appropriate to put all of the violating covariates into tvc(), or is it enough to include only the exposures of interest?


Code:
. * 2. SURVIVAL since ATTAINED AGE at diagnosis*

.
. stset exit_date, fail(Death==1) id(id) enter(indexdate) origin(birthdatescb) scale(365.24)

                id:  id
     failure event:  Death == 1
obs. time interval:  (exit_date[_n-1], exit_date]
 enter on or after:  time indexdate
 exit on or before:  failure
    t for analysis:  (time-origin)/365.24
            origin:  time birthdatescb

------------------------------------------------------------------------------
    265,173  total observations
     34,043  observations end on or before enter()
------------------------------------------------------------------------------
    231,130  observations remaining, representing
    231,130  subjects
    157,782  failures in single-failure-per-subject data
 733,687.56  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =   45.0115
                                          last observed exit t =  108.4438
Code:
. stcox Sve i.education_merge i.income_cat sex i.kommun_types live_alone cci

         failure _d:  Death == 1
   analysis time _t:  (exit_date-origin)/365.24
             origin:  time birthdatescb
  enter on or after:  time indexdate
                 id:  id

Iteration 0:   log likelihood = -1529196.3
Iteration 1:   log likelihood = -1517040.7
Iteration 2:   log likelihood = -1516187.9
Iteration 3:   log likelihood = -1516181.5
Iteration 4:   log likelihood = -1516181.5
Refining estimates:
Iteration 0:   log likelihood = -1516181.5

Cox regression -- Breslow method for ties

No. of subjects =      224,116                  Number of obs    =     224,116
No. of failures =      152,876
Time at risk    =  711694.7076
                                                LR chi2(10)      =    26029.69
Log likelihood  =   -1516181.5                  Prob > chi2      =      0.0000

-----------------------------------------------------------------------------------
               _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
              Sve |   .5996859   .0035806   -85.64   0.000      .592709    .6067449
                  |
  education_merge |
Upper secondary   |   .9361713   .0053124   -11.62   0.000     .9258169    .9466414
      University  |   .8898578   .0097304   -10.67   0.000     .8709894    .9091349
                  |
       income_cat |
  Middle tertile  |   .9723618   .0060747    -4.49   0.000     .9605283    .9843412
 Highest tertile  |   .9077582    .006752   -13.01   0.000     .8946205    .9210887
                  |
              sex |   .7143673   .0042241   -56.88   0.000      .706136    .7226946
                  |
     kommun_types |
    Intermediate  |    1.00545   .0064891     0.84   0.400     .9928117    1.018249
           Rural  |   .9984355   .0066947    -0.23   0.815     .9854001    1.011643
                  |
       live_alone |   1.094858    .006443    15.40   0.000     1.082302    1.107559
              cci |   1.148494   .0013698   116.09   0.000     1.145812    1.151181
-----------------------------------------------------------------------------------

.
. estat phtest, detail

      Test of proportional-hazards assumption

      Time:  Time
      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      Sve         |      0.00967        14.20        1         0.0002
      1b.educati~e|            .            .        1             .
      2.educatio~e|      0.00206         0.65        1         0.4210
      3.educatio~e|      0.00280         1.19        1         0.2747
      1b.income_~t|            .            .        1             .
      2.income_cat|      0.00677         7.00        1         0.0082
      3.income_cat|      0.02074        65.91        1         0.0000
      sex         |      0.00740         8.44        1         0.0037
      1b.kommun_~s|            .            .        1             .
      2.kommun_t~s|      0.01065        17.29        1         0.0000
      3.kommun_t~s|      0.01770        47.72        1         0.0000
      live_alone  |     -0.02901       130.41        1         0.0000
      cci         |     -0.04345       263.37        1         0.0000
      ------------+---------------------------------------------------
      global test |                    542.40       10         0.0000
      ----------------------------------------------------------------


2. When I put all of the violating covariates into the tvc() option of stpm2, Stata keeps printing messages such as "note: delayed entry models are being fitted. Iteration 0: log likelihood = 321425.8 (not concave) ...", and it takes hours to produce a result. Could you please explain what is happening?

Code:
. stpm2 Sve education_merge2 education_merge3 income_cat2 income_cat3 kommun_types2 kommun_types3 sex live_alone cci, scale(hazard) df(4)
>  tvc(S income_cat2 income_cat3 kommun_types2 kommun_types3 sex live_alone cci) dftvc(3) eform
note: delayed entry models are being fitted

Iteration 0:   log likelihood =  328328.54  (not concave)
Iteration 1:   log likelihood =  328413.17  (not concave)
Iteration 2:   log likelihood =  328576.82  (not concave)
Iteration 3:   log likelihood =  328665.52  (not concave)
Iteration 4:   log likelihood =  328688.49  (not concave)
Iteration 5:   log likelihood =  328710.02  (not concave)
Iteration 6:   log likelihood =  328720.01  (not concave)
Iteration 7:   log likelihood =  328729.98  (not concave)
Iteration 8:   log likelihood =  328735.29  (not concave)
Iteration 9:   log likelihood =  328760.66  (not concave)
Iteration 10:  log likelihood =  328771.01  (not concave)
Iteration 11:  log likelihood =   328780.6  (not concave)
Iteration 12:  log likelihood =  328789.31  (not concave)
Iteration 13:  log likelihood =  328807.42  (not concave)
Iteration 14:  log likelihood =  328818.96  (not concave)
Iteration 15:  log likelihood =  328825.58  (not concave)
Iteration 16:  log likelihood =  328832.69  (not concave)
Iteration 17:  log likelihood =  328842.97  (not concave)
Iteration 18:  log likelihood =  328847.31  (not concave)
Iteration 19:  log likelihood =  328850.81  (not concave)
Iteration 20:  log likelihood =  328852.44  (not concave)
Iteration 21:  log likelihood =   328854.3  (not concave)
Iteration 22:  log likelihood =  328855.65  (not concave)

3. If I use the tvc() option of stcox instead of stpm2, the command takes many hours to finish. I am using Stata 15 MP2 on a very powerful computer. Do you see similar run times when running stcox with the tvc() option?
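
For illustration, one alternative to tvc() would be to split follow-up into age bands with stsplit and estimate band-specific hazard ratios. This is only a sketch: it assumes the data are stset as above, the cut points 65, 75, 85 and 95 and the variable name ageband are placeholders, and only cci and live_alone (the largest chi2 values in the phtest output) are allowed to vary with age.

Code:
* Sketch only: the cut points and the name ageband are hypothetical
stsplit ageband, at(65 75 85 95)
* i.ageband#c.x gives one hazard ratio for x per age band, so cci and
* live_alone are not also entered as main effects
stcox Sve i.education_merge i.income_cat sex i.kommun_types ///
    i.ageband#c.cci i.ageband#c.live_alone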

4. I set up the survival analysis with attained age as the time scale, so I do not need to adjust for age in the Cox regression, do I? What if I instead use time since the diagnosis date as the time scale?
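
To make the second setup concrete, this is a sketch of the time-since-diagnosis version, assuming indexdate is the diagnosis date as in the stset call above. The variable name age_dx is hypothetical. On this time scale age is no longer handled by the time axis, so the sketch enters age at diagnosis as a covariate.

Code:
* Time since diagnosis as the time scale (sketch)
stset exit_date, fail(Death==1) id(id) origin(indexdate) scale(365.24)
* age_dx is a hypothetical name: age at diagnosis in years
generate age_dx = (indexdate - birthdatescb)/365.24
stcox Sve i.education_merge i.income_cat sex i.kommun_types live_alone cci age_dx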

5. I plotted Kaplan-Meier curves on the attained-age scale and tried to restrict the x-axis to the range 45 to 115, but the x-axis in the graph still runs from 0 to 115. Could you please explain why? How can I make the axis run from 45 to 115?

Code:
sts graph, by(patient) ///
    title("Survival since birth-attained age between Sve and non-Sve patients", size(3.5) style(heading)) ///
    ylabel(0.2(0.2)1, labsize(small)) xscale(range(45 115)) xlabel(45(10)115, labsize(small)) ///
    xtitle("Attained age", size(3)) ytitle("Survival rate", size(3)) ///
    legend(order(1 "Sve" 2 "non-Sve") ring(0) position(7) rows(2) size(small)) ///
    caption("Log rank test p<0.001", size(small)) name(d, replace)
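
A variant that might be worth trying, assuming the tmin() option of sts graph restricts the plotted time range as documented; xscale(range()) can only widen an axis, not narrow it. Only the options relevant to the x-axis are shown.

Code:
* Sketch: tmin(45) asks sts graph to plot the curves only for t >= 45
sts graph, by(patient) tmin(45) ///
    xlabel(45(10)115, labsize(small)) xtitle("Attained age", size(3))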


I'm sorry for asking so many questions, but I am a beginner.
Thank you.