Hi there,

In my research on the survival of strategies within private equity, I am using an accelerated failure time (AFT) model and need to determine which distribution fits my data best.
My output is presented below. Unfortunately, I do not know how to proceed in choosing the best-fitting distribution, and I hope someone can help me interpret the output.

Code:
stset E_Date, failure(Successful==1) id(Strategy_Number) enter(time P_Date) origin(time P_Date)

                id:  Strategy_Number
     failure event:  Successful == 1
obs. time interval:  (E_Date[_n-1], E_Date]
 enter on or after:  time P_Date
 exit on or before:  failure
    t for analysis:  (time-origin)
            origin:  time P_Date

------------------------------------------------------------------------------
      1,197  total observations
          0  exclusions
------------------------------------------------------------------------------
      1,197  observations remaining, representing
      1,197  subjects
        251  failures in single-failure-per-subject data
  3,031,231  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =     8,216
Why do the numbers of subjects and failures in the regression output below (917 and 171) differ from those stset reports for the whole sample (1,197 and 251)?

Code:
Weibull AFT regression

No. of subjects =          917                  Number of obs    =         917
No. of failures =          171
Time at risk    =      2162758
                                                LR chi2(26)      =      428.44
Log likelihood  =   -236.11205                  Prob > chi2      =      0.0000
The output above is from the Weibull model only; the other models report the same numbers of subjects and failures.
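
One thing worth checking here is whether some covariates have missing values, since Stata's estimation commands drop any observation with a missing value in a model variable. A minimal sketch of that check, where x1 and x2 are placeholder covariate names (assumptions, not the regressors used above):

Code:
* x1 x2 are hypothetical placeholders for the model covariates
local covars x1 x2
streg `covars', distribution(weibull) time
* observations actually used in the fit
count if e(sample)
* missing-value counts per covariate
misstable summarize `covars'
* subjects and failures within the estimation sample
stdescribe if e(sample)

If the count equals 917 and the covariates show missing values, that would likely account for the difference.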

Since the exponential model is nested in the Weibull, and both the Weibull and log-normal are nested in the gamma (generalized gamma) model, I use likelihood-ratio tests for those comparisons, as shown below.

Model 1 = Gamma
Model 2 = Weibull
Model 3 = Exponential
Model 4 = Log-Normal
Model 5 = Log-Logistic
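
For context, a sketch of how five models like these can be fit in the AFT metric and stored under the names used here. The covariates x1 and x2 are placeholders (assumptions), not the actual regressors:

Code:
* x1 x2 are hypothetical placeholders for the model covariates
local covars x1 x2
streg `covars', distribution(ggamma)
estimates store Model1
streg `covars', distribution(weibull) time
estimates store Model2
streg `covars', distribution(exponential) time
estimates store Model3
streg `covars', distribution(lognormal)
estimates store Model4
streg `covars', distribution(loglogistic)
estimates store Model5

The time option requests the AFT metric for the exponential and Weibull models; the other three distributions are fit in that metric by default.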

Code:
. lrtest (Model2)(Model3), force

Likelihood-ratio test                                 LR chi2(2)  =    309.66
(Assumption: Model3 nested in Model2)                 Prob > chi2 =    0.0000

. lrtest (Model1)(Model2), force

Likelihood-ratio test                                 LR chi2(0)  =   -217.14
(Assumption: Model2 nested in Model1)                 Prob > chi2 =         .

. lrtest (Model1)(Model4), force

Likelihood-ratio test                                 LR chi2(0)  =   -232.65
(Assumption: Model4 nested in Model1)                 Prob > chi2 =         .

. lrtest (Model1)(Model3), force

Likelihood-ratio test                                 LR chi2(2)  =     92.52
(Assumption: Model3 nested in Model1)                 Prob > chi2 =    0.0000
For the non-nested models (such as the log-logistic), I need to compare the AIC values instead.

Code:
. estimates stats _all

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
      Model1 |        917         .  -344.6812      28    745.3624   880.3534
      Model2 |        917 -450.3315  -236.1121      28    528.2241   663.2151
      Model3 |        917         .  -390.9409      26    833.8817   959.2305
      Model4 |        917 -439.1325  -228.3551      28    512.7102   647.7012
      Model5 |        917         .  -357.0475      27    768.0951    898.265
-----------------------------------------------------------------------------
               Note: N=Obs used in calculating BIC; see [R] BIC note.
Again, the number of observations (917) differs from what the stset output indicated (1,197).
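
As a check on how these values arise, AIC is computed as -2*ll(model) + 2*df, so Model4's entry can be reproduced by hand from the table:

Code:
* AIC = -2*ll + 2*df, using Model4's ll(model) and df from the table above
display -2*(-228.3551) + 2*28

This gives 512.7102, matching the table.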

Based on the results above, is it correct that I should choose the log-normal model (Model 4), which has the lowest AIC?

I hope someone can explain my output. Thanks in advance.

Kind regards,

Michael