Dear all,

As part of my PhD, I am studying a sample of about 95,000 firms in Switzerland. I want to find out whether having an external auditor (such as PwC, KPMG, or EY) reduces the likelihood that a firm goes bankrupt. In Switzerland, smaller companies can choose whether to have their accounts audited, i.e. they can 'opt out' of having an external auditor.
So far I have looked at this mainly by using logistic regression models with bankruptcy (0/1) being the dependent variable. After discussion with my supervisor, I would like to corroborate my results with survival analysis techniques.

I have come up with the following basic Cox proportional hazards model:
stcox i.bOptingOut lncapital i.firmCanton i.industryCode, nolog

- bOptingOut is my treatment/non-treatment variable (1 = opting out, i.e. no auditor; 0 = no opting out, i.e. financial statements are audited)
- lncapital is the natural logarithm of the firm's paid-up capital in Swiss francs (roughly at parity with USD)
- firmCanton is an indicator variable to control for the canton/state in which the firm is domiciled
- industryCode is an indicator variable to control for industry effects

In principle, I was quite happy with the model as the coefficients are highly significant and the direction of the effects appears reasonable.

Code:
. stcox i.bOptingOut lncapital i.firmCanton i.industryCode, nolog

         failure _d:  event == 3
   analysis time _t:  (date1-origin)
             origin:  event==1
  enter on or after:  event==1 time td(01jan2008)
  exit on or before:  event==3 time td(31dec2018)
                 id:  id

Cox regression -- Breslow method for ties

No. of subjects =       94,319                  Number of obs    =     161,050
No. of failures =        9,683
Time at risk    =    299931461
                                                LR chi2(44)      =     4529.30
Log likelihood  =   -106232.09                  Prob > chi2      =      0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.bOptingOut |   1.425763   .0464468    10.89   0.000     1.337574    1.519766
   lncapital |   .8598573   .0107152   -12.12   0.000     .8391103    .8811172

Then I turned to testing the proportional hazards assumption. First, I did some graphical checks: I plotted the hazard and survival functions and ran stphplot and stcoxkm. IMHO, things look quite okay (see picture below).
[Image: graphical checks of the proportional hazards assumption (stphplot and stcoxkm)]

Afterwards, I performed the test based on Schoenfeld residuals (estat phtest) and got a Prob>chi2 value of 0.0000.
Taken at face value, this means the proportional hazards assumption is clearly rejected.
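
A minimal sketch of the checks described above, assuming the data are already stset as shown in the output header. Grouping the plots by bOptingOut and adding the detail option are assumptions for illustration, not necessarily the exact commands used:

```stata
* Sketch only: assumes the data are already -stset- as in the output above
stcox i.bOptingOut lncapital i.firmCanton i.industryCode, nolog

* Global test plus one test per covariate, to see which terms
* drive the rejection of proportional hazards
estat phtest, detail

* Graphical checks for the treatment indicator:
* log-log survival curves should be roughly parallel under PH
stphplot, by(bOptingOut)

* Kaplan-Meier (observed) versus Cox-predicted survival curves
stcoxkm, by(bOptingOut)
```

With detail, the per-covariate rows would show whether the rejection comes from bOptingOut itself or mainly from the canton and industry indicators.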

Currently, I have two main questions:
a) Does my data set really violate the proportional hazards (PH) assumption? Or is the rejection simply a consequence of my large sample size?
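
One rough way to probe the sample-size part of this question, sketched under assumptions: rerun the same test on a random subsample of firms and see whether the rejection persists. The 10% fraction, the seed, and the helper variable u are arbitrary choices for illustration:

```stata
* Sketch only: rerun the PH test on a random ~10% of firms
preserve
set seed 12345

* Draw one uniform number per firm and copy it to all of the firm's records,
* so that a sampled firm keeps its complete history
bysort id: generate double u = runiform() if _n == 1
bysort id (u): replace u = u[1]
keep if u < 0.10

stcox i.bOptingOut lncapital i.firmCanton i.industryCode, nolog
estat phtest, detail
restore
```

A non-significant result in the subsample would not show that PH holds. It would only suggest that the departure is small relative to the power of the full-sample test.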

b) If the PH assumption is really violated, is it okay to use a lognormal parametric model instead [streg ..., dist(lognormal)]?
Based on AIC/BIC, the lognormal fits best among the parametric models. Or would you suggest something else?
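
A sketch of the kind of AIC/BIC comparison mentioned above, assuming the same covariates as in the Cox model. The list of candidate distributions and the stored-estimate names (m_...) are assumptions for illustration:

```stata
* Sketch only: fit each candidate distribution and store the results
foreach d in exponential weibull gompertz loglogistic lognormal {
    quietly streg i.bOptingOut lncapital i.firmCanton i.industryCode, ///
        distribution(`d')
    estimates store m_`d'
}

* AIC and BIC side by side; smaller values indicate a better fit
estimates stats m_*

* Note: lognormal and loglogistic are accelerated failure-time models,
* so their coefficients are not hazard ratios; the tratio option
* reports time ratios instead
streg i.bOptingOut lncapital i.firmCanton i.industryCode, ///
    distribution(lognormal) tratio
```

Because the lognormal model is in the accelerated failure-time metric, its estimates are not directly comparable to the hazard ratios from stcox.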

I am using the book "An Introduction to Survival Analysis Using Stata" (Revised 3rd Edition) by Cleves, Gould and Marchenko.

As you can see, this is my first post to the Statalist forum. In case you need further information, just let me know.
I would really appreciate your help!

Kind regards,
Daniel