Hi Statalisters,

I am a novice user in Stata. I'm working with Stata.14 and Windows 7.

I'm working on a Panel Data Set for all commerical banks in the U.S. for the period 1995 - 2018 (time variable). So I have data on a bank-year level. I created the ID Variable with the variables bank name and cert I already calculated four bank risk proxies: Z-Score, NPA (non-performing assets), LLP (loan loss provisions) and LLR (loan loss reserves) on a bank-year level.
I calculated the Risk Proxy Z_score and I would like to run the binary probability model explaining the occurrence of a bank failure ( Failure = 1, Active = 0) with the risk proxy (lagged by one year). My "Guiding Paper" reports marginal effects of probit regressions.

This is the dataset with 172 431 observations and the Variable "status" if a banke failed = 1 or if it is still active = 0

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year id status Z_score)
1995   7 1 -1.4038005
1995  10 0  -1.434213
1995  11 0 -1.5771302
1995  14 0  -1.758422
1995  16 0 -1.4295077
1995  21 0  -1.329172
1995  27 0 -1.3730284
1995  32 0 -1.3627455
1995  34 0  -1.908463
1995  38 0 -1.8048723
1995  41 0 -1.5905398
1995  46 0  -1.533159
1995  47 0  -1.663955
1995  48 0  -1.518417
1995  52 0 -1.3701818
1995  53 0  -1.485621
1995  56 0   -1.63249
1995  59 0  -1.476241
1995  76 0 -1.3577138
1995  82 0 -1.3661845
1995  84 0 -1.3885205
1995  85 0 -1.5949416
1995  87 0 -2.0597448
1995  99 0 -2.2821965
1995 101 0 -1.5937258
1995 104 0 -1.6237373
end
Now I have to figure out the correct probability model:

- As I use data for all U.S. commercial banks between 1995 and 2018 I have Panel Data, so I have to use xtprobit
- As my model is Pr(Failure = 1) = F(it-1 ß) and my "Guiding Paper" is saying "we report marginal effects of probit regressions" I think I have to use the Population-averaged (PA) model instead of the Random-effects (RE) model. After the literature review I came to the conclusion, that I would like to have a model which should explain the occurence of a bank failure of the entire "banking population" and not of a specific bank (e.g. Deutsche Bank).

So I was running the command with the correct correction of the standard error with vce(robust):

Code:
xtset id year
       panel variable:  id (unbalanced)
        time variable:  year, 1995 to 2018, but with gaps
                delta:  1 unit
Code:
xtprobit status L.Z_score, pa  vce(robust)
And the Output:
Code:
Iteration 1: tolerance = .86103329
Iteration 2: tolerance = .503125
Iteration 3: tolerance = .29113738
Iteration 4: tolerance = .22620098
Iteration 5: tolerance = .14532103
Iteration 6: tolerance = .10679244
Iteration 7: tolerance = .06929776
Iteration 8: tolerance = .05079473
Iteration 9: tolerance = .03304497
Iteration 10: tolerance = .02339874
Iteration 11: tolerance = .01565962
Iteration 12: tolerance = .01090075
Iteration 13: tolerance = .0073888
Iteration 14: tolerance = .00510128
Iteration 15: tolerance = .00347811
Iteration 16: tolerance = .00239194
Iteration 17: tolerance = .00163531
Iteration 18: tolerance = .00112255
Iteration 19: tolerance = .00076844
Iteration 20: tolerance = .00052703
Iteration 21: tolerance = .000361
Iteration 22: tolerance = .00024749
Iteration 23: tolerance = .00016957
Iteration 24: tolerance = .00011623
Iteration 25: tolerance = .00007964
Iteration 26: tolerance = .00005459
Iteration 27: tolerance = .00003741
Iteration 28: tolerance = .00002564
Iteration 29: tolerance = .00001757
Iteration 30: tolerance = .00001204
Iteration 31: tolerance = 8.251e-06
Iteration 32: tolerance = 5.655e-06
Iteration 33: tolerance = 3.875e-06
Iteration 34: tolerance = 2.656e-06
Iteration 35: tolerance = 1.820e-06
Iteration 36: tolerance = 1.247e-06
Iteration 37: tolerance = 8.548e-07

GEE population-averaged model                   Number of obs     =    156,147
Group variable:                         id      Number of groups  =     14,692
Link:                               probit      Obs per group:
Family:                           binomial                    min =          1
Correlation:                  exchangeable                    avg =       10.6
                                                              max =         23
                                                Wald chi2(1)      =      24.18
Scale parameter:                         1      Prob > chi2       =     0.0000

                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semirobust
      status |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Z_score |
         L1. |   .0335066   .0068144     4.92   0.000     .0201506    .0468626
             |
       _cons |  -1.658208   .0214663   -77.25   0.000    -1.700282   -1.616135
------------------------------------------------------------------------------
Seems this correct for you?

In my "Guiding Paper" they assess the Z-Score Model on its Pseudo R2. I know that the "normal" probit regeression Output shows me the Pseudo R2 and there is a way to calculate the Pseudo R2 for the xtprobit Panel Data Probit Regression. I know that the pseudo R2 is stored with e(r2 p).
How can I calculate the Pseudo R Squared for the Population Averaged Model?

Thank you very much for your support!