GOF of Logit Model: Pearson's chi2, Hosmer and Lemeshow's test

Hi everyone,

I am using a logit model (shown below) to investigate the effect of borrowers' minority status on the probability of loan approval, but both Pearson's chi2 test and the Hosmer-Lemeshow (HL) test indicate a poor goodness of fit (gof).

So I have the following questions:

1) Is the poor gof caused by the large sample size (2,491,476 observations)? I think my model already includes a rich set of controls in appropriate functional forms, because I followed the controls used in recent studies.

2) Despite the poor gof from Pearson and HL, the "percent correctly predicted" of the model is around 87%, which is very high. Can I regard my model as very predictive even though Pearson and HL indicate a poor gof?
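
For context on the 87% figure, it helps to know how often a rule that predicts "approved" for every application would be correct, since that equals the overall approval rate. The following is a minimal Python sketch of that arithmetic, not part of the Stata session; the counts are the Obs_1 and Total columns of the HL table further down in this post, and the variable names are only illustrative.

```python
# Observed approvals (Obs_1) and group sizes (Total), copied from the
# Hosmer-Lemeshow table in this post.
obs_approved = [77335, 170945, 200800, 216880, 223495,
                227089, 229835, 232215, 234556, 237511]
group_totals = [249148, 249149, 249146, 249148, 249147,
                249148, 249149, 249147, 249147, 249147]

n_total = sum(group_totals)
n_approved = sum(obs_approved)

# The share of approved applications is also the accuracy of a rule
# that always predicts "approved".
base_rate = n_approved / n_total

print(f"observations: {n_total:,}")
print(f"approved:     {n_approved:,}")
print(f"accuracy of always predicting approval: {base_rate:.1%}")
```

The total matches the 2,491,476 observations reported by Stata, and the baseline comes out at roughly 82%, so the 87% is better judged against that figure than against 50%.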

Thanks!
Lei

The test results are as follows.

I used Pearson's chi2 test to examine the gof of the model and got:

Number of observations = 2,491,476
Number of covariate patterns = 1,636,678
Pearson chi2(1636649) = 2.48e+06
Prob > chi2 = 0.0000

which indicates a poor gof for the model.

In addition, I used the HL test to examine the gof and got:

Number of observations = 2,491,476
Number of groups = 10
Hosmer–Lemeshow chi2(8) = 260.64
Prob > chi2 = 0.0000

which also indicates a poor gof. However, in the table below the observed and expected cell frequencies in each group agree very closely, so on that basis I would have expected the model's gof to be good.

Table collapsed on quantiles of estimated probabilities
+-----------------------------------------------------------------+
| Group |   Prob |  Obs_1 |    Exp_1 |  Obs_0 |    Exp_0 |  Total |
|-------+--------+--------+----------+--------+----------+--------|
|     1 | 0.6193 |  77335 |  77379.4 | 171813 | 171768.6 | 249148 |
|     2 | 0.7471 | 170945 | 172131.2 |  78204 |  77017.8 | 249149 |
|     3 | 0.8465 | 200800 | 198841.8 |  48346 |  50304.2 | 249146 |
|     4 | 0.8855 | 216880 | 216712.1 |  32268 |  32435.9 | 249148 |
|     5 | 0.9037 | 223495 | 223061.9 |  25652 |  26085.1 | 249147 |
|-------+--------+--------+----------+--------+----------+--------|
|     6 | 0.9166 | 227089 | 226821.4 |  22059 |  22326.6 | 249148 |
|     7 | 0.9275 | 229835 | 229762.4 |  19314 |  19386.6 | 249149 |
|     8 | 0.9378 | 232215 | 232373.8 |  16932 |  16773.2 | 249147 |
|     9 | 0.9492 | 234556 | 235019.0 |  14591 |  14128.0 | 249147 |
|    10 | 0.9900 | 237511 | 238557.9 |  11636 |  10589.1 | 249147 |
+-----------------------------------------------------------------+
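
To see how a table with such close agreement can still produce a chi2 of 260.64, the HL statistic can be rebuilt from the table itself as the sum over groups of (Obs_1 - Exp_1)^2/Exp_1 + (Obs_0 - Exp_0)^2/Exp_0. The Python sketch below is only an independent check of that arithmetic, not the Stata computation; the numbers are copied from the table, and the variable names are illustrative.

```python
# Rows copied from the table above: (group, Obs_1, Exp_1, Obs_0, Exp_0).
rows = [
    (1,   77335,  77379.4, 171813, 171768.6),
    (2,  170945, 172131.2,  78204,  77017.8),
    (3,  200800, 198841.8,  48346,  50304.2),
    (4,  216880, 216712.1,  32268,  32435.9),
    (5,  223495, 223061.9,  25652,  26085.1),
    (6,  227089, 226821.4,  22059,  22326.6),
    (7,  229835, 229762.4,  19314,  19386.6),
    (8,  232215, 232373.8,  16932,  16773.2),
    (9,  234556, 235019.0,  14591,  14128.0),
    (10, 237511, 238557.9,  11636,  10589.1),
]

contribs = {}   # each group's contribution to the HL chi2
rel_gaps = {}   # |Obs_1 - Exp_1| as a fraction of Exp_1
for group, o1, e1, o0, e0 in rows:
    contribs[group] = (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
    rel_gaps[group] = abs(o1 - e1) / e1

hl_stat = sum(contribs.values())

for group in contribs:
    print(f"group {group:2d}: contribution {contribs[group]:8.2f}, "
          f"gap in Obs_1 {rel_gaps[group]:.2%} of Exp_1")
print(f"HL chi2 rebuilt from the table: {hl_stat:.2f}")
```

The rebuilt statistic agrees with the reported 260.64 up to the rounding of the expected counts. Most of it comes from groups 3 and 10, even though in no group does Obs_1 differ from Exp_1 by more than about 1%: with roughly 249,000 observations per group, gaps of that size are already large relative to the denominators.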

The following is the logit model, with the approval decision as the outcome variable and a set of explanatory variables that are either dummy or continuous variables. There are no interaction or squared terms:

logit approval income_w dti20 dti20_30 dti30_36 dti36_49 dti50_60 fico680_699 fico700_719 fico720_739 ltv80 ltv80_85 ltv85_90 ltv90_95 origination_2019 refinance minority female age62 lender_top100 shadowbank fintech aus tract_minority_population_percen tract_owner_occupied_units tract_one_to_four_family_homes tract_median_age_of_housing_unit cra fhfa_index

Here is a sample of the data. I divided it into two parts because of the limit dataex places on the number of variables:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float approval long income_w float(dti20 dti20_30 dti30_36 dti36_49 dti50_60 fico680_699 fico700_719 fico720_739 ltv80 ltv80_85 ltv85_90 ltv90_95 origination_2019 refinance)
1 208 0 0 1 0 0 1 0 0 0 0 0 0 1 0
1 190 0 0 0 1 0 1 0 0 0 0 0 0 1 0
1 132 0 0 0 1 0 1 0 0 0 0 0 0 1 0
1 127 0 0 0 1 0 1 0 0 0 0 0 0 1 0
1 171 0 0 0 1 0 1 0 0 0 0 0 0 0 0
1 125 0 0 0 1 0 1 0 0 0 0 0 0 1 0
1 152 0 0 0 1 0 1 0 0 0 0 0 0 0 0
1 150 0 0 0 1 0 0 1 0 0 0 0 0 1 0
1 208 0 0 0 1 0 0 1 0 0 0 0 0 1 0
1 208 0 0 0 1 0 1 0 0 0 0 0 0 1 0
end

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(minority female age62 lender_top100 shadowbank fintech aus tract_minority_population_percen) int(tract_owner_occupied_units tract_one_to_four_family_homes) byte tract_median_age_of_housing_unit float cra double fhfa_index
0 0 0 0 0 0 1 46.07 13975 15386  8 0  4.47
0 0 0 0 0 0 1 46.07 13975 15386  8 0  4.47
0 0 0 0 0 0 1 46.07 13975 15386  8 0  4.47
0 0 0 0 0 0 1 46.07 13975 15386  8 0  4.47
0 0 0 0 0 0 1 46.07 13975 15386  8 0  5.11
0 0 0 0 0 0 1 46.07 13975 15386  8 0  4.47
0 0 0 0 0 0 1 46.07 13975 15386  8 0  5.11
0 0 0 0 0 0 1 11.43  6612  7636 12 0 11.99
0 0 0 0 0 0 1  3.55  6004  6742 12 0  5.76
0 1 0 0 0 0 1 34.96  6938  8788 13 0  6.11
end
