Hello everyone, I'm having some trouble with the following regression analysis.
My last post got hardly any response, and I was advised to be more specific, so I'm trying again with a similar question. If I got any technicalities wrong, please tell me; I'm eager to get this right.
I'm using Stata 13.

My variables are as follows:
Dependent:
par30 - portfolio at risk > 30 (percentage of loans overdue more than 30 days)
Independent:
perfem - percentage of female borrowers
TAK - total assets (in $1000)
PSK - average portfolio size per borrower (in $1000)
MFIage - indicates whether an MFI is new, young or mature (1 = 1-4 years old, 2 = 4-8 years old, 3 = 8+ years old)
  • I originally used a different variable (0 for a new MFI, +1 for each year it has been active); however, I ran into serious problems with linearity, so I switched to this one.
Group variable: numMFI
MFI = Microfinance Institution, also called financial service provider

Concerning the group variable: my data contains a variable called mfiname that holds the name of each MFI, so I did:

Code:
. egen numMFI = group(mfiname)

. xtset numMFI fiscalyear
       panel variable:  numMFI (unbalanced)
        time variable:  fiscalyear, 2003 to 2012, but with gaps
                delta:  1 unit
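Before -xtset-, it may be worth confirming that each MFI has at most one row per fiscal year, since -xtset- refuses repeated time values within a panel. A minimal sketch on a made-up toy panel (the names "Alpha", "Beta", "Gamma", the years and the par30 values are all hypothetical, not from my data); -isid- and -xtdescribe- are official Stata commands:

```stata
* Hypothetical toy panel (names, years and values are made up)
clear
input str10 mfiname fiscalyear par30
"Alpha" 2003 0.02
"Alpha" 2005 0.04
"Beta"  2004 0.10
"Beta"  2005 0.08
"Gamma" 2012 0.01
end

* One integer per distinct name, numbered in alphabetical order
egen numMFI = group(mfiname)

* Exits with an error if any MFI has two rows for the same year,
* which would otherwise make -xtset- fail
isid numMFI fiscalyear

xtset numMFI fiscalyear

* Tabulates which years each panel is observed in, showing the gaps
xtdescribe
```

On the real data the same -isid- and -xtdescribe- lines would apply directly after the -egen- step.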
As I want to run a linear regression on my data, I tried to follow this advice: first I tested for linearity with -nlcheck-, and since I also wanted to look at it graphically to include in my thesis, I did another check following this advice.

Code:
. quietly xtreg par30 perfem TAK PSK MFIage
Code:
. nlcheck perfem

Nonlinearity test:

           chi2(  9) =   10.11
         Prob > chi2 =    0.3414
Code:
. nlcheck TAK

Nonlinearity test:

           chi2(  9) =    7.91
         Prob > chi2 =    0.5434
Code:
. nlcheck PSK

Nonlinearity test:

           chi2(  9) =   20.61
         Prob > chi2 =    0.0145
Code:
. nlcheck MFIage

Nonlinearity test:

           chi2(  1) =    0.13
         Prob > chi2 =    0.7229
I read this as: linearity is not rejected for perfem, TAK and MFIage, but it is rejected for PSK (p = 0.0145).
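As far as I understand it, the user-written -nlcheck- (from SSC) cuts the variable into bins, refits the model with indicators for those bins next to the linear term, and tests the indicators jointly; the chi2(9) above would then correspond to 10 bins. The sketch below does this by hand on simulated data (the variables y, x, z and the curve x^2 are assumptions for illustration). It is my approximation of the idea, not -nlcheck-'s actual code, and it reports an F test after -regress- rather than a chi2 test:

```stata
* Simulated data (hypothetical): y is curved in x, linear in z
clear
set seed 12345
set obs 500
generate z = rnormal()
generate x = 3*runiform()
generate y = 1 + 0.5*z + x^2 + rnormal(0, 0.5)

* Cut x into 10 groups of roughly equal size
xtile xbin = x, nquantiles(10)

* Keep the linear term and add indicators for 9 of the 10 groups;
* under linearity the indicators should add nothing
regress y z x i.xbin

* Joint test of the 9 indicators (F test here, chi2 in -nlcheck-)
testparm i.xbin
scalar p_lin = r(p)
display "p-value for departure from linearity: " p_lin
```

A small p-value means the bin indicators pick up curvature that the single linear term misses.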

Code:
. quietly regress par30 perfem TAK PSK MFIage

. predict r, resid
Then I checked linearity with the following commands:
Code:
. acprplot perfem , lowess

. acprplot TAK , lowess

. acprplot PSK , lowess

. acprplot MFIage , lowess
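As I understand the description in the -regress postestimation- documentation, -acprplot- refits the model with the square of the chosen variable added and plots the residual plus the fitted linear and squared terms against that variable. A minimal sketch that computes this quantity by hand on simulated data (y, x, z and the curve are hypothetical), so the smoothed curve can be inspected as a variable instead of a graph:

```stata
* Simulated data (hypothetical): y is curved in x, linear in z
clear
set seed 2024
set obs 300
generate z = rnormal()
generate x = 3*runiform()
generate y = 1 + 0.5*z + x^2 + rnormal(0, 0.5)

* Augmented regression: the variable of interest enters with its square
regress y z c.x##c.x
predict e, residuals

* Augmented component-plus-residual: residual plus the fitted x terms
generate acpr = e + _b[x]*x + _b[c.x#c.x]*x^2

* Lowess smooth of acpr on x, saved as a variable instead of drawn
lowess acpr x, nograph generate(acpr_lowess)
```

If x entered linearly, acpr_lowess would follow a straight line; here it bends because y depends on x^2.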
The plots for TAK and MFIage looked fine; perfem and PSK, however, did not:
Code:
. acprplot perfem , lowess
[Graph: augmented component-plus-residual plot for perfem with lowess smooth]

Code:
. acprplot PSK , lowess
[Graph: augmented component-plus-residual plot for PSK with lowess smooth]

Code:
. kdensity PSK, normal
[Graph: kernel density estimate of PSK with normal density overlaid]

Code:
. summarize PSK, detail

                             PSK
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0266504       .0149792
 5%     .0484809       .0195255
10%      .061839       .0266504       Obs                 209
25%     .1161648       .0330702       Sum of Wgt.         209

50%     .2528287                      Mean            .431633
                        Largest       Std. Dev.      .4882636
75%     .5544006       2.138109
90%     1.126634       2.336616       Variance       .2384014
95%     1.406009        2.36487       Skewness       2.467618
99%     2.336616       3.398852       Kurtosis       11.24819
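The summary shows PSK is strongly right-skewed (mean 0.43 vs median 0.25, skewness 2.47). One quick way to compare candidate transformations is their skewness; the official -ladder- and -gladder- commands do this over a whole ladder of powers. A minimal sketch on a simulated stand-in for PSK (the lognormal parameters are an assumption, not fitted to my data). Note that linear regression does not require PSK itself to be normal, so this only hints at which scale might straighten the relationship:

```stata
* Simulated right-skewed variable (hypothetical stand-in for PSK)
clear
set seed 7
set obs 209
generate psk = exp(rnormal(-1.3, 0.9))

* Candidate transformations
generate psk_sqrt = sqrt(psk)
generate psk_ln   = ln(psk)

* Skewness of each version; values nearer 0 mean more symmetric
foreach v of varlist psk psk_sqrt psk_ln {
    quietly summarize `v', detail
    display "`v'" _col(12) "skewness = " %6.3f r(skewness)
}
```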
Now I have two questions:

1. -nlcheck- indicated that perfem is linear, but the plot doesn't look very linear to me. Any thoughts on that?
2. Any ideas on transforming PSK? I tried taking the natural log, which gave me the following results:

Code:
. generate lnPSK = ln(PSK)

. quietly regress par30 perfem TAK lnPSK MFIage

. capture drop r

. predict r, resid
(81 missing values generated)
Code:
. acprplot lnPSK , lowess
[Graph: augmented component-plus-residual plot for lnPSK with lowess smooth]

Code:
. kdensity lnPSK, normal
[Graph: kernel density estimate of lnPSK with normal density overlaid]

This does look better, I think; however, -nlcheck- still rejects the null hypothesis of linearity:

Code:
. quietly xtreg par30 perfem TAK lnPSK MFIage

. nlcheck lnPSK

Nonlinearity test:

           chi2(  9) =   20.58
         Prob > chi2 =    0.0146
Any ideas on how to transform PSK properly so that the relationship becomes linear, or is there no way to estimate this model as a linear regression?
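Instead of searching for a single transformation, PSK could also enter through a restricted cubic spline (official -mkspline-, cubic option) or a quadratic term. Either way the model is still an ordinary linear regression, because it stays linear in the coefficients. A minimal sketch on simulated data (y, z, psk and the true curve 2*ln(psk) are assumptions for illustration):

```stata
* Simulated data (hypothetical): y depends on ln(psk), not on psk
clear
set seed 99
set obs 500
generate z   = rnormal()
generate psk = exp(rnormal(-1.3, 0.9))
generate y   = 1 + 0.5*z + 2*ln(psk) + rnormal(0, 0.5)

* Restricted cubic spline in psk with 4 knots at default percentiles;
* creates psk_s1 (equal to psk), psk_s2 and psk_s3
mkspline psk_s = psk, cubic nknots(4)

* Still an ordinary linear regression: linear in the coefficients
regress y z psk_s1 psk_s2 psk_s3

* Joint test of the nonlinear spline terms (H0: linear in psk)
testparm psk_s2 psk_s3
scalar p_nl = r(p)
display "p-value for nonlinearity: " p_nl

* A quadratic is another option that stays linear in the coefficients
regress y z c.psk##c.psk
```

With real data, the spline variables would simply replace PSK in the -regress- call alongside perfem, TAK and MFIage.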
Thanks in advance!