I used predicted variables from PCA for an EFA and want to implement my findings (6 factors) in a probit model.
What I did so far and what is planned:
- I cut my dataset into 4 sets
- I used PCA to trim down from my ~180 variables (I now have 8 components describing the items)
- I did not use the 8 components themselves; instead, I took the reduced set of items behind the components (around 50) and ran an EFA on them using set 2
- After finding the underlying structure of 6 factors, I want to apply these findings to a third set and estimate a probit model
- The last set of the 4 is for running the probit model.
I am somewhat stuck on how to implement the probit model.
I used "predict fa1 fa2 ... fa11" to get the new variables from the factors I found via the FA. Using "mkmat...., mat(probitraw) obs nchar(1)", "mat probitfa = probitraw*fa" and "svmat probitfa, names( col )", I was able to apply the structure / factors to my new set 3.
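For comparison, here is a minimal sketch of the same scoring step done with predict directly instead of the matrix route. It runs on simulated stand-in data, not my thesis data. The names sample, g1, g2, x1-x6, f1, f2 and y are hypothetical, and it uses two factors instead of six to keep it short. It assumes all subsets sit in one dataset; as far as I understand, predict after factor scores every observation in memory unless it is restricted with if.

```stata
* Simulated stand-in data (hypothetical), with a known two-factor structure
clear
set seed 12345
set obs 800
generate g1 = rnormal()
generate g2 = rnormal()
forvalues j = 1/3 {
    generate x`j' = g1 + rnormal()
}
forvalues j = 4/6 {
    generate x`j' = g2 + rnormal()
}
generate byte y = (0.8*g1 - 0.5*g2 + rnormal() > 0)

* Hypothetical split indicator: 1 = EFA sample, 2 = probit sample
generate byte sample = cond(runiform() < 0.5, 1, 2)

* EFA on the first sample only, followed by an orthogonal rotation
factor x1-x6 if sample == 1, pf factors(2)
rotate, varimax

* No if qualifier here, so scores are created for both samples
predict f1 f2

* Probit on the hold-out sample, using the scored factors as regressors
probit y f1 f2 if sample == 2
margins, dydx(f1 f2)
```

If this route is valid, it would avoid the manual mkmat / svmat steps, but I am not sure it is equivalent to what I did above.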
Now running probit on the list of variables found via this, plus some extra dummy variables, seems to have a problem.
It shows:
Note: 400 failures and 416 successes completely determined.
My research suggested that this most likely comes from collinearity within my data. Since I was already struggling with collinearity, I went back and used an orthogonal rotation instead of an oblique rotation to minimize the correlation between factors. At least that was the plan.
For this reason I ran vif (see below)
. vif, unc

    Variable |       VIF       1/VIF
-------------+----------------------
     Factor3 |  9.12e+06    0.000000
     Factor2 |  5.80e+06    0.000000
     Factor5 |  2.73e+06    0.000000
     Factor4 | 892711.06    0.000001
     Factor6 | 662939.75    0.000002
     Factor1 | 239988.48    0.000004
         usa |      2.04    0.491378
interconti~l |      1.93    0.516836
market_based |      1.53    0.654417
  bank_based |      1.23    0.810417
   bank_type |      1.06    0.944274
    outliers |      1.04    0.959722
eastern_eu~e |      1.01    0.993844
-------------+----------------------
    Mean VIF |  1.50e+06
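In case it helps to see which diagnostics I mean, here is a rough sketch on simulated stand-in data (again not my thesis data; f1, f2 and y are hypothetical names). It compares the pairwise correlation with the centered and uncentered VIFs after a linear regression on the same regressors. The two regressors are deliberately given large means, on the assumption that the uncentered VIF also picks up collinearity with the constant and not only correlation between the regressors.

```stata
* Simulated stand-in data (hypothetical): two independent regressors
* whose means are large relative to their standard deviations
clear
set seed 2718
set obs 500
generate f1 = rnormal(50, 1)
generate f2 = rnormal(30, 1)
generate byte y = (f1 - f2 - 20 + rnormal() > 0)

* Pairwise correlation between the regressors
correlate f1 f2

* estat vif is a postestimation command for regress, so fit a linear model first
regress y f1 f2

* Centered VIFs: collinearity among the regressors themselves
estat vif

* Uncentered VIFs: also sensitive to collinearity with the constant
estat vif, uncentered
```

If the two VIF versions differ a lot while the correlations are small, the large values may say more about the scale and location of the scores than about the factors being correlated with each other. I have not confirmed that this is what is happening in my data.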
Now my question is: does it even make sense to run a probit model on factors I found from an EFA?
I found an old post where Mr Clyde Schechter and Mr Richard Williams talked about collinearity being of no issue, but my VIF is immensely high.
On the one hand, I am worried that interpreting margins is not very sensible with such high correlation between variables.
On the other hand, the whole point of my thesis was to show how a multitude of variables can easily be summarized into very few factors, which in turn indicate the likelihood of a firm being a buyer or a target in an acquisition scenario.
Additionally, are there other ways to work around this problem?
Could adding interaction terms be helpful, or would they just cover up a deeper problem?
Thank you in advance
Best,
Aaron
P.S. I am not sure what other information you might need; please do not hesitate to ask.