Probit model, collinearity from factor analysis data

Hello everybody,

I used the variables selected via a PCA as input for an EFA, and I now want to use my findings (6 factors) in a probit model.

What I did so far and what is planned:

I split my dataset into 4 sets.
I used PCA to trim down my ~180 variables (I now have 8 components describing the items).
I did not use the 8 components themselves; instead, I took the reduced set of items behind the components (around 50) and ran an EFA on them using set 2.
Having found an underlying structure of 6 factors, I want to apply these findings to a third set in order to fit a probit model.
The last of the 4 sets is for running the probit model.
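
As an illustration, a random four-way split like the one described above could be done along these lines. This is only a sketch: the seed and the variable names u and set are hypothetical and not taken from my actual do-file.

```stata
* Sketch only: the seed and the names u and set are hypothetical.
* Fix the seed so the split is reproducible.
set seed 12345
* One uniform random draw per observation, then sort into random order.
generate double u = runiform()
sort u
* Assign set numbers 1 to 4 in blocks of (nearly) equal size.
generate byte set = ceil(4 * _n / _N)
drop u
* Check the group sizes.
tabulate set
```

Each later step can then be restricted with a condition such as "if set == 2".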

Now:
I am somewhat stuck on how to implement the probit model.
I used "predict fa1 fa2 ... fa11" to obtain the new variables from the factors I found via the FA. Then, through "mkmat...., mat(probitraw) obs nchar(1)", "mat probitfa = probitraw*fa" and "svmat probitfa, names( col )", I was able to apply the structure / factors to my new set 3.
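
For comparison with the manual matrix route, here is a sketch of letting predict do the scoring on set 3 directly. The names item1-item50 and set are hypothetical placeholders, and the varimax rotation is just one possible orthogonal choice. How predict standardizes the items for observations outside the estimation sample is something I have not verified and would need to check in the manual.

```stata
* Sketch only: item1-item50 and set are hypothetical names.
* Fit the EFA on set 2, keeping 6 factors.
factor item1-item50 if set == 2, factors(6)
* Orthogonal (varimax) rotation.
rotate, varimax
* Regression-scored factors, generated only for the observations in set 3.
predict f1 f2 f3 f4 f5 f6 if set == 3
* Inspect scale and correlation of the scores in set 3.
summarize f1 f2 f3 f4 f5 f6 if set == 3
correlate f1 f2 f3 f4 f5 f6 if set == 3
```

If the scores from the matrix route differ a lot in scale or correlation from these, that would point to the scoring step rather than to the rotation.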

Running probit on the variables obtained this way, plus some extra dummy variables, seems to cause a problem.

It shows:

Note: 400 failures and 416 successes completely determined.

My research suggested that this most likely comes from collinearity within my data. Since I was already stuck on collinearity, I went back and used an orthogonal rotation instead of an oblique rotation to minimize the correlation between factors. At least that was the plan.

For this reason I ran vif (see below)

. vif, unc

    Variable |       VIF       1/VIF
-------------+----------------------
     Factor3 |  9.12e+06    0.000000
     Factor2 |  5.80e+06    0.000000
     Factor5 |  2.73e+06    0.000000
     Factor4 | 892711.06    0.000001
     Factor6 | 662939.75    0.000002
     Factor1 | 239988.48    0.000004
         usa |      2.04    0.491378
interconti~l |      1.93    0.516836
market_based |      1.53    0.654417
  bank_based |      1.23    0.810417
   bank_type |      1.06    0.944274
    outliers |      1.04    0.959722
eastern_eu~e |      1.01    0.993844
-------------+----------------------
    Mean VIF |  1.50e+06
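
A possible cross-check, sketched under assumptions: the output above uses the uncentered VIF, which is computed without a constant and can therefore become very large for variables whose means are far from zero, even if they are barely correlated with each other. Comparing it with the centered VIF and the plain correlations of the factors might show which of the two is going on. The outcome name target is hypothetical, and the dummy variables are left out for brevity.

```stata
* Sketch only: target is a hypothetical name for the 0/1 outcome.
* vif is a postestimation command for regress, so fit a linear model first.
regress target Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
* Centered VIFs (the default).
estat vif
* Uncentered VIFs, as in the output above.
estat vif, uncentered
* Means, scales and pairwise correlations of the factor variables.
summarize Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
correlate Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
```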

Now my question is: does it even make sense to run a probit model on factors found from an EFA?
I found an old post where Mr Clyde Schechter and Mr Richard Williams talked about collinearity not being an issue, but my VIF is immensely high.

On the one hand, I am worried that interpreting margins is not very sensible with such high correlation between variables.
On the other hand, the whole point of my thesis is to show how a multitude of variables can be summarized into very few factors, which in turn indicate the likelihood of a firm being a buyer or a target in an acquisition scenario.

Additionally, are there other ways to work around this problem?
Could adding interaction terms be helpful, or would they just cover up a deeper problem?

Thank you in advance

Best,

Aaron

P.S. I am not sure what other information you might need, so please do not hesitate to ask.
