Background: We are studying a rural population of 375 Malawian households and are trying to build an asset index from a list of over 40 binary (and a few continuous) variables copied from the DHS questionnaires for Malawi. When I ran a PCA on this data in Stata, the first principal component accounted for only about 10-13% of the total variation. When I tried the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, I got an error telling me that my matrix is singular.
To clean and reduce the data for a better analysis, I excluded variables based on the following criteria:
1. Whether their frequency was above 95% or below 5% of the population.
2. Whether they were multicollinear (correlation > 0.90 with another variable), as such variables probably explain the same variance.
3. Whether they were significantly correlated (chi-square test) with fewer than 6 other variables.
4. Whether they correlated less than 0.099 with the majority of the other variables.
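To make criteria 1 and 2 concrete, here is a minimal Python sketch of the screening on toy data (the actual analysis was done in Stata). The variable names, the toy values, the `screen` helper, the use of the absolute Pearson correlation, and the rule of dropping the later variable of a highly correlated pair are all assumptions made for illustration. The prevalence filter only makes sense for the binary indicators, not for the continuous ones.

```python
from math import sqrt

def prevalence(column):
    """Share of households with value 1 for a binary indicator."""
    return sum(column) / len(column)

def pearson(x, y):
    """Pearson correlation; for two binary variables this is the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation
    return sxy / sqrt(sxx * syy)

def screen(data, low=0.05, high=0.95, max_corr=0.90):
    """Keep variables that pass criterion 1 (prevalence) and criterion 2 (collinearity)."""
    kept = []
    for name, col in data.items():
        p = prevalence(col)
        if p < low or p > high:
            continue  # criterion 1: owned by almost nobody or almost everybody
        if any(abs(pearson(col, data[k])) > max_corr for k in kept):
            continue  # criterion 2: near-duplicate of a variable already kept
        kept.append(name)
    return kept

# Hypothetical toy data: 10 households, 4 binary asset indicators.
toy = {
    "radio":      [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "radio_copy": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "car":        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "bicycle":    [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
}
kept = screen(toy)
print(kept)  # ['radio', 'bicycle']
```

In this toy example `car` fails the prevalence rule and `radio_copy` fails the collinearity rule.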
After this process I am left with 15 variables, and running the PCA now gives me the following output:
- Principal component 1: 3.42364
- Eigenvalue: 1.93568
- Proportion of explained variance: 0.2282
- KMO: Overall | 0.7385 (according to the VAM guide, the minimum acceptable value is 0.6)
- Bartlett test of sphericity: Chi-square= 983.032 Degrees of freedom = 105 p-value = 0.000
H0: variables are not intercorrelated
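As a quick consistency check on these numbers: in a PCA of a correlation matrix the eigenvalues sum to the number of variables, so the proportion explained by a component is its eigenvalue divided by that number. The sketch below assumes the PCA was run on the correlation matrix of the 15 retained variables. Under that assumption the reported proportion matches 3.42364 / 15, which suggests that 3.42364 is the first eigenvalue. In Stata's `pca` table the next column is the difference to the following eigenvalue, which may be what the 1.93568 figure is. The Bartlett degrees of freedom follow from p(p-1)/2.

```python
# Assumption: PCA on the correlation matrix of the 15 retained variables.
n_vars = 15
first_eigenvalue = 3.42364  # value reported next to "PrincipalComponent 1"

# Eigenvalues of a correlation matrix sum to n_vars, so this is the
# share of total variance carried by the first component.
proportion = first_eigenvalue / n_vars

# Degrees of freedom of Bartlett's test of sphericity: p(p - 1) / 2.
bartlett_df = n_vars * (n_vars - 1) // 2

print(round(proportion, 4))  # 0.2282
print(bartlett_df)           # 105
```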
My questions are:
1. Do the criteria I used for excluding variables sound reasonable? Is there anything else I could consider when choosing variables, for example the strength of the significance level of the correlations rather than the number of significant correlations?
2. How can I judge whether the proportion of explained variance, and also the KMO, is high enough? How important are these measures for the validity of the analysis?
3. I didn't rotate my data, as rotation doesn't raise my principal component score. Is this acceptable? Other instructions I've seen do rotate.
4. Bearing in mind that doing a PCA is an iterative process, how do I know when I have the best possible result?
5. Are there still advantages to doing both a PCA and a factor analysis and comparing the results?
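For reference on question 2, the KMO statistic is usually defined from the pairwise correlations and the partial correlations, as written below. The symbols are introduced here for illustration and do not come from the Stata output: r_ij is the correlation between variables i and j, p_ij is their partial correlation given all other variables, and Q is the inverse of the correlation matrix R.

```latex
\[
\mathrm{KMO}
  = \frac{\sum_{i \neq j} r_{ij}^{2}}
         {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}},
\qquad
p_{ij} = -\frac{q_{ij}}{\sqrt{q_{ii}\, q_{jj}}},
\qquad
Q = R^{-1}
\]
```

Because the partial correlations require the inverse of R, the statistic cannot be computed when R is singular, which is consistent with the error I got on the full variable list. Values close to 1 mean the partial correlations are small relative to the raw correlations.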
Mehma