Background: We are studying a rural population of 375 Malawian
households and are trying to build an asset index from a list of over 40 binary (and a few continuous) variables copied from the DHS questionnaires for Malawi. When I ran a PCA on these data in Stata, the first principal component accounted for only about 10-13% of the total variation. When I tried the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO), I got an error telling me that my matrix is singular.
To clean and reduce the data for a better analysis, I excluded
variables based on:
1. Whether their frequency was above 95% or below 5% of the population.
2. Whether they were multicollinear (correlation > 0.90 with another variable), as
such variables probably explain the same variance.
3. Whether they were significantly correlated (chi-square test) with fewer than 6 other variables.
4. Whether their correlation with the majority of the other variables was below 0.099.
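The first two exclusion rules can be sketched in a few lines. This is only an illustration in plain Python, not the Stata workflow used for the actual analysis: the function names, the toy data, and the choice to keep the first variable of a highly correlated pair are all assumptions made for the example.

```python
from math import sqrt

def prevalence(values):
    """Share of households coded 1 for a binary (0/1) asset variable."""
    return sum(values) / len(values)

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def screen_assets(data, low=0.05, high=0.95, max_corr=0.90):
    """Apply rule 1 (prevalence bounds) and rule 2 (near-duplicate variables)."""
    # Rule 1: drop variables owned by fewer than 5% or more than 95% of households.
    kept = [name for name, v in data.items() if low <= prevalence(v) <= high]
    # Rule 2: keep a variable only if it is not correlated above max_corr
    # with a variable that has already been selected (assumed tie-break).
    selected = []
    for name in kept:
        if all(abs(pearson(data[name], data[other])) <= max_corr
               for other in selected):
            selected.append(name)
    return selected

# Hypothetical toy data for 10 households (not the survey data).
toy = {
    "radio":    [1, 0, 1, 0, 1, 0, 1, 0, 0, 1],
    "bicycle":  [1, 1, 0, 0, 1, 0, 0, 1, 0, 1],
    "bike_dup": [1, 1, 0, 0, 1, 0, 0, 1, 0, 1],  # exact copy of "bicycle"
    "car":      [0] * 10,                         # owned by nobody
}
print(screen_assets(toy))  # ['radio', 'bicycle']
```

Here "car" fails rule 1 and "bike_dup" fails rule 2. Rules 3 and 4 would need a chi-square test and a count over the correlation matrix, which are left out of this sketch.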
After this process I am left with 15 variables, and running the PCA now gives me the following output:
- PrincipalComponent 1: 3.42364
- Eigenvalue: 1.93568
- Proportion of explained variance: 0.2282
- KMO: Overall | 0.7385 (according to the VAM guide, the minimum acceptable value is 0.6)
- Bartlett test of sphericity: Chi-square = 983.032, degrees of freedom = 105, p-value = 0.000
H0: variables are not intercorrelated
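As a quick consistency check on these numbers: if the PCA was run on the correlation matrix (an assumption here; it is the default for Stata's `pca`), the eigenvalues sum to the number of variables, so the proportion explained by a component is its eigenvalue divided by 15. The sketch below also assumes that 3.42364 is the eigenvalue of the first component, which is consistent with the reported proportion. The Bartlett degrees of freedom follow from the number of distinct variable pairs, p(p-1)/2.

```python
n_variables = 15            # variables left after screening
first_eigenvalue = 3.42364  # assumed to be the eigenvalue of component 1

# With a correlation-matrix PCA the eigenvalues sum to n_variables.
proportion = first_eigenvalue / n_variables

# Bartlett's test of sphericity has p(p-1)/2 degrees of freedom.
bartlett_df = n_variables * (n_variables - 1) // 2

print(round(proportion, 4))  # 0.2282
print(bartlett_df)           # 105
```

Both values match the output reported above.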
My questions are:
1. Do the criteria I use for excluding variables sound reasonable, and is there
anything else I should consider when choosing variables? For example, should I look at the strength of the
significance level of the correlations instead of the number of significant correlations?
2. How can I judge whether the proportion of explained variance, and also the
KMO, is high enough? How important are these measures for the validity of the analysis?
3. I didn't rotate my data, as rotation doesn't raise my principal component score. Is this
acceptable? Rotation is applied in other instructions I've seen.
4. Bearing in mind that PCA is an iterative process, how do I know when I
have the best possible result?
5. Are there still advantages to doing both a PCA and a factor analysis and
comparing the results?
Mehma