Background: We are studying a rural population of 375 Malawian households and are trying to build an asset index from a list of over 40 binary (and a few continuous) variables copied from the DHS questionnaires for Malawi. When I ran a PCA on these data in Stata, the first principal component accounted for only about 10-13% of the total variation. When I tried the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO), I got an error telling me that my matrix is singular.
To clean and reduce the data for a better analysis, I excluded variables based on:
1. Whether their frequency was above 95% or below 5% of the population.
2. Whether they were multicollinear (correlation above 0.90 with another variable), as such variables probably explain the same variance.
3. Whether they were significantly correlated (chi-square test) with fewer than 6 other variables.
4. Whether they correlated less than 0.099 with the majority of the other variables.
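For illustration, criteria 1 and 2 amount to something like the following minimal Python sketch. It is not the Stata code used for the analysis: the function names (`prevalence`, `pearson`, `screen`) and the toy data are hypothetical, and keeping the first variable of each highly correlated pair is an assumption, since the post does not say which one was dropped.

```python
from math import sqrt

def prevalence(column):
    """Share of households coded 1 for a binary (0/1) variable."""
    return sum(column) / len(column)

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def screen(data, low=0.05, high=0.95, max_corr=0.90):
    """Criterion 1: drop items owned by fewer than `low` or more than `high`
    of households. Criterion 2: of any pair with |r| > max_corr, keep only
    the variable that appears first (an assumption, see lead-in)."""
    kept = [name for name, col in data.items() if low <= prevalence(col) <= high]
    selected = []
    for name in kept:
        if all(abs(pearson(data[name], data[other])) <= max_corr
               for other in selected):
            selected.append(name)
    return selected

# Hypothetical toy data for 20 households (not the survey data).
toy = {
    "radio":     [1, 0] * 10,
    "radio_dup": [1, 0] * 10,        # identical to "radio", so r = 1
    "bicycle":   [1, 1, 0, 0] * 5,   # uncorrelated with "radio"
    "car":       [0] * 20,           # owned by nobody: prevalence 0%
}
selected = screen(toy)
print(selected)
```

Here `car` fails the frequency rule and `radio_dup` fails the correlation rule, so only `radio` and `bicycle` survive.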
After this process I am left with 15 variables, and running the PCA now gives the following output:
- PrincipalComponent 1: 3.42364
- Eigenvalue: 1.93568
- Proportion of explained variance: 0.2282
- KMO: Overall | 0.7385 (according to the VAM guide, the minimum acceptable value is 0.6)
- Bartlett test of sphericity: Chi-square = 983.032, degrees of freedom = 105, p-value = 0.000
H0: variables are not intercorrelated
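As a quick consistency check on these numbers, here is a small Python sketch. It assumes the PCA was run on the correlation matrix (so the eigenvalues sum to the number of variables) and that the 3.42364 reported for component 1 is its eigenvalue; neither is stated explicitly in the output above.

```python
n_vars = 15                  # variables retained after screening
first_eigenvalue = 3.42364   # value reported for principal component 1

# With a correlation-matrix PCA the eigenvalues sum to n_vars, so the share
# of variance explained by a component is its eigenvalue divided by n_vars.
proportion = first_eigenvalue / n_vars

# Bartlett's test of sphericity has p(p-1)/2 degrees of freedom for p variables.
bartlett_df = n_vars * (n_vars - 1) // 2

print(round(proportion, 4))  # 0.2282
print(bartlett_df)           # 105
```

Both results match the reported proportion of explained variance (0.2282) and the degrees of freedom (105), which is consistent with the output coming from a 15-variable correlation-matrix PCA.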
My questions are:
1. Do the criteria I use for excluding variables sound reasonable? Is there anything else I could consider when choosing variables, for example the strength of the significance of the correlations rather than the number of significant correlations?
2. How do I judge whether the proportion of explained variance, and also the KMO, is high enough? How important are these measures for the validity of the analysis?
3. I didn't rotate my data, as it doesn't improve my principal component score. Is this acceptable? Other instructions I've seen do rotate.
4. Bearing in mind that PCA is an iterative process, how do I know when I have the best possible result?
5. Are there still advantages to doing both a PCA and a factor analysis and comparing the results?
Mehma