Background: We are studying a rural population of 375 Malawian households and are trying to build an asset index from a list of over 40 binary (and a few continuous) variables copied from the DHS questionnaires for Malawi. When I ran a PCA on this data in Stata, the first principal component accounted for only about 10-13% of the total variation. When I tried to compute the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO), I got an error telling me that my matrix is singular.
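
A common cause of a singular correlation matrix is a variable that is an exact linear function of others, for example two dummies that are complements of each other. Whether that applies to this dataset is not established. A minimal Python/NumPy sketch with made-up variables (`has_radio`, `has_bike`, `no_radio` are hypothetical, not from the survey) shows how such a case can be detected:

```python
import numpy as np

# Synthetic data only; these variables are illustrative assumptions.
rng = np.random.default_rng(2)
n = 375
has_radio = rng.integers(0, 2, size=n)
has_bike = rng.integers(0, 2, size=n)
no_radio = 1 - has_radio  # complement dummy: exact linear function of has_radio

X = np.column_stack([has_radio, has_bike, no_radio]).astype(float)
R = np.corrcoef(X, rowvar=False)  # correlation matrix of the columns

# A rank below the number of variables means R is singular,
# so it cannot be inverted (and KMO needs the inverse).
rank = np.linalg.matrix_rank(R)
smallest_eig = np.linalg.eigvalsh(R)[0]  # eigvalsh returns ascending order
print("rank:", rank, "of", R.shape[0])
print("smallest eigenvalue:", smallest_eig)
```

If the rank is lower than the number of variables, dropping one variable from each exactly dependent group should make the matrix invertible again.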

To clean and reduce the data for a better analysis, I excluded variables based on:
1. Whether their frequency was above 95% or below 5% of the population.
2. Whether they were multicollinear (correlation > 0.90 with another variable), as such variables probably explain the same variance.
3. Whether they were significantly (chi-square test) correlated with fewer than 6 other variables.
4. Whether they correlated less than 0.099 with the majority of the other variables.
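
Criteria 1 and 2 above could be expressed roughly as follows. This is a minimal sketch in Python/NumPy rather than Stata; the function name `screen_binary_vars`, the greedy order in which redundant variables are dropped, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def screen_binary_vars(X, names, low=0.05, high=0.95, max_corr=0.90):
    """Drop near-constant and near-duplicate columns of a 0/1 matrix X."""
    X = np.asarray(X, dtype=float)
    prevalence = X.mean(axis=0)  # share of households with each asset
    keep = [j for j in range(X.shape[1]) if low <= prevalence[j] <= high]

    # Greedy pass: keep a variable only if its absolute correlation with
    # every variable already kept is at most max_corr.
    selected = []
    for j in keep:
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > max_corr
            for k in selected
        )
        if not redundant:
            selected.append(j)
    return [names[j] for j in selected]

# Synthetic example, not the survey data.
rng = np.random.default_rng(0)
radio = rng.integers(0, 2, size=375)
bicycle = rng.integers(0, 2, size=375)
radio_copy = radio.copy()                    # perfectly collinear with radio
car = (rng.random(375) < 0.01).astype(int)   # owned by roughly 1% of households
X = np.column_stack([radio, bicycle, radio_copy, car])
kept = screen_binary_vars(X, ["radio", "bicycle", "radio_copy", "car"])
print(kept)
```

Note that which member of a highly correlated pair survives depends on column order in this sketch; a substantive choice between the two may be preferable.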
After this process I am left with 15 variables, and running the PCA now gives the following output:
- PrincipalComponent 1: 3.42364
- Eigenvalue: 1.93568
- Proportion of explained variance: 0.2282
- KMO: Overall | 0.7385 (according to the VAM guide, the minimum acceptable value is 0.6)
- Bartlett test of sphericity: Chi-square= 983.032 Degrees of freedom = 105 p-value = 0.000
H0: variables are not intercorrelated
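
For a PCA on the correlation matrix, each component's proportion of explained variance is its eigenvalue divided by the number of variables. If 3.42364 is the first eigenvalue, then 3.42364 / 15 ≈ 0.2282, which matches the reported proportion, and the Bartlett degrees of freedom 15 × 14 / 2 = 105 also match. The sketch below, in Python/NumPy on synthetic data, shows one way these quantities and the overall KMO can be computed; the function name `pca_diagnostics` and the simulated `wealth` factor are assumptions, and it is not a reproduction of Stata's routines.

```python
import numpy as np

def pca_diagnostics(X):
    """Correlation-matrix PCA, Bartlett's sphericity test and overall KMO."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)

    # Eigenvalues of R in descending order; they sum to p.
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    proportion = eigvals / p

    # Bartlett: chi2 = -(n - 1 - (2p + 5) / 6) * ln|R|, df = p(p - 1) / 2
    _, logdet = np.linalg.slogdet(R)
    bartlett_chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    bartlett_df = p * (p - 1) // 2

    # KMO compares squared correlations with squared partial correlations.
    # The inverse fails or becomes unstable when R is (near-)singular.
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)
    off = ~np.eye(p, dtype=bool)  # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    kmo = r2 / (r2 + a2)
    return eigvals, proportion, bartlett_chi2, bartlett_df, kmo

# Synthetic data: 15 binary indicators driven by one latent factor.
rng = np.random.default_rng(1)
wealth = rng.normal(size=375)
X = (wealth[:, None] + rng.normal(size=(375, 15)) > 0).astype(int)
eigvals, proportion, chi2, df, kmo = pca_diagnostics(X)
print("first proportion:", round(proportion[0], 4))
print("Bartlett chi2:", round(chi2, 1), "df:", df)
print("overall KMO:", round(kmo, 4))

# Consistency check on the figures quoted above.
reported = 3.42364 / 15
print("3.42364 / 15 =", round(reported, 4))
```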

My questions are:
1. Do the criteria I use for excluding variables sound reasonable, and is there anything else I should consider when choosing variables? For example, should I look at the strength of the significance level of the correlations instead of the number of significant correlations?
2. How can I judge whether the proportion of explained variance, and the KMO, are high enough? How important are these measures for the validity of the analysis?
3. I didn't rotate my data, as it doesn't increase my principal component score. Is this acceptable? Rotation is used in other instructions I've seen.
4. Bearing in mind that doing a PCA is an iterative process, how do I know when I have the best possible result?
5. Are there still advantages to doing both a PCA and a factor analysis and comparing the results?

Mehma