General questions on Latent Class Analysis / no. of variables / model fit

Dear all,

I have some general questions on Latent Class Analysis that are probably not Stata-specific, but perhaps somebody could offer some ideas nevertheless.

I am using the user-written -lclogit2- module by Hong Il Yoo (2019) to conduct a latent class analysis on data from a choice experiment.

My problem is that I have a lot of data (1,000 respondents, 16 choice situations per respondent) and a lot of variables, and I obviously cannot include all of them because the model won't converge.
There are 8 predictor variables specified in the rand() option, plus many potential class membership variables (I would be interested in around 15).
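For a sense of scale, the number of free parameters can be counted directly. The sketch below assumes the usual latent class conditional logit structure (an assumption, not taken from the -lclogit2- documentation): each of C classes has its own coefficient on each of the K rand() variables, and class shares follow a multinomial logit with a constant plus M membership variables, with one class as the reference. The function name `lcl_num_params` is hypothetical.

```python
def lcl_num_params(n_classes: int, n_rand: int, n_member: int) -> int:
    """Count free parameters in a latent class conditional logit (assumed structure).

    n_classes: number of latent classes C
    n_rand:    number of variables with class-specific coefficients (K, the rand() list)
    n_member:  number of class membership covariates M
    """
    # Each class has its own coefficient on every rand() variable: C * K
    utility_params = n_classes * n_rand
    # Membership model: constant + M covariates for every class except the
    # reference class: (C - 1) * (M + 1)
    share_params = (n_classes - 1) * (n_member + 1)
    return utility_params + share_params


if __name__ == "__main__":
    # K = 8 rand() variables as in the post; compare 0 vs. 15 membership variables
    for c in range(2, 8):
        print(c, lcl_num_params(c, 8, 0), lcl_num_params(c, 8, 15))
```

Under these assumptions, the 7-class model with 15 membership variables has 8*7 + 6*16 = 152 parameters, against 62 with no membership variables, so adding classes and covariates together multiplies the size of the optimisation problem.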

I have already estimated different models with varying membership variables and found that some relevant socio-demographic variables, such as gender, seem to have no significant effect for 2, 3, or 4 classes.

Question 1:
I am really not sure whether I should now exclude the insignificant variables from the model or keep them in.
I've seen different approaches in different papers: some researchers seem to include only those variables that show significant effects in the final model, while others also report some insignificant ones. Of course, I have research hypotheses about certain effects, but it seems I won't be able to run a model with ALL potentially relevant variables. So how can I be sure whether a variable would be significant or insignificant in that context if I can only estimate models with varying subsets of the variables?

Question 2:
I have tried to identify the best number of classes by comparing information criteria (AIC, CAIC, BIC) for models with 2 to 7 classes, following the procedure described by Pacifico & Yoo (2013). If I include many membership variables, I get the error message "convergence not achieved".
If I include fewer variables, the information criteria look best for the 7-class model (more classes might improve them even further, but I haven't tested that yet). Model estimation already becomes difficult at 5 classes, and I don't achieve convergence; I also don't think a 7-class solution would be feasible to describe.
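The three criteria penalise the maximised log-likelihood LL by the number of parameters k in different ways: AIC = -2LL + 2k, BIC = -2LL + k ln N, and CAIC = -2LL + k(ln N + 1). A minimal sketch of the comparison, assuming N is the number of respondents (some software uses the number of choice observations instead, which makes the penalty larger); the function names and the `fits` dictionary layout are hypothetical.

```python
import math
from typing import Dict, Tuple


def information_criteria(loglik: float, k: int, n: int) -> Dict[str, float]:
    """Return AIC, BIC and CAIC for one fitted model (smaller is better).

    loglik: maximised log-likelihood
    k:      number of free parameters
    n:      sample size used in the penalty (assumed: number of respondents)
    """
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * k,
        "BIC": deviance + k * math.log(n),
        "CAIC": deviance + k * (math.log(n) + 1),
    }


def select_n_classes(fits: Dict[int, Tuple[float, int]], n: int,
                     criterion: str = "BIC") -> int:
    """Return the number of classes that minimises `criterion`.

    fits maps number of classes -> (maximised log-likelihood, number of parameters),
    e.g. collected by hand from each converged run (hypothetical layout).
    """
    return min(fits, key=lambda c: information_criteria(*fits[c], n)[criterion])
```

Because ln N exceeds 2 for any N above 7, AIC penalises extra parameters least and CAIC most, so the criteria can disagree, with AIC the most likely to keep favouring more classes.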
Do you have any suggestions on how to deal with this? I tried varying the -seed- for estimation, but it feels a lot like trial and error, and I don't really have a strategy.
I have read that the number of classes might be overestimated because of local maxima, but I am not sure how to tell whether this is a problem in my case, or how to avoid it.
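One common heuristic for spotting local maxima, not specific to -lclogit2-, is to re-estimate each model from many different starting values (e.g. different -seed- values) and check whether the best log-likelihood is reached by more than one start. A minimal sketch of that bookkeeping, assuming the log-likelihood of each run has been recorded by hand; `Run` and `summarize_starts` are hypothetical names, and the tolerance of 1e-3 is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    seed: int
    loglik: Optional[float]  # None if the run did not converge


def summarize_starts(runs: List[Run], tol: float = 1e-3) -> Optional[dict]:
    """Summarise random-start results for one model specification.

    If the best log-likelihood is reached by only one start, the likelihood
    surface may have many local maxima, and the "best" solution (and any
    information criteria based on it) should be treated with caution.
    """
    converged = [r for r in runs if r.loglik is not None]
    if not converged:
        return None  # no start converged at all
    best = max(r.loglik for r in converged)
    # Seeds whose log-likelihood is within `tol` of the best value
    seeds_at_best = [r.seed for r in converged if abs(r.loglik - best) <= tol]
    return {
        "n_runs": len(runs),
        "n_converged": len(converged),
        "best_loglik": best,
        "seeds_at_best": seeds_at_best,
        "replicated": len(seeds_at_best) >= 2,
    }
```

A large share of non-converged runs, or a best value that is never replicated, is itself a signal that the specification may have too many classes or covariates for the data.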

Question 3:
If I leave out some (potentially relevant) membership variables from the estimation, would it be possible to somehow include them in the classes later on?
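One simple, purely descriptive way to relate an omitted variable to the classes after estimation is to weight each respondent by their posterior class membership probabilities. The sketch assumes those probabilities have already been obtained from the fitted model, one row per respondent; the function names are hypothetical. This profiling ignores classification uncertainty in any subsequent inference, so it is not equivalent to including the variable in the membership model.

```python
from typing import List, Sequence


def posterior_weighted_means(post: Sequence[Sequence[float]],
                             x: Sequence[float]) -> List[float]:
    """Class-specific means of a covariate, weighting respondents by posterior probability.

    post[i][c]: posterior probability that respondent i belongs to class c
                (each row sums to 1)
    x[i]:       value of the omitted covariate for respondent i (e.g. a 0/1 dummy)
    """
    n_classes = len(post[0])
    means = []
    for c in range(n_classes):
        weights = [row[c] for row in post]
        total = sum(weights)
        means.append(sum(w * xi for w, xi in zip(weights, x)) / total)
    return means


def modal_assignment(post: Sequence[Sequence[float]]) -> List[int]:
    """Assign each respondent to their most likely class (ties go to the lower index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in post]
```

Modal assignment makes cross-tabulations easy, but it treats every respondent as if their class were known with certainty; the posterior-weighted means keep some of that uncertainty.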

Sorry for these general questions!
I appreciate any suggestions.

Thanks a lot in advance!

BJ Data Tech Solution
