Dear all,
I have some general questions on Latent Class Analysis that are probably not really Stata-specific, but maybe somebody could provide some ideas on that nevertheless.
I am using the user-written -lclogit2- module by Hong il Yoo (2019) to conduct a latent class analysis on data derived from a choice experiment.
My problem is that I have a lot of data (1,000 respondents, 16 choice situations per respondent) and a lot of variables and obviously I cannot include all of them, as the model won't converge.
There are 8 predictor variables specified in the rand() option, plus many potential class membership variables (I would be interested in around 15).
I have already estimated several models with varying membership variables and found that some relevant socio-demographic variables, such as gender, seem to have no significant effect with 2, 3, or 4 classes.
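For a sense of scale, the number of free parameters can be counted with a short sketch. It assumes the usual latent class logit setup: every rand() variable gets one coefficient per class, and class membership is a multinomial logit with a constant and one class normalized to zero. The function name and this setup are assumptions for illustration and are not taken from the -lclogit2- documentation; variables with class-invariant coefficients are ignored.

```python
def n_parameters(n_classes, n_rand, n_membership):
    """Count free parameters in a latent class logit (assumed setup, see text)."""
    # One coefficient per rand() variable in every class.
    utility = n_classes * n_rand
    # Membership model: constant + covariates, for all classes but the base class.
    membership = (n_classes - 1) * (n_membership + 1)
    return utility + membership


# 8 rand() variables and 15 membership variables, as described above.
for c in range(2, 8):
    print(c, "classes:", n_parameters(c, n_rand=8, n_membership=15), "parameters")
```

Under these assumptions the count rises from 32 parameters with 2 classes to 152 with 7 classes, which may help explain why the larger specifications are hard to estimate.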
Question 1:
I am really not sure whether I should now exclude the insignificant variables from the model or keep them in.
I've seen different approaches in different papers: some researchers seem to include only those variables that show significant effects in the final model, while others also report some insignificant ones. Of course I have research hypotheses about the effects, but it seems I won't be able to run a model with ALL potentially relevant variables. How can I be sure whether a variable would be significant or insignificant in the full model if I can only estimate models with varying subsets of variables?
Question 2:
I have tried to identify the best number of classes by comparing model fit in terms of information criteria (AIC, CAIC, BIC) for 2 to 7 classes, following the procedure described by Pacifico & Yoo (2013). If I include many membership variables, I get the error message "convergence not achieved".
If I include fewer variables, the information criteria look best for the 7-class model (more classes would probably improve the results even further, but I haven't tested that yet). Model estimation already becomes difficult with 5 classes, where I run into convergence problems, and I don't think a 7-class solution would be feasible to describe.
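For reference, the three criteria can be computed from a model's log-likelihood with the standard textbook formulas, sketched below. The function and argument names are hypothetical, and which sample size to use for n_obs (respondents or choice situations) is an assumption that should be checked against the procedure being followed.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return AIC, BIC and CAIC; smaller values indicate a better trade-off."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
    }


# Usage: pass the log-likelihood and parameter count of each candidate model,
# then compare the resulting values across the numbers of classes.
```

BIC and CAIC penalize each extra parameter more heavily than AIC once the sample is large, so they tend to favor fewer classes than AIC does.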
Do you have any suggestions on how to deal with that? I tried varying the -seed- for estimation, but it feels a lot like trial and error and I don't really have a strategy.
I have read that the number of classes might get overestimated due to local maxima, but I am not sure how to tell whether this is a problem in my case or how to avoid it.
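One common way to make the seed search less ad hoc is to run the same specification from several seeds, keep the run with the highest log-likelihood, and check whether other seeds reach the same value. A minimal sketch of that bookkeeping follows; the helper name, the tolerance, and the log-likelihood values are hypothetical and serve only as illustration.

```python
def summarize_starts(logliks, tol=1e-3):
    """Find the best run and list the seeds that reproduce its log-likelihood.

    logliks maps each seed to the final log-likelihood of a converged run.
    """
    best_seed = max(logliks, key=logliks.get)
    best_ll = logliks[best_seed]
    # Seeds whose log-likelihood lies within tol of the best one.
    replicated = sorted(s for s, ll in logliks.items() if best_ll - ll <= tol)
    return best_seed, best_ll, replicated


# Hypothetical log-likelihoods from four converged runs of one specification.
runs = {101: -5231.42, 202: -5198.07, 303: -5198.07, 404: -5240.90}
best_seed, best_ll, replicated = summarize_starts(runs)
print(best_seed, best_ll, replicated)
```

If only a single seed attains the best value, that solution is more likely to be a local maximum or an unstable one, and further starting values would be worth trying before comparing information criteria across class counts.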
Question 3:
If I leave out some (potentially relevant) membership variables from the estimation, would it be possible to somehow include them in the classes later on?
Sorry for these general questions!
I appreciate any suggestions on that.
Thanks a lot in advance!