Dear Statalist users,

I initially created a post here because I was having difficulty understanding the basics of LPA syntax in Stata. Following the post here, I was able to replicate Masyn's (2013) LPA using startvalues(randompr, draws(5) seed(15)). I then applied the same starting values uniformly to models with 1 to 6 classes under 4 different model restrictions. The resulting BIC values are as follows:

Classes   BIC: class-invariant,   BIC: class-varying,   BIC: class-invariant,   BIC: class-varying,
          diagonal                diagonal              unrestricted            unrestricted
1         6536.391                6536.391              5726.355                5726.355
2         6044.384                5982.513              DRE                     5648.8
3         5923.718                5917.452              5563.118                5620.018
4         5915.317                5820.818              5587.027                5741.81
5         5898.285                5838.543              5731.829                5756.148
6         5843.54                 5817.436              5259.08                 5927.461
DRE stands for "discontinuous region encountered".
I noticed what looks like erratic behaviour in the BIC values for the class-invariant, unrestricted model (the fourth column of the table), which is probably due to the stringent assumption of class invariance. I have also noticed that, as I increase the number of classes, Stata struggles quite a bit to produce results for the same model specification.
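For reference, the four restrictions in the table can be sketched roughly as below. This is only an illustration: y1-y4 are placeholder indicator names (not my actual variables), the 3-class choice is arbitrary, and I am assuming the lcinvariant() and covstructure() options behave as described for gsem latent profile models in the manual, with the defaults keeping error variances equal across classes and covariances at zero.

```stata
* Sketch only: y1-y4 are placeholder names; 3 classes chosen arbitrarily.

* (1) Class-invariant, diagonal (gsem defaults, as I understand them)
gsem (y1 y2 y3 y4 <- _cons), lclass(C 3) ///
    startvalues(randompr, draws(5) seed(15))

* (2) Class-varying, diagonal: let variances differ across classes
gsem (y1 y2 y3 y4 <- _cons), lclass(C 3) lcinvariant(none) ///
    startvalues(randompr, draws(5) seed(15))

* (3) Class-invariant, unrestricted: add error covariances
gsem (y1 y2 y3 y4 <- _cons), lclass(C 3) ///
    covstructure(e._OEn, unstructured) ///
    startvalues(randompr, draws(5) seed(15))

* (4) Class-varying, unrestricted: covariances that differ across classes
gsem (y1 y2 y3 y4 <- _cons), lclass(C 3) lcinvariant(none) ///
    covstructure(e._OEn, unstructured) ///
    startvalues(randompr, draws(5) seed(15))
```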

Now, comment #7 in the same post recommends using startvalues(randompr, draws(50) seed(15)) emopts(iterate(10)) once one reaches 5 or more latent classes. I applied this specification uniformly to the same 1- to 6-class models under the same 4 restrictions. My results are as follows:

Classes   BIC: class-invariant,   BIC: class-varying,   BIC: class-invariant,   BIC: class-varying,
          diagonal                diagonal              unrestricted            unrestricted
1         6536.391                6536.391              5726.355                5726.355
2         6044.384                5982.513              5677.851                5648.8
3         5923.718                5932.882              5698.263                5620.018
4         5915.317                5820.818              5715.92                 5773.844
5         5898.285                5846.176              5750.381                5780.88
6         5843.54                 5818.873              5772.309                5924.519
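For concreteness, the two starting-value specifications being compared can be sketched as follows. The indicator names y1-y4, the 5-class model, and the stored-estimate names are placeholders for illustration only; the startvalues() and emopts() settings are the ones quoted in this post.

```stata
* Sketch only: y1-y4 are placeholder indicator names, and a 5-class model
* with default restrictions is used purely for illustration.

* Specification 1: 5 random draws of starting values
gsem (y1 y2 y3 y4 <- _cons), lclass(C 5) ///
    startvalues(randompr, draws(5) seed(15))
estimates store sv_draws5

* Specification 2: 50 random draws, EM limited to 10 iterations
gsem (y1 y2 y3 y4 <- _cons), lclass(C 5) ///
    startvalues(randompr, draws(50) seed(15)) emopts(iterate(10))
estimates store sv_draws50

* Show log likelihood, AIC, and BIC for both fits side by side
estimates stats sv_draws5 sv_draws50
```

Comparing the log likelihoods of the two stored fits shows whether both specifications reached the same solution or different local maxima.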
At this point, I am very confused about when to use one set of starting values rather than another. Every time I use a different set, my class profiles change markedly, and I am trying to avoid the trap of choosing the starting values that best fit my research expectations.

I'm leaning towards using Masyn's starting values, as in my first table. It just seems like a standard I can follow. But if anyone has insights on this topic, I would be very grateful to hear them. Many thanks.

P.S. I am aware of the "gsem estimation options" document from the Stata manual. Unfortunately, I could not solve my problem after reading it.