I am trying to fit a three-level multilevel logistic regression model, but Stata takes a very long time to produce results, even for the simplest null (empty) model, and sometimes produces none at all. Based on the explanation below, could you give me some guidance on what might be wrong, how I could speed up the computation, and how to interpret the Stata error messages that I report?
I am working with PISA 2015 data. It is a hierarchical dataset in which individuals (n = 200,501) are nested in schools (n = 8,482) and schools are nested in countries (n = 32). Thus, the dataset has three levels: individuals, schools and countries. The dependent variable is university expectations, coded 1 if the respondent expects to obtain a university diploma and 0 otherwise.
I use 12 independent variables at the individual level: 'migrante4' (categorical); 'repeat3' (categorical); 'PV1MATH' (continuous); 'cursorel' (continuous); 'iscedlevel' (categorical); 'iscedor' (categorical); 'edupadres3' (categorical); 'hisei' (continuous); 'indexcult' (continuous); 'sexo' (binary); 'langhome' (binary); 'AGE' (continuous).
There are no independent variables at the school level and my plan is to use independent variables at the country level. I am trying increasingly complex multilevel specifications with the idea of eventually fitting a random-intercept, random-slopes model with an interaction at the individual level.
I am using Stata 14 MP (8-core) on a Windows 10 computer with an Intel i3 processor and 8 GB of RAM. I have tried the multilevel commands 'melogit', 'xtlogit' and 'meqrlogit', and I only get results with the last one. However, 'meqrlogit' does not allow weights, which is a problem with PISA data because weights are needed to obtain correct point estimates and unbiased standard errors. An additional concern is that PISA requires Balanced Repeated Replication (BRR) weights, and neither 'melogit' nor 'xtlogit' seems to be compatible with this type of weight.
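To make the BRR requirement concrete, this is a minimal sketch of how I understand the replicate weights would be declared. It assumes that the 80 replicate weights are named W_FSTURWT1 to W_FSTURWT80 and that Fay's adjustment is 0.5, which is my reading of the PISA documentation. It uses a single-level 'logit' only to illustrate the declaration; it is not the multilevel model I actually need.

```stata
* Declare the final student weight and the BRR replicate weights.
* Assumption: replicate weights are W_FSTURWT1-W_FSTURWT80, Fay factor 0.5.
svyset [pweight = W_FSTUWT], brrweight(W_FSTURWT1-W_FSTURWT80) fay(0.5) vce(brr) mse

* Single-level null model, for illustration only (ignores schools and countries).
svy brr: logit expectativas
```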
I proceed as statistics handbooks suggest and start with the null model (only the dependent variable and the hierarchical structure of the data; no independent variables). First, I fit it using 'melogit' with weights (a pweight). This takes a lot of processing time (more than 30 minutes) and I end up interrupting the command: Stata shows nothing beyond the value of the log likelihood and gets stuck at "Refining starting values". Second, I fit the same model with 'melogit' but without the pweights. This time I get an error message saying "initial values not feasible". Finally, I use 'meqrlogit' (again without pweights). With this command I do get results, but it also takes a long time. These are the lines of code that I run for this empty model:
Code:
melogit expectativas [pweight=W_FSTUWT] || CNT: || CNTSCHID:
Code:
melogit expectativas || CNT: || CNTSCHID:
Code:
meqrlogit expectativas || CNT: || CNTSCHID:
Code:
melogit expectativas i.migrante4 || CNT: || CNTSCHID:
Code:
meqrlogit expectativas i.migrante4 || CNT: || CNTSCHID:
Code:
meqrlogit expectativas i.migrante4 || CNT:
Code:
meqrlogit expectativas ib1.migrante4 ib1.repeat3 c.PV1MATH c.cursorel ib2.iscedlevel ib1.iscedor ib2.edupadres3 c.hisei c.indexcult i.sexo ib1.langhome c.AGE
_xtgm_setup_u(): - function returned error
_xtgm_setup_st(): - function returned error
<istmt>: function returned error
I tried fitting a more parsimonious model, including only some of the independent variables, to make the computation lighter, but I get the same error message 3900 as before. It seems that my computer cannot provide enough memory for a model this demanding in terms of observations, but I am not sure whether this is the right interpretation of the error message. Things do not improve if I consider only two nesting levels (individuals and countries): I get the same error message.
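As a rough check on the memory interpretation: the allocation named in the full message (quoted below) is a real matrix of 200,501 rows by 22 columns. Assuming 8 bytes per element, which is the storage size of a double, that single matrix is only about 35 MB. If memory is the problem, it is presumably the total across many such temporaries rather than this one allocation. A sketch of the arithmetic and of the checks I can run:

```stata
* Size of one 200501 x 22 real matrix at 8 bytes per element, in megabytes.
display %9.1f 200501 * 22 * 8 / 1e6

* Current memory settings (e.g. max_memory) and current memory usage.
query memory
memory
```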
To determine whether some of the variables are causing this problem, I tried including them stepwise (adding them one by one to an increasingly complex model). The random-intercept model converges when I include the variables 'migrante4', 'repeat3', 'PV1MATH', 'hisei_rec' and 'indexcult'. However, when I add 'iscedlevel', 'iscedor', 'edupadres3', 'sexo', 'langhome' or 'AGE' to the model (separately, not all of them simultaneously), I again get the same error message '3900 unable to allocate real <tmp>[200501, 22]'. I first thought that this was because 'iscedlevel' and 'iscedor' are too unbalanced (too many observations in one category, too few in the others), but that is not the case for the rest of the "problematic" variables ('edupadres3', 'sexo', etc.).
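The stepwise check described above can be sketched as a loop. This is only an outline of what I ran, under two assumptions: that 'hisei_rec' enters as a continuous variable, and that the three-level random-intercept specification is used at every step.

```stata
* Predictors with which the random-intercept model converges.
* Assumption: hisei_rec is treated as continuous.
local base ib1.migrante4 ib1.repeat3 c.PV1MATH c.hisei_rec c.indexcult

* Add each remaining predictor separately and report the return code.
foreach v in ib2.iscedlevel ib1.iscedor ib2.edupadres3 i.sexo ib1.langhome c.AGE {
    capture noisily meqrlogit expectativas `base' `v' || CNT: || CNTSCHID:
    display as text "`v': return code = " as result _rc
}
```

A return code of 0 means the fit completed; a nonzero code identifies the step at which the error appears.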
Thank you very much for taking time to read the post, to think of the question and to answer it.