Dear Statalist users,

I am trying to fit a multilevel logistic regression model with a three-level structure, but Stata takes a very long time to produce results, even for the simplest null (empty) multilevel model, and in some cases produces no results at all. Given the details below, I would appreciate some guidance on what could be wrong, how I could speed up estimation, and how to interpret the Stata error messages that I report.

I am working with PISA 2015 data. It is a hierarchical dataset in which individuals (n = 200,501) are nested in schools (n = 8,482) and schools are nested in countries (n = 32). Thus, the dataset has three levels: individuals, schools, countries. The dependent variable is university expectations, coded 1 if respondent expects to achieve a university diploma and 0 if not.

I use 12 independent variables at the individual level: 'migrante4' (categorical); 'repeat3' (categorical); 'PV1MATH' (continuous); 'cursorel' (continuous); 'iscedlevel' (categorical); 'iscedor' (categorical); 'edupadres3' (categorical); 'hisei' (continuous); 'indexcult' (continuous); 'sexo' (binary); 'langhome' (binary); 'AGE' (continuous).

There are no independent variables at the school level and my plan is to use independent variables at the country level. I am trying increasingly complex multilevel specifications with the idea of eventually fitting a random-intercept, random-slopes model with an interaction at the individual level.

I am using Stata 14 MP 8-core on a Windows 10 computer with an Intel i3 processor and 8 GB of RAM. I have tried the multilevel commands 'melogit', 'xtlogit' and 'meqrlogit', and I only get results with the last one. However, 'meqrlogit' does not allow weights, which is a problem with PISA data because weights are necessary to obtain correct point estimates and unbiased standard errors. An additional concern is that PISA requires Balanced Repeated Replication (BRR) weights, and neither 'melogit' nor 'xtlogit' seems compatible with this type of weights.
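For what it is worth, 'melogit' does accept sampling weights at more than one level: the level-1 (student) weight goes in square brackets and, if I remember the syntax correctly, each higher level can take a pweight() option inside its random-effects equation. A sketch only, assuming hypothetical level-specific weight variables 'wt_cnt' and 'wt_sch' (PISA supplies W_FSTUWT for students; country- and school-level weights would have to be derived, and the names here are placeholders):

Code:
 melogit expectativas [pweight=W_FSTUWT] || CNT:, pweight(wt_cnt) || CNTSCHID:, pweight(wt_sch)

Whether conditioning weights this way is appropriate for PISA's design is a substantive question I cannot answer here; the sketch only shows where level-specific weights would go syntactically.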

I proceed as statistics handbooks suggest. First, I fit the null model (only the dependent variable, accounting for the hierarchical structure of the data; no independent variables). I try fitting this model using 'melogit' with and without weights (a pweight). Using 'melogit' with pweights takes a very long time (more than 30 minutes) and I end up interrupting the command (Stata shows nothing beyond the value of the log likelihood and gets stuck at "Refining starting values"). Second, I try fitting the same model with 'melogit' but omitting the pweights. This time I get an error message saying "initial values not feasible". Finally, I use 'meqrlogit' (again without pweights). With this command I get results, but it also takes a long time. These are the lines of code that I execute for this empty model:

Code:
 melogit expectativas [pweight=W_FSTUWT] || CNT: || CNTSCHID:
Code:
 melogit expectativas || CNT: || CNTSCHID:
Code:
 meqrlogit expectativas || CNT: || CNTSCHID:
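One thing that sometimes speeds up exploratory 'melogit' runs considerably (a suggestion, not a guaranteed fix, and the final model should be refit with the default adaptive quadrature): switch to the Laplacian approximation or reduce the number of integration points, both of which 'melogit' supports as options. For example:

Code:
 melogit expectativas [pweight=W_FSTUWT] || CNT: || CNTSCHID:, intmethod(laplace)
Code:
 melogit expectativas [pweight=W_FSTUWT] || CNT: || CNTSCHID:, intpoints(3)

The Laplacian approximation is fast but can bias the variance components, so it is best treated as a way to get the model moving rather than as the final specification.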
In a second step, I fit a random-intercept model in which I include only one independent variable (migrant background) and account for the three-level structure. I try the 'melogit' and 'meqrlogit' commands. Only the latter produces results, but it also takes a lot of processing time. I have also tried fitting a two-level random-intercept model with individuals nested in countries (omitting the nesting in schools) to see whether it fits faster. It does, but Stata still needs a lot of processing time. This is the syntax that I execute:

Code:
 melogit expectativas i.migrante4 || CNT: || CNTSCHID:
Code:
 meqrlogit expectativas i.migrante4 || CNT: || CNTSCHID:
Code:
 meqrlogit expectativas i.migrante4 || CNT:
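Since 'meqrlogit' converges where 'melogit' does not, one possible workaround (hedged: it assumes the two commands' parameterizations are close enough for the estimates to serve as starting values, and from() with the skip suboption silently drops any parameters whose names do not match) is to fit the model with 'meqrlogit' first and feed its coefficient vector to 'melogit':

Code:
 meqrlogit expectativas i.migrante4 || CNT: || CNTSCHID:
 matrix b0 = e(b)
 melogit expectativas i.migrante4 || CNT: || CNTSCHID:, from(b0, skip)

This can get 'melogit' past the "Refining starting values" stage, but it is worth checking that the final estimates are sensible rather than an artifact of the imported starting point.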
In the third step, I try fitting a random-intercept model including the full set of independent variables at the individual level. I use 'meqrlogit' directly, given that it is the only command that has produced results so far. I execute the following syntax and get the following error message from Stata:

Code:
 meqrlogit expectativas ib1.migrante4 ib1.repeat3 c.PV1MATH c.cursorel ib2.iscedlevel ib1.iscedor ib2.edupadres3 c.hisei c.indexcult i.sexo ib1.langhome c.AGE || CNT: || CNTSCHID:
st_data(): 3900 unable to allocate real <tmp>[200501, 22]
_xtgm_setup_u(): - function returned error
_xtgm_setup_st(): - function returned error
<istmt>: function returned error

I tried fitting a more parsimonious model, including only some of the independent variables, to make the computation lighter, but I get the same error 3900 as before. It seems that my computer cannot allocate enough memory for a model with this many observations and parameters, but I am not sure whether this is the right interpretation of the error message. Things do not improve if I consider only two nesting levels (individuals and countries): I get the same error message.
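As far as I understand it, error 3900 is Mata reporting that it could not allocate a matrix; here the failing object is roughly 200,501 x 22 doubles, about 35 MB for that matrix alone, on top of everything else the estimator holds in memory. Before refitting, it may help to check Stata's memory settings, keep only the variables the model actually uses, and compress the dataset (a sketch; adjust the variable list to your actual data and use a copy of the dataset before dropping variables):

Code:
 query memory
 set max_memory 6g
 keep expectativas migrante4 repeat3 PV1MATH cursorel iscedlevel iscedor edupadres3 hisei indexcult sexo langhome AGE W_FSTUWT CNT CNTSCHID
 compress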

To determine whether particular variables are causing this problem, I tried adding them one by one in increasingly complex models. The random-intercept model converges when I include the variables 'migrante4', 'repeat3', 'PV1MATH', 'hisei_rec' and 'indexcult'. However, when I add 'iscedlevel', 'iscedor', 'edupadres3', 'sexo', 'langhome' or 'AGE' (separately, not all of them simultaneously), I again get the same error message '3900 unable to allocate real <tmp>[200501, 22]'. I first thought this was because 'iscedlevel' and 'iscedor' are very unbalanced (too many observations in one category, too few in the others), but that is not the case for the other 'problematic' variables ('edupadres3', 'sexo', etc.).
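To narrow down whether a specific variable is the culprit, it may be worth inspecting missing-data patterns and the cross-tabulations of each categorical predictor with the outcome, since the estimation sample can shrink or shift as variables are added (variable names taken from the models above; adjust as needed):

Code:
 misstable summarize expectativas migrante4 iscedlevel iscedor edupadres3 sexo langhome AGE
 foreach v in migrante4 iscedlevel iscedor edupadres3 sexo langhome {
     tabulate `v' expectativas, missing
 }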

Thank you very much for taking the time to read the post, think about the question and answer it.