Here are four ways to estimate the same model (I think):
Code:

xtreg absent ib(2).classtype ib("k").gradenum i.schid, re
xtreg absent ib(2).classtype ib("k").gradenum i.schid, re mle
mixed absent ib(2).classtype ib("k").gradenum i.schid || stdntid:
mixed absent ib(2).classtype ib("k").gradenum i.schid || _all: R.stdntid
Version 1 runs in 2 seconds. Version 2 runs in 10 seconds. Version 3 takes 50 seconds to return the same result as version 2. Version 4 just spins without returning a result.

What accounts for these differences? I'm especially confused about why version 3 takes so much longer than version 2, and why version 4 doesn't finish when version 3 does.