Dependent variable Y1 has three categories: 1, 2, and 3.

Dependent variable Y2 has four categories: 1, 2, 3, and 4.

Y2 is created simply by splitting category 3 of Y1 into categories 3 and 4.

The code with Y1

Code:
cmp (Y1=X C, iia) (X=Z C) if C<100, ind($cmp_mprobit $cmp_cont) vce(cluster D)
ran quickly and without error, whereas the following code, using Y2,

Code:
cmp (Y2=X C, iia) (X=Z C) if C<100, ind($cmp_mprobit $cmp_cont) vce(cluster D)
produces the following output:

Fitting full model.
Likelihoods for 522614 observations involve cumulative normal distributions above dimension 2.
Using ghk2() to simulate them. Settings:
Sequence type = halton
Number of draws per observation = 1446
Include antithetic draws = no
Scramble = no
Prime bases = 2 3
Each observation gets different draws, so changing the order of observations in the data set would change the results.
and then takes an extremely long time to run.

I have a few questions.

First, when it says "Likelihoods for 522614 observations involve cumulative normal distributions above dimension 2", is the "above dimension 2" caused by Y2 having 4 categories? If so, and I don't want to spend this much time running a single regression, should I stick to at most 3 categories, as in Y1?
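To make the question concrete, here is a minimal Python sketch of how the number of categories could drive the dimension of the normal CDF. It assumes the standard multinomial-probit setup in which the probability of the observed category is written in terms of the J - 1 utility differences relative to that category, with iid standard-normal errors (my understanding of what the iia option imposes). The helper `differenced_cov` is hypothetical and not part of cmp.

```python
def differenced_cov(n_alternatives, chosen=0):
    """Covariance of the error differences e_k - e_chosen, for k != chosen.

    Assumption: iid N(0, 1) errors (an IIA-style multinomial probit), so
    Var(e_k - e_j) = 2 and Cov(e_k - e_j, e_m - e_j) = 1 for k != m.
    """
    others = [k for k in range(n_alternatives) if k != chosen]
    return [[2.0 if a == b else 1.0 for b in others] for a in others]


# The matrix size is the dimension of the normal CDF in each likelihood term.
for label, n_cat in (("Y1", 3), ("Y2", 4)):
    dim = len(differenced_cov(n_cat))
    print(f"{label}: {n_cat} categories -> {dim}-dimensional normal CDF")
```

Under this assumption Y1 needs only bivariate normal probabilities, while Y2 needs trivariate ones, which would be consistent with the log switching to ghk2() for dimensions above 2.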

Second, it says "Each observation gets different draws, so changing the order of observations in the data set would change the results." Is it then safe to trust the regression results?

Third, I read the following in help cmp:

"If the estimation problem requires the GHK algorithm (see above), change the number of draws per observation in the simulation sequence using the ghkdraws() option. By default, cmp uses twice the square root of the number of observations for which the GHK algorithm is needed, i.e., the number of observations that are censored in at least three equations. Raising simulation accuracy by increasing the number of draws is sometimes necessary for convergence and can even speed it by improving search precision. On the other hand, especially when the number of observations is high, convergence can be achieved, at some loss in precision with remarkably few draws per observations--as few as 5 when the sample size is 10,000 (Cappellari and Jenkins 2003). And taking more draws can also greatly extend execution time."
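The default quoted above, twice the square root of the number of observations that need GHK, appears to reproduce the 1446 draws in the log when N is the 522614 observations reported there. A quick arithmetic check in Python follows; `default_ghk_draws` is a hypothetical name, and rounding up is an assumption about cmp's exact rule (rounding to nearest gives the same 1446 for this N).

```python
import math


def default_ghk_draws(n_ghk_obs):
    """Draws per observation per the help text: 2 * sqrt(N).

    Assumption: the result is rounded up; cmp's exact rounding rule
    is not stated in the help excerpt.
    """
    return math.ceil(2 * math.sqrt(n_ghk_obs))


print(default_ghk_draws(522614))  # 1446, the value in the log above
print(default_ghk_draws(10000))   # 200
```

Under this default the total number of draws, N times 2 * sqrt(N), grows roughly like N^1.5, so a large sample is hit twice: more observations and more draws for each one.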
How can I reconcile these two sentences?

Sentence 1: "increasing the number of draws is sometimes necessary for convergence and can even speed it by improving search precision."

Sentence 2: "On the other hand, especially when the number of observations is high, convergence can be achieved, at some loss in precision, with remarkably few draws per observations (...) And taking more draws can also greatly extend execution time."

Sentence 1 suggests increasing # in ghkdraws(#) to speed up convergence, while Sentence 2 suggests decreasing it. Can I reconcile them as "when N is large, choose a low #; when N is small, choose a high #"?
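To see the precision-versus-draws trade-off behind the two sentences, here is a minimal Python sketch of the GHK simulator with Halton draws, loosely mirroring the log's settings (Halton sequence, prime bases 2 and 3 for a trivariate probability). It is an illustration only, not cmp's ghk2() implementation, and all function names are hypothetical. The test case assumes 4 equally attractive alternatives with iid standard-normal errors, for which the true probability of each alternative is exactly 1/4.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf and PHI.inv_cdf


def halton(index, base):
    """Radical-inverse (Halton) value in (0, 1) of a positive integer index."""
    value, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * scale
        scale /= base
    return value


def cholesky(cov):
    """Lower-triangular L with L L' = cov, for a small positive-definite matrix."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def ghk_probability(upper, cov, draws, primes=(2, 3, 5, 7, 11, 13)):
    """Illustrative GHK estimate of P(e < upper), e ~ N(0, cov), Halton draws.

    Not cmp's ghk2(). Only the first d - 1 dimensions need draws, so a
    trivariate probability uses Halton sequences in bases 2 and 3.
    """
    d = len(upper)
    L = cholesky(cov)
    total = 0.0
    for r in range(1, draws + 1):  # start at 1 because halton(0, base) == 0
        z, prob = [], 1.0
        for i in range(d):
            # Truncation point for the i-th standardized error, given earlier draws.
            bound = (upper[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
            p_i = PHI.cdf(bound)
            prob *= p_i
            if p_i == 0.0:
                break  # this draw contributes zero probability
            if i < d - 1:
                # Inverse-CDF draw from N(0, 1) truncated to (-inf, bound).
                z.append(PHI.inv_cdf(halton(r, primes[i]) * p_i))
        total += prob
    return total / draws


# Four equally attractive alternatives, iid N(0, 1) errors: the differenced
# errors have variance 2 and covariance 1, and the true probability is 1/4.
cov = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
for n_draws in (5, 50, 1446):
    est = ghk_probability([0.0, 0.0, 0.0], cov, n_draws)
    print(f"{n_draws:5d} draws: estimate {est:.5f}, error {abs(est - 0.25):.5f}")
```

Per-observation cost in this sketch grows linearly with the number of draws, while the simulation error typically shrinks as draws increase; the two help-file sentences describe the two sides of that trade-off.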

Also, if, as the help file says, 5 draws can be enough for 10,000 observations, would ghkdraws(5) also be enough for my 522,614 observations? If so, why does cmp use 1,446 draws per observation by default?