The dependent variable Y2 has four categories: 1, 2, 3, and 4.
Y2 is created simply by splitting category "3" of Y1 into categories 3 and 4, so Y1 has three categories.
The code with Y1
Code:
cmp (Y1=X C, iia) (X=Z C) if C<100, ind($cmp_mprobit $cmp_cont) vce(cluster D)
The code with Y2
Code:
cmp (Y2=X C, iia) (X=Z C) if C<100, ind($cmp_mprobit $cmp_cont) vce(cluster D)
Fitting full model.
Likelihoods for 522614 observations involve cumulative normal distributions above dimension 2.
Using ghk2() to simulate them. Settings:
Sequence type = halton
Number of draws per observation = 1446
Include antithetic draws = no
Scramble = no
Prime bases = 2 3
Each observation gets different draws, so changing the order of observations in the data set would change the results.
I have a few questions.
First, when it says "Likelihoods for 522614 observations involve cumulative normal distributions above dimension 2", does it say "above dimension 2" because Y2 has 4 categories? If I don't want to spend this long running a single regression, should I stick to at most 3 categories, as in Y1?
Second, it says "Each observation gets different draws, so changing the order of observations in the data set would change the results." Given that, is it safe to trust the regression results?
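Since the message ties the draws to the order of observations, one thing that can at least be controlled is the sort order before estimation. The lines below are a minimal sketch, not a statement about how accurate the estimates are; `obs_id` is a hypothetical variable assumed to uniquely identify each row, and it does not appear in the original data description.

```stata
* Sketch only: obs_id is a hypothetical unique row identifier (an assumption).
isid obs_id     // stops with an error if obs_id does not uniquely identify rows
sort obs_id     // fix the observation order before estimation

* Same Y2 model as above, now run on a fixed observation order
cmp (Y2=X C, iia) (X=Z C) if C<100, ind($cmp_mprobit $cmp_cont) vce(cluster D)
```

With the order fixed this way, repeated runs on the same data should assign the same draws to the same observations, which makes it easier to tell real changes in the results from changes caused by reordering.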
Third, I read this in help cmp:
"If the estimation problem requires the GHK algorithm (see above), change the number of draws per observation in the simulation sequence using the ghkdraws() option. By default, cmp uses twice the square root of the number of observations for which the GHK algorithm is needed, i.e., the number of observations that are censored in at least three equations. Raising simulation accuracy by increasing the number of draws is sometimes necessary for convergence and can even speed it by improving search precision. On the other hand, especially when the number of observations is high, convergence can be achieved, at some loss in precision with remarkably few draws per observations--as few as 5 when the sample size is 10,000 (Cappellari and Jenkins 2003). And taking more draws can also greatly extend execution time."
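Applying the default rule quoted above to this run appears to reproduce the number of draws in the output. This is only a check of the arithmetic, assuming that all 522614 observations reported in the log are the ones that need GHK and that cmp rounds the result to a whole number:

```latex
\[
2\sqrt{522614} \approx 2 \times 722.92 \approx 1445.8
\;\Rightarrow\; 1446 \text{ draws per observation}
\]
```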
Sentence 1: "increasing the number of draws is sometimes necessary for convergence and can even speed it by improving search precision."
Sentence 2: "On the other hand, especially when the number of observations is high, convergence can be achieved, at some loss in precision, with remarkably few draws per observations (...) And taking more draws can also greatly extend execution time."
Sentence 1 says to increase # in ghkdraws(#) to speed up convergence, while Sentence 2 says to decrease it. Can I reconcile the two as "when N is big, choose a low #; when N is small, choose a high #"?
Also, if 5 draws are enough for 10,000 observations, as the guideline says, will ghkdraws(5) also be enough for my 522614 observations? If so, why does cmp use 1446 draws per observation by default?
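One way to look at this empirically would be to refit the Y2 model with different values of ghkdraws() and compare the estimates. The lines below are a minimal sketch along those lines; the draw counts 50 and 200 and the stored-estimate names are arbitrary illustrations rather than recommendations, and the sketch assumes the ghkdraws(#) form of the option described in the help text quoted above.

```stata
* Sketch only: the draw counts (50, 200) are arbitrary illustrative values.
cmp setup    // defines the $cmp_* globals such as $cmp_mprobit and $cmp_cont

* Fit with a small number of draws and keep the results
cmp (Y2=X C, iia) (X=Z C) if C<100, ind($cmp_mprobit $cmp_cont) vce(cluster D) ghkdraws(50)
estimates store draws50

* Refit with more draws
cmp (Y2=X C, iia) (X=Z C) if C<100, ind($cmp_mprobit $cmp_cont) vce(cluster D) ghkdraws(200)
estimates store draws200

* Coefficients and standard errors side by side
estimates table draws50 draws200, b(%9.4f) se(%9.4f)
```

If the coefficients and standard errors barely move between the two fits, that would suggest the smaller number of draws is adequate for this model; large differences would suggest it is not.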