Dear Statalist,

I have a model fitted using ml using a d0 Mata-based evaluator. Now, I am trying to calculate the robust errors using _robust developers' function. Unfortunately, given that I am working using d0 family (d0 it is necessary because I am working with a special case of a discrete choice model), when I try to calculate the score function in order to create the white/sandwich matrix robust for heteroscedasticity I got the following message: scores cannot be produced with method d0. Of course, it does totally make sense because I am not providing the ml with the analytical derivatives to calculate the score functions.

That being said, my question is if it's possible to recover the numerical approximations of the score functions that
ml calculates in order to use them as an input for _robust command to calculate "numerical" robust standard errors? This is also tremendously relevant in my case because I am planning to include analytical derivatives onto my model shortly (turning it into a d1 ml model), but I also want to have a way to check if my analytical results (both algebraical computations and code implementation) matches the ml numerical approximations of the score functions.

Here I listed a self-contained example using a conditional logit (old clogit still kicking!), using _robust command to match robust standard errors, which in this specific case, happens to be, the same thing as vce(cluster individual_var) (this is also shown below).

Just for the sake of concreteness, I want to state that my original problem is not a conditional logit (otherwise this post will be meaningless), but I prefer to exemplify my problem using it because both shared several critical points.




(1) Data Generation: I used Mata just because it's faster creating the data. The real betas are 0.5 and 2 for attributes x1 and x2, respectively as you can see from matrix "betas". Furthermore, 2000 individuals are facing 5 choice situations each.

Code:
clear all
set seed 157
set obs 2000
gen id = _n
local n_choices =5
expand `n_choices'
bys id : gen alternative = _n

local n_atrib = 2
forvalues i = 1/`n_atrib' {
    gen x`i' =  runiform(-2,2)
}

gen choice = . // needs to be created
matrix betas = (0.5 ,2)
global MY_panel = "id"


mata: 
betas =st_matrix("betas")
st_view(X = ., ., "x*")
st_view(panvar = ., ., st_global("MY_panel"))
npanels = max(panvar)
X_id =X,panvar

for(n=1; n <= npanels; ++n) { 
    x_n = select(X, (X_id[.,cols(X_id) ]:==n)) // selects X's for each individual n
    util_n =betas :* x_n
    util_sum =rowsum(util_n) 
    U_exp = exp(util_sum)
    p_i =     U_exp :/ colsum(U_exp)
    cum_p_i =runningsum(p_i)    
    rand_draws = J(rows(x_n),1,uniform(1,1)) 
    pbb_balance = rand_draws:<cum_p_i
    cum_pbb_balance = runningsum(pbb_balance)
    choice_n = (cum_pbb_balance:== J(rows(x_n),1,1))
    if (n==1) Y =choice_n    
    else       Y = Y \ choice_n    
}
        
new_data=.
st_view(new_data, ., "x* choice ")
new_data[.,cols(new_data)] = Y[.,1]

end


(2) clogit + _robust combo: Here, I show that, first of all, the DGP was correctly captured by clogit (given the obtained betas). Also, how robust standard error (third column) is equivalent to use cluster errors across individuals (fourth column), and even most importantly, how _robust developers function can be used to construct robust errors in clogit, (second column) which is my main goal with my own problem.

Code:
/* Robust/Cluster per ind */

// clogit: No robust
eststo  no_r :clogit choice x*  , group(id) nolog
// _robust
matrix D = e(V)
matrix b = e(b)
predict double _score , score
_robust _score , v(D) cluster(id) 
ereturn post b D, depn(choice) 
eststo  r_robust :ereturn display
// clogit: robust
eststo  robust: clogit choice x*, group(id) vce(robust) nolog
// clogit: cluster
eststo  cluster: clogit choice x*  , group(id) vce(cluster id) nolog
// All models
esttab  , label    b(a4) se   ///
     nonumbers mtitles("no_r" "r_robust"  "vce(robust)" "vce(cluster)" )  ///
      replace
------------------------------------------------------------------------------------
                             no_r        r_robust     vce(robust)    vce(cluster)   
------------------------------------------------------------------------------------
choice                                                                              
x1                         0.5137***       0.5137***       0.5137***       0.5137***
                        (0.03190)       (0.03158)       (0.03158)       (0.03158)   

x2                         1.9802***       1.9802***       1.9802***       1.9802***
                        (0.05672)       (0.05538)       (0.05538)       (0.05538)   
------------------------------------------------------------------------------------
Observations                10000                           10000           10000   
------------------------------------------------------------------------------------



(3) simple ml replication of clogit: Finally, I replicate the clogit results using a Mata based d0 evaluator, showing how standard errors match the first column of my last table (clogit with no standard error corrections)

Code:
mata:
    void RRM_d0(transmorphic scalar M, real scalar todo,
    real rowvector b, real scalar lnf,
    real rowvector g, real matrix H)
 {
    /*-----------------------------*/
    /*----variables declaration----*/
    /*-----------------------------*/
    
    real matrix panvar  
    real matrix paninfo 
    real scalar npanels

    real scalar n 
    real matrix Y 
    real matrix X 
    real matrix x_n 
    real matrix y_n 

    // variables creation
    Y = moptimize_util_depvar(M, 1)     
    X= moptimize_init_eq_indepvars(M,1)    

    st_view(panvar = ., ., st_global("MY_panel"))
    paninfo = panelsetup(panvar, 1)     
    npanels = panelstats(paninfo)[1] 
    lnfj = J(npanels, 1, 0) 

    for(n=1; n <= npanels; ++n) {
        x_n = panelsubmatrix(X, n, paninfo) 
        y_n = panelsubmatrix(Y, n, paninfo) 
        // Linear utility        
        U = rowsum(b:* x_n) 
        // exp() Utility
        U_exp = exp(U)
        // Probability 
        p_i = colsum((U_exp :* y_n))  / colsum(U_exp)
        // log probability
        lnfj[n] = ln(p_i)
         }
    // Sum over individuals (n)     
    lnf = moptimize_util_sum(M, lnfj)
}
end

global MY_panel id  // i is the identification variable per individuals
sort $MY_panel
ml model d0 RRM_d0() (clogit:  choice = x*, nocons) 
ml maximize ,nolog

. ml maximize ,nolog

                                                Number of obs     =     10,000
                                                Wald chi2(2)      =    1250.08
Log likelihood = -1582.7261                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5137069   .0319027    16.10   0.000     .4511788     .576235
          x2 |   1.980155   .0567223    34.91   0.000     1.868981    2.091328
------------------------------------------------------------------------------



(4) Finally, the error...: Here is what was my original plan (where I get the error stated at the beginning).

Code:
matrix D = e(V)
matrix b = e(b)
predict double _score , score // here is where it fails.
_robust _score , v(D) cluster(id)
ereturn post b D, depn(choice) 
ereturn display


I do really hope this (longer than I expected) example could help to clarify what I am trying to do and where it failed specifically.
Thanks for reading
up to this point.
Sincerely,

Álvaro
Stata 16.1 MP
Win10