I think I found some behaviour of Stata 16 vs Stata 17 which I do not understand and is worrisome. In short, it seems inverting matrices in Stata 16 and Stata 17 leads to different results. This applies to a range of commands as for example regress.
Let me show the differences with the following example using the jasa2 dataset. I load the data, recast the variables as doubles (the variables are in floats, but it does not make a difference, just to be sure), calculate the cross-section averages for the variables c, p and y and regress those on c for the first cross-sectional unit (unit 111). I make sure that both run using version 16 (the entire do file is also attached):
Code:
set varabbrev on use http://fmwww.bc.edu/repec/bocode/j/jasa2.dta, clear recast double y c p *** balance data keep if year >=1962 drop if id == 124 *** gen csa by year, sort: egen m_c = mean(c) by year, sort: egen m_p = mean(p) by year, sort: egen m_y = mean(y) version 16: reg c m_y m_p m_c if id == 111, noconst
Code:
------------------------------------------------------------------------------ c | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- m_y | -.1771661 .1791667 -0.99 0.331 -.5436031 .1892708 m_p | -.450618 .0929712 -4.85 0.000 -.6407655 -.2604704 m_c | 1.013109 .1853195 5.47 0.000 .6340876 1.392129 ------------------------------------------------------------------------------
Code:
------------------------------------------------------------------------------ c | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- m_y | -.1771761 .1791672 -0.99 0.331 -.5436142 .1892619 m_p | -.4506153 .0929712 -4.85 0.000 -.6407627 -.260468 m_c | 1.013119 .1853201 5.47 0.000 .6340968 1.392141 ------------------------------------------------------------------------------
Then I looked a bit deeper into the problem. The correlation is the following:
Code:
corr y m_y m_p m_c if id == 111 | c m_y m_p m_c -------------+------------------------------------ c | 1.0000 m_y | 0.9879 1.0000 m_p | 0.3064 0.4234 1.0000 m_c | 0.9933 0.9972 0.4005 1.0000
Furthermore I compared results using my own partialling out program in mata (see further below) and the results differ as well. The matrix I am inverting has full rank though. To me it looks like the difference arises in the matrix inversion. I checked invsym, qrinv, pinv, cholinv and the difference still exists. I know that there have been updates to the mata routines using LAPAK by using the Intel Math Kernel Library. Is this the reason for the differences?
Why is this important? I am working a lot with models with cross-section dependence where the dependence is removed by cross-section averages. Obtaining different results using different versions of Stata is not good news at all! Is anyone able to replicate the results? Do I miss something very important? Any ideas and/or suggestions are welcome.
For further analysis, this is the mata partialling out program I mentioned above.
Code:
set matastrict off capture mata mata drop partialout() mata: function partialout ( string scalar X2_n, string scalar X1_n, string scalar touse, string scalar id_n, real matrix b, real matrix ranks) { real matrix X1 real matrix X2 real matrix X1_i real matrix X2_i real scalar id st_view(X2,.,tokens(X2_n),touse) st_view(X1,.,tokens(X1_n),touse) rk = 0 id = st_data(.,id_n,touse) index = panelsetup(id,1) ids = rows(index) running = 1 b = J(0,cols(X1),.) ranks = J(0,1,.) while (running<=ids) { X1_i = panelsubmatrix(X1,running,index) panelsubview(X2_i,X2,running,index) X1X1 = quadcross(X1_i,X1_i) X1X2 = quadcross(X1_i,X2_i) ranks = ranks \ (rank(X1X1):==cols(X1X1)) //partial out bi = quadcross(invsym(X1X1),X1X2) b = b \ bi' X2_i[.,.] = (X2_i - X1_i*bi) running++ } } end mata partialout("c_p","m_c m_p m_y","touse","id",bi=.,ranks=.) sum c_p
Code:
Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- c_p | 736 3.19e-10 .0606316 -.2358852 .2282257
Code:
Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- c_p | 736 -2.54e-10 .0606315 -.2358854 .2282266
Code:
c_p -.0261071 -.0406798 -.0227847 -.0078159 .0034 -.0011637
Code:
c_p -.02610678 -.04067976 -.02278482 -.00781614 .0034003 -.00116397
0 Response to Matrix inversion in Stata 16 and 17 leads to different results
Post a Comment