Hello,

I am running a simple linear regression and found that 3 of my variables (ssr1, ssr2, ssr3) are highly collinear (pairwise correlations above 0.85). To work around this, I applied a binary transformation to them; the resulting dummies are no longer strongly correlated, but whenever I include all 3 of them in my baseline regression, the coefficients don't make sense (i.e., their signs differ from when each one is included alone).

Code:
 corr ssr1 ssr2 ssr3
(obs=4,544)

             |     ssr1     ssr2     ssr3
-------------+---------------------------
        ssr1 |   1.0000
        ssr2 |   0.8794   1.0000
        ssr3 |   0.9855   0.8725   1.0000


 corr dssr1 dssr2 dssr3
(obs=4,544)


             |    dssr1    dssr2    dssr3
-------------+---------------------------
       dssr1 |   1.0000
       dssr2 |   0.2647   1.0000
       dssr3 |   0.4837   0.3100   1.0000
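
For reference, the binary transformation was of this general form (sketched here with a median split purely for illustration; the median cutoff below is a placeholder for my actual thresholds):

Code:
* illustrative sketch only: dummies from a median split, not my exact cutoffs
quietly summarize ssr1, detail
generate dssr1 = (ssr1 > r(p50)) if !missing(ssr1)
quietly summarize ssr2, detail
generate dssr2 = (ssr2 > r(p50)) if !missing(ssr2)
quietly summarize ssr3, detail
generate dssr3 = (ssr3 > r(p50)) if !missing(ssr3)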



reg y abs x1 x2 x3 dssr1 dssr2 dssr3

      Source |       SS           df       MS      Number of obs   =     4,389
-------------+----------------------------------   F(7, 4381)      =   3369.98
       Model |  10528.9907         7  1504.14153   Prob > F        =    0.0000
    Residual |  1955.39803     4,381  .446336003   R-squared       =    0.8434
-------------+----------------------------------   Adj R-squared   =    0.8431
       Total |  12484.3887     4,388  2.84512049   Root MSE        =    .66808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         abs |  -.3938718   .0589501    -6.68   0.000    -.5094438   -.2782998
          x1 |  -.4061771   .0108896   -37.30   0.000    -.4275263    -.384828
          x2 |   1.449583   .0119303   121.50   0.000     1.426194    1.472972
          x3 |  -.0211118   .0013578   -15.55   0.000    -.0237738   -.0184497
       dssr1 |  -.2650344   .0292786    -9.05   0.000    -.3224352   -.2076336
       dssr2 |   -.065791   .0288065    -2.28   0.022    -.1222664   -.0093157
       dssr3 |   .0849137    .025804     3.29   0.001     .0343249    .1355025
       _cons |   1.411668   .0331324    42.61   0.000     1.346712    1.476625
------------------------------------------------------------------------------
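
Right after this regression I also meant to look at the variance inflation factors, which (if I understand the postestimation command correctly) is just:

Code:
estat vif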


reg y abs x1 x2 x3 dssr1

      Source |       SS           df       MS      Number of obs   =     4,389
-------------+----------------------------------   F(5, 4383)      =   4702.47
       Model |  10522.8053         5  2104.56107   Prob > F        =    0.0000
    Residual |  1961.58339     4,383  .447543553   R-squared       =    0.8429
-------------+----------------------------------   Adj R-squared   =    0.8427
       Total |  12484.3887     4,388  2.84512049   Root MSE        =    .66899

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         abs |  -.4119729   .0573253    -7.19   0.000    -.5243595   -.2995863
          x1 |  -.4062483   .0108303   -37.51   0.000    -.4274811   -.3850155
          x2 |   1.451485   .0118196   122.80   0.000     1.428312    1.474657
          x3 |  -.0208816   .0012887   -16.20   0.000    -.0234082    -.018355
       dssr1 |  -.2674783   .0283905    -9.42   0.000    -.3231379   -.2118186
       _cons |   1.412287   .0328553    42.99   0.000     1.347874      1.4767
------------------------------------------------------------------------------
Is it possible that there is still "lingering" multicollinearity from the original variables? What should I do in this case?
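
One further diagnostic I have been considering is the user-written collin command from SSC (assuming I have the package name right), which reports VIFs and condition indices for a set of variables:

Code:
ssc install collin
collin abs x1 x2 x3 dssr1 dssr2 dssr3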

What I did try as a diagnostic was to regress abs (the most likely culprit) on the three dummies, to check whether the R^2 was too high, i.e., whether these three explain too much of its variation.

Code:
reg abs dssr1 dssr2 dssr3

      Source |       SS           df       MS      Number of obs   =     4,308
-------------+----------------------------------   F(3, 4304)      =    163.34
       Model |  793474.797         3  264491.599   Prob > F        =    0.0000
    Residual |  6969151.29     4,304   1619.2266   R-squared       =    0.1022
-------------+----------------------------------   Adj R-squared   =    0.1016
       Total |  7762626.08     4,307  1802.32786   Root MSE        =     40.24

------------------------------------------------------------------------------
         abs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dssr1 |   24.94797   1.648215    15.14   0.000     21.71662    28.17933
       dssr2 |   13.83835   1.673241     8.27   0.000     10.55793    17.11876
       dssr3 |   8.870042   1.515782     5.85   0.000     5.898329    11.84176
       _cons |   31.97143   .7562375    42.28   0.000     30.48882    33.45405
------------------------------------------------------------------------------

But I don't know how to interpret the R^2 in this case. If these variables are truly multicollinear, is the R^2 overestimated? I've tried to find some references on this but was unable to find any, and I thought maybe you guys could help me.
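
In case it is useful, the only interpretation I could connect it to is the VIF formula, VIF = 1/(1 - R^2); assuming that formula even applies to this auxiliary regression, I computed it from the stored result right after running it:

Code:
* e(r2) holds the R-squared of the last regression (reg abs dssr1 dssr2 dssr3)
display "implied VIF = " 1/(1 - e(r2))

With R^2 around 0.10 this comes out close to 1.11, which does not look alarming, but I may well be misapplying the formula here.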

Thanks,
Jonas