I have a cross-sectional dataset with 167 observations, on which I am trying to run an OLS regression of greenpremium (the y-variable) on rating, currency, sector, maturity and issue amount (the x-variables). Rating, currency and sector are qualitative variables stored as strings in Stata; maturity and issue amount are quantitative variables.
So, first let's consider the case of rating. The tabulate command gives the following output:
Code:
tabulate rating
     Rating |      Freq.     Percent        Cum.
------------+-----------------------------------
          A |         48       28.74       28.74
         AA |         37       22.16       50.90
        AAA |         58       34.73       85.63
         BB |          1        0.60       86.23
        BBB |         23       13.77      100.00
------------+-----------------------------------
      Total |        167      100.00
Code:
encode rating, generate(ratingdummy)
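To see which integer code each rating received, something like the following should work. This is only a sketch: it assumes encode named the value label ratingdummy, which is its default when the label() option is not given. Since encode numbers the strings in alphabetical order, I would expect A = 1, AA = 2, AAA = 3, BB = 4 and BBB = 5.

```stata
* List the integer codes that encode attached to each rating string
label list ratingdummy

* Cross-check: original strings in the rows, numeric codes in the columns
tabulate rating ratingdummy, nolabel
```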
Then, because I want rating AAA to be my base/reference modality, I execute the following command:
Code:
fvset base 3 ratingdummy
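As a point of comparison (a sketch, not what I ran): written as a full factor-variable term, the regression would use AAA as the omitted level and give every other rating, including BB with its single observation, its own coefficient. The ib3. prefix names the base level inline instead of relying on fvset. Both lines assume the codes are A = 1, AA = 2, AAA = 3, BB = 4, BBB = 5.

```stata
* i.ratingdummy expands to one indicator per level except the base set with fvset (AAA = 3)
regress greenpremium i.ratingdummy

* Same model without relying on fvset: the base level is named inline
regress greenpremium ib3.ratingdummy
```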
Code:
regress greenpremium 1.ratingdummy 2.ratingdummy 5.ratingdummy
      Source |       SS           df       MS      Number of obs   =       167
-------------+----------------------------------   F(3, 163)       =      1.48
       Model |  .027896206         3  .009298735   Prob > F        =    0.2226
    Residual |  1.02580512       163  .006293283   R-squared       =    0.0265
-------------+----------------------------------   Adj R-squared   =    0.0086
       Total |  1.05370132       166  .006347598   Root MSE        =    .07933
------------------------------------------------------------------------------
greenpremium |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 ratingdummy |
          A  |  -.0246367     .01542    -1.60   0.112    -.0550854     .005812
         AA  |   .0015211   .0166359     0.09   0.927    -.0313286    .0343709
        BBB  |  -.0272488   .0195009    -1.40   0.164    -.0657559    .0112582
             |
       _cons |   .0013823   .0103279     0.13   0.894    -.0190115     .021776
------------------------------------------------------------------------------
I tested this further by also excluding rating A (48 observations), in addition to BB (1 observation). This gives the following output:
Code:
regress greenpremium 2.ratingdummy 5.ratingdummy
      Source |       SS           df       MS      Number of obs   =       167
-------------+----------------------------------   F(2, 164)       =      0.93
       Model |  .011831414         2  .005915707   Prob > F        =    0.3962
    Residual |  1.04186991       164  .006352865   R-squared       =    0.0112
-------------+----------------------------------   Adj R-squared   =   -0.0008
       Total |  1.05370132       166  .006347598   Root MSE        =     .0797
------------------------------------------------------------------------------
greenpremium |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 ratingdummy |
         AA  |   .0125731    .015201     0.83   0.409    -.0174419    .0425881
        BBB  |  -.0161968    .018319    -0.88   0.378    -.0523682    .0199746
             |
       _cons |  -.0096697   .0077054    -1.25   0.211    -.0248842    .0055448
------------------------------------------------------------------------------
Any ideas on why the results come out like this?
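One check that might help narrow this down, sketched under the assumption that the codes are A = 1, AA = 2, AAA = 3, BB = 4, BBB = 5: when only some level indicators are listed, every rating without its own dummy is absorbed into the constant, so _cons should equal the mean of greenpremium over those left-out ratings rather than over AAA alone.

```stata
* First regression: AAA (3) and BB (4) have no dummy, so they form the comparison group
summarize greenpremium if inlist(ratingdummy, 3, 4)

* Second regression: A (1) is also left out, so the comparison group becomes A, AAA and BB
summarize greenpremium if inlist(ratingdummy, 1, 3, 4)
```

If that reading is right, the two means should match the _cons values in the two outputs above.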
Best regards,
Akshil Shah