I have a cross-sectional dataset with 167 observations, on which I am trying to run an OLS regression of the green premium (greenpremium, the y-variable) on rating, currency, sector, maturity and issue amount (the x-variables). Rating, currency and sector are qualitative variables that are stored as strings in Stata. Maturity and issue amount are quantitative variables.
So, first let's consider the case of rating. Using the tabulate command, we can observe the following output:
Code:
tabulate rating

     Rating |      Freq.     Percent        Cum.
------------+-----------------------------------
          A |         48       28.74       28.74
         AA |         37       22.16       50.90
        AAA |         58       34.73       85.63
         BB |          1        0.60       86.23
        BBB |         23       13.77      100.00
------------+-----------------------------------
      Total |        167      100.00
Code:
encode rating, generate(ratingdummy)
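Before setting a base level it may be worth confirming which integer encode assigned to each rating. The following is a minimal sketch on made-up toy data (one row per rating, not the real 167 observations); only the variable names rating and ratingdummy come from the post.

```stata
* Toy data (hypothetical), one row per rating level
clear
input str3 rating
"A"
"AA"
"AAA"
"BB"
"BBB"
end

* encode numbers the levels 1, 2, ... in alphabetical order of the strings
encode rating, generate(ratingdummy)

* Show the mapping between integer codes and rating labels
label list ratingdummy
```

With the five ratings above, AAA should end up as code 3, which is the value that the base-level command further down relies on.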
Then, because I want rating AAA to be my base/reference category, I execute the following command:
Code:
fvset base 3 ratingdummy
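As far as I understand, the base set by fvset is what Stata uses when the variable is entered with the i. operator; listing individual levels such as 1.ratingdummy includes exactly those indicators and nothing else. A minimal sketch on made-up toy data (the outcome y and its values are hypothetical, not the real greenpremium):

```stata
* Toy data (hypothetical values), only to illustrate the base level
clear
input str3 rating y
"A"   -0.02
"A"   -0.03
"AA"   0.01
"AA"   0.00
"AAA"  0.00
"AAA"  0.01
"BB"   0.05
"BBB" -0.03
"BBB" -0.02
end
encode rating, generate(ratingdummy)

* Make level 3 (AAA) the default base for the i. operator
fvset base 3 ratingdummy

* All non-base levels (A, AA, BB, BBB) enter; _cons is the AAA mean
regress y i.ratingdummy

* Equivalent one-off form that does not need fvset:
* regress y ib3.ratingdummy
```

In this form every coefficient is a difference from AAA alone, because AAA is the only level left out.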
Code:
regress greenpremium 1.ratingdummy 2.ratingdummy 5.ratingdummy
Code:
regress greenpremium 1.ratingdummy 2.ratingdummy 5.ratingdummy

      Source |       SS           df       MS      Number of obs   =       167
-------------+----------------------------------   F(3, 163)       =      1.48
       Model |  .027896206         3  .009298735   Prob > F        =    0.2226
    Residual |  1.02580512       163  .006293283   R-squared       =    0.0265
-------------+----------------------------------   Adj R-squared   =    0.0086
       Total |  1.05370132       166  .006347598   Root MSE        =    .07933

------------------------------------------------------------------------------
greenpremium |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 ratingdummy |
          A  |  -.0246367     .01542    -1.60   0.112    -.0550854     .005812
         AA  |   .0015211   .0166359     0.09   0.927    -.0313286    .0343709
        BBB  |  -.0272488   .0195009    -1.40   0.164    -.0657559    .0112582
             |
       _cons |   .0013823   .0103279     0.13   0.894    -.0190115     .021776
------------------------------------------------------------------------------
I then tested this further by also leaving out the indicator for rating A (48 observations), in addition to the one for BB (1 observation). This gives the following output:
Code:
regress greenpremium 2.ratingdummy 5.ratingdummy

      Source |       SS           df       MS      Number of obs   =       167
-------------+----------------------------------   F(2, 164)       =      0.93
       Model |  .011831414         2  .005915707   Prob > F        =    0.3962
    Residual |  1.04186991       164  .006352865   R-squared       =    0.0112
-------------+----------------------------------   Adj R-squared   =   -0.0008
       Total |  1.05370132       166  .006347598   Root MSE        =     .0797

------------------------------------------------------------------------------
greenpremium |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 ratingdummy |
         AA  |   .0125731    .015201     0.83   0.409    -.0174419    .0425881
        BBB  |  -.0161968    .018319    -0.88   0.378    -.0523682    .0199746
             |
       _cons |  -.0096697   .0077054    -1.25   0.211    -.0248842    .0055448
------------------------------------------------------------------------------
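A check that may help diagnose the difference between the two outputs: both regressions use all 167 observations, so leaving out an indicator does not drop those bonds; it pools them into the reference group. The numbers in the outputs are consistent with this. In the first regression, _cons = .0013823 would be the mean over the 59 AAA and BB bonds, and the A mean would be .0013823 - .0246367 = -.0232544. Pooling the 48 A bonds with those 59 gives (59 × .0013823 + 48 × (-.0232544)) / 107 ≈ -.0096697, which matches _cons in the second regression. The sketch below reproduces the mechanism on made-up toy data (y and its values are hypothetical, not the real greenpremium).

```stata
* Toy data (hypothetical values), only to illustrate the pooled reference group
clear
input str3 rating y
"A"   -0.02
"A"   -0.03
"AA"   0.01
"AA"   0.00
"AAA"  0.00
"AAA"  0.01
"BB"   0.05
"BBB" -0.03
"BBB" -0.02
end
encode rating, generate(ratingdummy)

* Only AA (2) and BBB (5) get indicators; A, AAA and BB are pooled as reference
regress y 2.ratingdummy 5.ratingdummy

* The constant should equal the mean of y over all unlisted levels
summarize y if !inlist(ratingdummy, 2, 5)
display "constant = " _b[_cons] "   pooled mean = " r(mean)
```

If the intention is to compare AA and BBB with AAA only, one option (an assumption about the goal, not something stated in the post) is to restrict the sample instead, for example regress y ib3.ratingdummy if inlist(ratingdummy, 2, 3, 5).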
Any ideas on why the coefficients and the constant change like this between the two regressions?
Best regards,
Akshil Shah