Hi members,

I have a large panel data set covering 166 bonds. It contains some of their characteristics (such as currency and issue date), followed by daily yield data over a five-year period for each bond, although most of these daily values are missing. In total I have about 54,000 observations of bond yields.

My goal is to run a regression that shows which variables have an effect on a bond's yield. More specifically, I am trying to regress the yield on a measure of the bond's liquidity in order to find the unobserved bond-specific effect that is not explained by liquidity.

So far I have run tests to choose between a fixed-effects and a random-effects model, tests for autocorrelation and heteroskedasticity, and an F-test. I am not sure whether I am interpreting the results of these tests correctly, or which model I should use going forward for my regressions in Stata.

The regression I am trying to perform is shown below, where Y is the yield (variable name YIELDDIFF), P_i is the bond-specific fixed effect, and Liquidity is the liquidity measure (variable name BIDASKSP):

Y_i,t = P_i + b*Liquidity_i,t + e_i,t, where b is the slope coefficient on liquidity and e_i,t is the error term.

I first ran a fixed-effects regression, whose output includes an F-test, with the following result:
Code:
xtreg YIELDDIFF BIDASKSP, fe

Fixed-effects (within) regression               Number of obs     =     44,751
Group variable: RIC_2                           Number of groups  =        166

R-sq:                                           Obs per group:
     within  = 0.0485                                         min =         13
     between = 0.0059                                         avg =      269.6
     overall = 0.0504                                         max =      1,178

                                                F(1,44584)        =    2272.59
corr(u_i, Xb)  = 0.0180                         Prob > F          =     0.0000

------------------------------------------------------------------------------
   YIELDDIFF |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    BIDASKSP |  -.2839549   .0059565   -47.67   0.000    -.2956296   -.2722801
       _cons |   .0158317    .000306    51.74   0.000      .015232    .0164315
-------------+----------------------------------------------------------------
     sigma_u |  .15915535
     sigma_e |  .06431106
         rho |   .8596394   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(165, 44584) = 1130.58               Prob > F = 0.0000
1. Am I correct in reading the last line of the output, F test that all u_i=0: F(165, 44584) = 1130.58, as saying that the bond fixed effects have explanatory value for the yield of the bonds, and that I should in fact use a fixed-effects model? What does the large F value mean?

2. Is it correct to run xtreg with YIELDDIFF (i.e. yield) as the dependent variable and only BIDASKSP (liquidity) as the independent variable in order to isolate the fixed effect P_i (as outlined in the equation above)?
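
To make question 2 concrete, this is roughly what I mean by isolating P_i. It is only a sketch: it assumes the panel is already declared with xtset, as in the run above, and the names P_hat and one_per_bond are placeholders of my own.

```stata
* Fixed-effects regression of yield on the bid-ask spread
xtreg YIELDDIFF BIDASKSP, fe

* Store the estimated bond-specific component u_i
* (P_hat is a placeholder name)
predict P_hat, u

* Mark one estimation-sample observation per bond, then
* look at the spread of the estimated effects across bonds
egen byte one_per_bond = tag(RIC_2) if e(sample)
summarize P_hat if one_per_bond
```

As far as I understand, u_i is reported as a deviation from _cons, so the implied intercept for a given bond would be _cons plus P_hat.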

I then ran a Hausman test, which I understand as indicating that I should use a fixed-effects rather than a random-effects model:
Code:
Test:  Ho:  difference in coefficients not systematic

                  chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        3.95
                Prob>chi2 =      0.0468
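
For reference, a Hausman test of this form is normally produced by a sequence like the one below. This is a sketch of the standard workflow under the same two-variable specification; the stored-estimate names fe_est and re_est are arbitrary placeholders.

```stata
* Fixed-effects estimates (consistent under both hypotheses)
xtreg YIELDDIFF BIDASKSP, fe
estimates store fe_est

* Random-effects estimates (efficient under H0)
xtreg YIELDDIFF BIDASKSP, re
estimates store re_est

* Compare the two sets of coefficients; the consistent estimator goes first
hausman fe_est re_est
```
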
Following that, I tested for autocorrelation using a Wooldridge test, with the following result:
Code:
xtserial YIELDDIFF BIDASKSP

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,     165) =     19.676
           Prob > F =      0.0000
I also performed a modified Wald test for heteroskedasticity:
Code:
xttest3

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (166)  =   1.1e+09
Prob>chi2 =      0.0000
3. Am I correct in interpreting these results as indicating both autocorrelation and heteroskedasticity in my data?

As for going forward, my understanding is that I should run the regression with robust standard errors. After skimming through previous research, it seems that many regressions are performed with White standard errors, but I am unsure what this entails and how to do it in Stata. Would I then run the same xtreg as before, but with robust standard errors added?
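
For concreteness, the command I have in mind is something like the sketch below. It assumes the panel is still declared with xtset, as in the earlier runs, and clusters on the bond identifier RIC_2; whether this is the right choice is exactly what I am unsure about.

```stata
* Same fixed-effects model, with cluster-robust (sandwich) standard errors
* clustered on the bond identifier
xtreg YIELDDIFF BIDASKSP, fe vce(cluster RIC_2)

* As far as I can tell from the xtreg documentation, vce(robust) with the
* fe estimator is treated as clustering on the panel variable
xtreg YIELDDIFF BIDASKSP, fe vce(robust)
```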

Many thanks for your help. It has been a while since I took statistics at university, and I am unfortunately not entirely up to speed.

/N