Hi,

I am estimating a difference-in-differences regression for the top ESG companies relative to the bottom ESG companies during COVID-19, and I have some questions regarding the assumptions and how to test them. Please see the following.



I ran the following two regressions (the first without fixed effects, the second with fixed effects):

xtreg RawReturn Top20_ESG Crash Recovery 1.Top20_ESG#1.Crash 1.Top20_ESG#1.Recovery i.GICSectors LN_assets Leverage Liquidity MBV ROA if Not20_ESG != 1, vce(cluster CompanyNo)

xtreg RawReturn Top20_ESG Crash Recovery 1.Top20_ESG#1.Crash 1.Top20_ESG#1.Recovery i.GICSectors LN_assets Leverage Liquidity MBV ROA if Not20_ESG != 1, fe vce(cluster CompanyNo)

I get the following output for the first model:

(attached output for the first model: two screenshots)

For the second model (FE) I get:

(attached output for the second model: two screenshots)

To test the OLS assumptions, I have done the following:


Linearity, Random Sample & Zero Conditional Mean

I run the following in Stata to test for linearity and zero conditional mean:

reg RawReturn Top20_ESG Crash Recovery 1.Top20_ESG#1.Crash 1.Top20_ESG#1.Recovery i.GICSectors LN_assets Leverage Liquidity MBV ROA if Not20_ESG != 1
predict pred if e(sample), xb
predict resid if e(sample), residuals
scatter resid pred

Why can I only do this after the reg command and not after xtreg? And is that fine?
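For reference, here is a minimal sketch of how a similar residual-versus-fitted plot might be produced after xtreg, since xtreg's predict does not accept the residuals option but does offer e (the idiosyncratic error) and xb. It runs on simulated data, and all names in it (id, t, x, y, a, fitted, ehat) are hypothetical and not from my dataset.

```stata
* Simulated panel: 50 units observed over 10 periods (hypothetical names)
clear
set seed 12345
set obs 50
generate id = _n
generate a = rnormal()              // unit-specific effect
expand 10                           // 10 periods per unit
bysort id: generate t = _n
generate x = rnormal()
generate y = 1 + 2*x + a + rnormal()
xtset id t

* Fixed-effects model with cluster-robust standard errors
xtreg y x, fe vce(cluster id)

* After xtreg, use e (not residuals) for the idiosyncratic residual
predict double fitted, xb           // linear prediction, excludes the unit effect
predict double ehat, e              // residual e_it
scatter ehat fitted
```

Whether this plot is the right diagnostic for the panel models is exactly what I am unsure about, so please read it as a sketch rather than a recommendation.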


(attached figure: scatter plot of resid against pred)


The figure above does not look like the examples I have seen online, so how should I assess linearity and the zero conditional mean from it?

For the zero conditional mean, could we simply argue that, according to XX, there should be no omitted variables, because we have included the relevant variables recognized in the literature to avoid this bias?

Regarding the random sample: since we use the entire S&P 500, I would argue that this fulfills the random sample assumption. Is that right?

Multicollinearity
We test this using the correlation matrix and the VIF values (the VIFs are computed after the reg command above).
corr RawReturn ESG_score E_score S_score G_score LN_assets Leverage Liquidity MBV ROA
estat vif

(VIF model could not be uploaded due to the maximum attachments)
(correlation matrix could not be uploaded due to the maximum attachments)

In the output, no correlation is higher than 0.7 (the literature argues that some correlation is expected, but anything below 0.7 is fine) and every VIF is below 10 (a threshold also taken from the literature). Hence, we conclude that multicollinearity is not a problem.
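To make clear what the VIF measures, here is a minimal sketch using Stata's built-in auto dataset rather than my data. It runs estat vif after a regression and then reproduces the VIF of one regressor by hand as 1/(1 - R2) from the auxiliary regression of that regressor on the others. The scalar name vif_mpg is hypothetical.

```stata
* Illustration on the shipped auto dataset, not on the ESG data
sysuse auto, clear
regress price mpg weight length
estat vif                           // VIFs for all regressors

* VIF for mpg by hand: regress mpg on the other regressors
quietly regress mpg weight length
scalar vif_mpg = 1/(1 - e(r2))
display "VIF for mpg = " scalar(vif_mpg)
```

The hand-computed value should match the mpg row reported by estat vif.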

Heteroskedasticity
From the plot above we can see heteroskedasticity from the horizontal lines, correct? However, this can also be tested using the following in Stata:

reg RawReturn Top20_ESG Crash Recovery 1.Top20_ESG#1.Crash 1.Top20_ESG#1.Recovery i.GICSectors LN_assets Leverage Liquidity MBV ROA if Not20_ESG != 1
estat hettest

(Breusch-Pagan test could not be uploaded due to the maximum attachments)
But the output was
Ho: Constant variance
Variables: fitted values of RawReturn
chi2 = 2289.25
Prob > chi2 = 0.0000

I.e., there is heteroskedasticity, so we apply robust standard errors.
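As a minimal sketch of that workflow, again on Stata's built-in auto dataset rather than my data: run the Breusch-Pagan test after an ordinary regression, keep the p-value, and then re-estimate with robust standard errors. The scalar name bp_p is hypothetical. In my panel models the corresponding choice is vce(cluster CompanyNo), which is also robust to heteroskedasticity.

```stata
* Illustration on the shipped auto dataset, not on the ESG data
sysuse auto, clear
regress price mpg weight
estat hettest                       // Breusch-Pagan test, H0: constant variance
scalar bp_p = r(p)                  // store the p-value before r() is overwritten
display "Breusch-Pagan p-value = " scalar(bp_p)

* Same coefficients, heteroskedasticity-robust standard errors
regress price mpg weight, vce(robust)
```

Only the standard errors change between the two regressions; the coefficient estimates are identical.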



The above covers the assumptions for OLS, and thereby for the random-effects model. Since we also run the fixed-effects model, how do these assumptions differ? As we understand it, heteroskedasticity would not need to be considered, but otherwise the assumptions should be the same.



We run this regression model for the top ESG companies (as above), but we also run it for a single component, the E score (environmental score), i.e., with Top_E instead, which puts different companies in the top group (same dataset, though). In addition, we run it for abnormal returns instead of raw returns.
In theory, we should run these tests for each model. Correct?



Thank you so much in advance!!! It is really appreciated.

Best,
Freja