I apologize in advance for the length of this post; I hope it will receive an answer anyway. I am working with panel data and am looking for suggestions on which diagnostic tests I should (and should not) perform to specify my model correctly. In particular, I would like to ask:
1. Do you think I left out some relevant diagnostic test?
2. Do you think I should mention these tests and their results in writing the paper?
My panel has 104 cross-sectional units (countries) and 15 time periods (months); the dependent variable is new monthly deaths due to COVID-19 per million inhabitants.
I have carried out the following tests:
1. Test for multicollinearity (variance inflation factors, VIFs, for the independent variables): the highest VIF I obtained is 9.54, which is below the threshold of 10.00 suggested in the literature (Hair et al., 1995, "Multivariate Data Analysis (3rd ed)"); hence, I do not exclude any control variables from my model.
2. Wooldridge (2002) test for serial correlation (autocorrelation): xtserial reports Prob > F = 0.0000, and xtistest rejects H0 up to the second lag.
3. Test for heteroskedasticity: a plot of the residuals suggests the presence of heteroskedasticity, which is confirmed both by the White (1980) test (estat imtest, white) and by the test proposed in the Stata FAQ.
4. Test for cross-sectional dependence: the Pesaran (2015) test for weak cross-sectional dependence (xtcd2 residual) reported a p-value = 0.000, suggesting the presence of cross-sectional dependence, which is also confirmed by the p-value = 0.000 of the test run on the dependent variable (xtcdf dependent_variable).
5. Test whether to use Pooled OLS or panel (RE): this test was performed by the post-estimation command xttest0 which reported chibar2(01) = 201.18 and Prob > chibar2 = 0.0000, suggesting a panel-wise effect.
6. Test whether to employ an RE model or an FE model: I performed the Mundlak test and the Hansen-Sargan test (xtoverid); both suggested an FE model.
7. Test for the inclusion of time fixed effects (with testparm): the test indicates that they should be included in the model.
8. Test for the inclusion of squared terms: I included squared terms in the model and tested for a U-shaped relationship with the utest command.
9. Ramsey's RESET test: yielded Prob > F = 0.4581, suggesting that the model does not suffer from omitted variable bias and misspecification.
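For reference, here is a minimal sketch of the command sequence behind several of the tests above, not my exact do-file. The time variable `month_id`, the macro `$xvars`, and the variable name `residual` are placeholders I introduce here for illustration; I also write the time effects as `i.month_id` instead of my own dummies. xtserial, xtoverid and xtcd2 are community-contributed commands that must be installed separately.

```stata
* Placeholder names (assumptions): month_id = time index, $xvars = time-varying regressors
xtset n_country month_id
global xvars new_tests_per_thousand people_vaccinated_ph people_vaccinatedsquared gdp_per_capita

* Tests 1 and 3: VIFs and White (1980) test after pooled OLS
regress new_deaths_per_million $xvars
estat vif
estat imtest, white

* Test 2: Wooldridge test for serial correlation in panel data
xtserial new_deaths_per_million $xvars

* Test 5: Breusch-Pagan LM test, pooled OLS vs. random effects
xtreg new_deaths_per_million $xvars, re
xttest0

* Test 6: Hansen-Sargan overidentification test, RE vs. FE
xtreg new_deaths_per_million $xvars, re vce(cluster n_country)
xtoverid

* Test 7: joint significance of the time fixed effects
xtreg new_deaths_per_million $xvars i.month_id, fe
testparm i.month_id

* Test 4: cross-sectional dependence in the FE residuals
predict double residual, e
xtcd2 residual
```

The order groups the tests by the estimation they follow (pooled OLS, then RE, then FE), so it does not match the numbering of the list exactly.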
This is the regression output for my final model:
Code:
. xtscc new_deaths_per_million dt2-dt15 month2-month15 new_tests_per_thousand people_va
> ccinated_ph people_vaccinatedsquared population population_density median_age cardiov
> asc_death_rate diabetes_prevalence hospital_beds_per_thousand life_expectancy gdp_per
> _capita health_exp_percap urbanization_share internet_users air_passengers smokers_sh
> are, fe lag(2)

Regression with Driscoll-Kraay standard errors   Number of obs     =      1362
Method: Fixed-effects regression                 Number of groups  =       103
Group variable (i): n_country                    F( 44,   102)     =  8.67e+08
maximum lag: 2                                   Prob > F          =    0.0000
                                                 within R-squared  =    0.2814

--------------------------------------------------------------------------------------
                     |             Drisc/Kraay
new_deaths_per_mil~n |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
                 dt2 |   .3138264    .001512   207.55   0.000     .3108273    .3168255
                 dt3 |   1.584465    .003592   441.11   0.000     1.577341     1.59159
                 dt4 |   .5226887   .0661408     7.90   0.000     .3914987    .6538786
                 dt5 |   .2736537   .0660774     4.14   0.000     .1425894     .404718
                 dt6 |     .32476   .0026076   124.55   0.000     .3195879    .3299321
                 dt7 |   .4208554   .0035925   117.15   0.000     .4137297    .4279811
                 dt8 |   .3413074   .0080331    42.49   0.000     .3253738    .3572411
                 dt9 |    .527191   .0622785     8.47   0.000     .4036619    .6507201
                dt10 |   .2857736   .0656563     4.35   0.000     .1555447    .4160025
                dt11 |  -.0484925    .075562    -0.64   0.522    -.1983693    .1013844
                dt12 |  -.1315471   .0793195    -1.66   0.100     -.288877    .0257827
                dt13 |  -.5313282   .3175867    -1.67   0.097     -1.16126    .0986035
                dt14 |  -1.191948   .0840271   -14.19   0.000    -1.358615    -1.02528
                dt15 |  -1.323201   .3152522    -4.20   0.000    -1.948503   -.6979001
              month2 |    .259636   .0031567    82.25   0.000     .2533746    .2658974
              month3 |   1.127022   .0129004    87.36   0.000     1.101435     1.15261
              month4 |   .5801031   .0172902    33.55   0.000     .5458081    .6143981
              month5 |   .5247511   .0210133    24.97   0.000     .4830712    .5664309
              month6 |   .7252116   .0296244    24.48   0.000     .6664517    .7839714
              month7 |   .7215047   .0352729    20.45   0.000      .651541    .7914684
              month8 |   .7877615   .0463446    17.00   0.000     .6958371    .8796858
              month9 |    1.13178   .0533886    21.20   0.000     1.025884    1.237676
             month10 |   3.049781   .0671831    45.40   0.000     2.916524    3.183038
             month11 |    3.69505   .0825455    44.76   0.000     3.531322    3.858779
             month12 |   3.458879   .1158383    29.86   0.000     3.229115    3.688644
             month13 |   2.350512   .1901172    12.36   0.000     1.973416    2.727609
             month14 |   2.002837   .3156656     6.34   0.000     1.376716    2.628959
             month15 |   2.498445   .3994487     6.25   0.000      1.70614    3.290749
new_tests_per_thou~d |   .1133193   .0338773     3.34   0.001     .0461237    .1805148
people_vaccinated_ph |   .1089296   .0475717     2.29   0.024     .0145712    .2032879
people_vaccinateds~d |  -.0022201   .0008529    -2.60   0.011    -.0039119   -.0005283
          population |          0  (omitted)
  population_density |          0  (omitted)
          median_age |          0  (omitted)
cardiovasc_death_r~e |          0  (omitted)
 diabetes_prevalence |          0  (omitted)
hospital_beds_per_~d |          0  (omitted)
     life_expectancy |          0  (omitted)
      gdp_per_capita |  -2.50e-07   4.50e-07    -0.56   0.580    -1.14e-06    6.42e-07
   health_exp_percap |          0  (omitted)
  urbanization_share |          0  (omitted)
      internet_users |          0  (omitted)
      air_passengers |          0  (omitted)
       smokers_share |          0  (omitted)
               _cons |          0  (omitted)
--------------------------------------------------------------------------------------
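My working assumption, which I have not verified, is that the rows reported as (omitted) are country-level controls with no variation over time, so the fixed-effects (within) transformation drops them. A minimal sketch of how this could be checked with xtsum, which reports the within-country standard deviation of each variable; `month_id` is a placeholder for my time variable:

```stata
* Placeholder name (assumption): month_id = time index
xtset n_country month_id

* A "within" standard deviation of zero means the variable is constant
* inside each country and cannot be identified in an FE regression
xtsum population population_density median_age cardiovasc_death_rate ///
      diabetes_prevalence hospital_beds_per_thousand life_expectancy ///
      health_exp_percap urbanization_share internet_users ///
      air_passengers smokers_share
```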
Thanks in advance to whoever is willing to help.
I wish you a nice weekend.