I am analyzing a balanced panel of around 2400 firms over 12 years (Stata 13). The output I am able to present here is based on test data, as I am not allowed (or able) to extract the original files. The only differences are that the original dataset contains more firms and that most of my explanatory variables turn out to be significant there, unlike in this sample data. In the original, the F-statistic is F(11,13432) with Prob > F = 0.0000, and the overall R-sq. is 0.9639.
My goal is to analyze the effect of investments in computers (investict) and of product and process innovations on the demand for highskilled workers. Controls include the size of the firm in terms of employees (total), the industry, a dummy for West Germany (west), a dummy for a collective bargaining agreement (collective), the state of the art of the production equipment (tech), whether the firm engages in R&D (rnd), and some more.
I have used xtserial and xttest3, which have led me to use cluster-robust standard errors. The result of xtoverid made me decide to use fixed effects, and -testparm- made me include year fixed effects. So my regression is now:
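A minimal sketch of how such a test sequence could be run. The exact calls are not shown in this post, so the panel declaration with idnum and year, the regressor list reused from the regression, and the local macro name x are all assumptions. xtserial, xttest3 and xtoverid are user-written commands and have to be installed first.

```stata
* Sketch only: assumes the panel is identified by idnum and year
xtset idnum year

* Shorthand for the regressors (macro name x is hypothetical)
local x investict product_inno process_inno total west industry ///
    collective exportshare investment turnover rnd tech

* Wooldridge test for serial correlation in panel data
xtserial highskill `x'

* Modified Wald test for groupwise heteroskedasticity, run after -xtreg, fe-
xtreg highskill `x', fe
xttest3

* Fixed vs. random effects with cluster-robust errors, run after -xtreg, re-
xtreg highskill `x', re vce(cluster idnum)
xtoverid

* Joint significance of the year dummies
xtreg highskill `x' i.year, fe vce(cluster idnum)
testparm i.year
```

xtoverid is run here after the random-effects model without i.year, mirroring the random-effects call further down in this post.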
Code:
xtreg highskill investict product_inno process_inno total west industry collective exportshare investment turnover rnd tech i.year, fe vce(cluster idnum)
note: west omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =      4344
Group variable: idnum                           Number of groups   =       498

R-sq:  within  = 0.1005                         Obs per group: min =         1
       between = 0.5034                                        avg =       8.7
       overall = 0.4393                                        max =        11

                                                F(21,497)          =      2.60
corr(u_i, Xb)  = 0.3892                         Prob > F           =    0.0001

                                (Std. Err. adjusted for 498 clusters in idnum)
------------------------------------------------------------------------------
             |               Robust
   highskill |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   investict |   .7032893   .2711382     2.59   0.010      .170571    1.236008
product_inno |   .2723859   .6988765     0.39   0.697    -1.100731    1.645503
process_inno |  -.3938082   .4501978    -0.87   0.382    -1.278334    .4907173
       total |    .101938   .0245108     4.16   0.000     .0537805    .1500954
        west |          0  (omitted)
    industry |   .1624997   .1911486     0.85   0.396    -.2130592    .5380586
  collective |  -.2838042   .5861356    -0.48   0.628    -1.435413    .8678049
 exportshare |   .8483747   2.351452     0.36   0.718    -3.771638    5.468387
  investment |   1.44e-06   5.98e-07     2.41   0.016     2.68e-07    2.62e-06
    turnover |  -1.99e-07   1.39e-07    -1.43   0.153    -4.73e-07    7.46e-08
         rnd |  -1.103514   .9824249    -1.12   0.262    -3.033732    .8267042
        tech |  -.6756037   .2828397    -2.39   0.017    -1.231313   -.1198947
             |
        year |
       2008  |   .0310991   .3815399     0.08   0.935    -.7185309    .7807291
       2009  |   .4981931   .3197414     1.56   0.120    -.1300184    1.126405
       2010  |   .7890588   .4913133     1.61   0.109    -.1762483    1.754366
       2011  |   1.109093   .5630923     1.97   0.049     .0027585    2.215428
       2012  |   1.189345   .5407669     2.20   0.028      .126874    2.251816
       2013  |   .0965383   .7094676     0.14   0.892    -1.297387    1.490464
       2014  |   .4120097   .6609871     0.62   0.533    -.8866637    1.710683
       2015  |  -.1867301   .7267681    -0.26   0.797    -1.614647    1.241187
       2016  |   .1137137   .5447759     0.21   0.835     -.956634    1.184061
       2017  |  -.4267298   .7349041    -0.58   0.562    -1.870632    1.017172
             |
       _cons |   4.706464   2.350515     2.00   0.046     .0882924    9.324636
-------------+----------------------------------------------------------------
     sigma_u |  22.632204
     sigma_e |  7.5596268
         rho |  .89962854   (fraction of variance due to u_i)
------------------------------------------------------------------------------
I originally intended to use the share of highskilled employees as my dependent variable, but after reading the paper of Kronman (1993) and several posts in this forum concerning the problems with ratios, I have switched to using the absolute number of highskilled employees (highskill) and include the total number of employees as a control. This has increased my R-squared by a lot (it was only 0.016 before).
On the other hand, I tested my model specification using:
Code:
predict fitted, xb
g sq_fitted=fitted^2
xtreg highskill fitted sq_fitted
test sq_fitted
Also, I don't understand why the dummy for west is omitted; none of the regressors are highly correlated.
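One thing that can be checked here: a fixed-effects regression only uses variation over time within each firm, so a regressor that never changes within a firm is collinear with the firm effects and gets dropped, however weakly it is correlated with the other regressors. Whether that applies to west in this data is an assumption, not something shown above. A minimal sketch to check it, assuming the data are already xtset on idnum and year; the variable names sd_west and west_varies are hypothetical:

```stata
* Overall, between and within variation of west;
* a within Std. Dev. of 0 means west never changes inside a firm
xtsum west

* Alternative: flag firms whose west value changes at least once
bysort idnum: egen sd_west = sd(west)
generate byte west_varies = sd_west > 0 & !missing(sd_west)
tabulate west_varies
```

If west_varies is 0 for every firm, the omission would be expected under fixed effects.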
I have read many posts in this forum and run several tests that made me end up with this fixed effects regression model, so I am confused about the result of the specification test. I have also tried -areg-, absorb(idnum) vce(cluster idnum), which has slightly different coefficients and a higher R-Sq. (as is normal) than the -xtreg, fe- but it has the same result in the misspecification test.
Testing for normality using
Code:
xtreg highskill investict product_inno process_inno total west industry collective exportshare investment turnover rnd tech, re vce(cluster idnum)
Code:
xtsktest
(running _xtsktest_calculations on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Tests for skewness and kurtosis                 Number of obs      =      4344
                                                Replications       =        50

                                 (Replications based on 498 clusters in idnum)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  Skewness_e |  -1805.438   1230.613    -1.47   0.142    -4217.396    606.5195
  Kurtosis_e |   456552.4   194447.7     2.35   0.019     75441.97    837662.8
  Skewness_u |    12182.3   2960.393     4.12   0.000     6380.038    17984.56
  Kurtosis_u |    1510700   274557.2     5.50   0.000     972577.4     2048822
------------------------------------------------------------------------------
Joint test for Normality on e:        chi2(2) =   7.67    Prob > chi2 = 0.0217
Joint test for Normality on u:        chi2(2) =  47.21    Prob > chi2 = 0.0000
------------------------------------------------------------------------------
I appreciate any input on my issues, thanks in advance,
Helen