I am analysing a balanced panel of about 2,400 firms over 12 years (Stata 13). My main goal is to estimate the effect of three dummy variables that proxy for technological innovation (investict, product_inno, process_inno) on either the number or the share of high-skilled employees. My control variables include investment (absolute numbers, many zeros), the total number of employees, average wages over all employees, the export share (as a share of total sales), a dummy for a collective bargaining agreement (collective), the state of the art of the production equipment (tech), a dummy for whether the firm does R&D (rnd), and some more.
Several tests, run with the help of Statalist members, have led me to conclude that my data are not normally distributed (high-skilled employees, investment, total employees and export share have many small values), and scatter plots show a non-linear relationship between my dependent variable and the independent ones. Because of the large number of meaningful (not censored) zeros in high-skilled employees, investment and export share, I cannot use a log transformation.
After extensive research and the aforementioned help from Statalist members (C. Lazarro and J. Wooldridge, thanks a lot again), I have concluded that I am basically left with two options: using the absolute number of high-skilled employees with -xtpoisson, fe vce(robust)-, or using the share of high-skilled employees (highskill/total) with a fractional response model like the one in Papke and Wooldridge (2008).
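For the second option, here is a minimal sketch of the correlated random effects idea as I understand it from Papke and Wooldridge (2008): add firm-level time averages of the time-varying regressors (a Mundlak-type device) to a pooled fractional probit. The variable names are taken from the commands in this post; the `mean_` prefix and the choice of which regressors to average are my own assumptions. The sketch also ignores any complications from an unbalanced estimation sample.

```stata
* Sketch only: assumes the panel is in memory with firm identifier idnum.
* The /// line continuations require running this from a do-file.

* Firm-level time averages of the time-varying regressors
* (mean_* names are hypothetical, chosen for this illustration).
foreach v of varlist investict product_inno process_inno total avwages exportshare investment {
    bysort idnum: egen double mean_`v' = mean(`v')
}

* Pooled fractional probit with the time averages added;
* standard errors are clustered by firm.
glm share_high investict product_inno process_inno total avwages ///
    exportshare investment collective rnd tech i.year mean_*, ///
    family(binomial) link(probit) vce(cluster idnum)
```

The time averages are meant to absorb the correlation between the firm effect and the regressors, which is the role the fixed effects play in -xtpoisson, fe-.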
First Question:
The problem with the Poisson model is that (besides insignificance, which might well be possible) a misspecification test of the form
Code:
xtpoisson highskill investict product_inno process_inno lntotal avwages collective exportshare investment rnd tech i.industry i.year, fe vce(robust)
predict xbhat, xb
g xbhatsq=xbhat^2
g xbhatcu=xbhat^3
xtpoisson highskill investict product_inno process_inno lntotal avwages collective exportshare investment rnd tech xbhatsq xbhatcu i.industry i.year, fe vce(robust)
test xbhatsq xbhatcu
Second Question:
To try the fractional response model, I took the Stata code from Papke's website; applying their -glm- and -xtgee- code to my data gives the following:
Code:
glm share_high investict product_inno process_inno total avwages collective exportshare investment rnd
> tech industry i.year, fa(bin) link(probit) cluster(idnum)
note: share_high has noninteger values

Iteration 0:   log pseudolikelihood = -1672.9517
Iteration 1:   log pseudolikelihood = -1653.9288
Iteration 2:   log pseudolikelihood = -1653.8772
Iteration 3:   log pseudolikelihood = -1653.8772

Generalized linear models                          No. of obs      =      5582
Optimization     : ML                              Residual df     =      5560
                                                   Scale parameter =         1
Deviance         =  1564.744217                    (1/df) Deviance =  .2814288
Pearson          =  1922.417319                    (1/df) Pearson  =  .3457585

Variance function: V(u) = u*(1-u/1)                [Binomial]
Link function    : g(u) = invnorm(u)               [Probit]

                                                   AIC             =  .6004576
Log pseudolikelihood = -1653.877174                BIC             = -46403.06

                                 (Std. Err. adjusted for 623 clusters in idnum)
------------------------------------------------------------------------------
             |               Robust
  share_high |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   investict |  -.0319885   .0510072    -0.63   0.531    -.1319608    .0679837
product_inno |   .0525227   .0703238     0.75   0.455    -.0853094    .1903547
process_inno |  -.0901038   .0505448    -1.78   0.075    -.1891698    .0089622
       total |   .0011193   .0002498     4.48   0.000     .0006296     .001609
     avwages |   .0000165   .0000207     0.80   0.423     -.000024     .000057
  collective |  -.0788953   .0695357    -1.13   0.257    -.2151827    .0573922
 exportshare |  -.2787724   .1815199    -1.54   0.125    -.6345448    .0769999
  investment |  -1.62e-08   1.52e-08    -1.07   0.286    -4.60e-08    1.35e-08
         rnd |   .2635743   .0984232     2.68   0.007     .0706684    .4564803
        tech |   .0203529   .0391601     0.52   0.603    -.0563995    .0971053
    industry |   .0011611   .0112536     0.10   0.918    -.0208956    .0232178
             |
        year |
       2008  |   .0228136   .0298333     0.76   0.444    -.0356587    .0812858
       2009  |   .0332193    .025811     1.29   0.198    -.0173693    .0838079
       2010  |   .0259046   .0285992     0.91   0.365    -.0301488    .0819581
       2011  |    .029456   .0303086     0.97   0.331    -.0299477    .0888598
       2012  |   .0636053   .0303213     2.10   0.036     .0041766    .1230339
       2013  |   .0067182   .0336056     0.20   0.842    -.0591476    .0725841
       2014  |  -.0376976   .0365879    -1.03   0.303    -.1094086    .0340133
       2015  |  -.0118872   .0332579    -0.36   0.721    -.0770715    .0532971
       2016  |  -.0473058    .040181    -1.18   0.239    -.1260592    .0314476
       2017  |  -.0099275    .040348    -0.25   0.806    -.0890082    .0691532
             |
       _cons |  -1.306921   .1410517    -9.27   0.000    -1.583377   -1.030465
------------------------------------------------------------------------------

. mat b = e(b)

. xtgee share_high investict product_inno process_inno total avwages collective exportshare investment r
> nd tech industry i.year, fa(bi) link(probit) corr(exch) robust from(b,skip)

Iteration 1: tolerance = .80279569
Iteration 2: tolerance = .10893995
....

GEE population-averaged model                   Number of obs      =      5582
Group variable:                      idnum      Number of groups   =       623
Link:                               probit      Obs per group: min =         1
Family:                           binomial                     avg =       9.0
Correlation:                  exchangeable                     max =        11
                                                Wald chi2(21)      =     35.94
Scale parameter:                         1      Prob > chi2        =    0.0222

                                  (Std. Err. adjusted for clustering on idnum)
------------------------------------------------------------------------------
             |             Semirobust
  share_high |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   investict |   .0188805    .010901     1.73   0.083    -.0024849     .040246
product_inno |   .0286437   .0184118     1.56   0.120    -.0074427      .06473
process_inno |   .0073737   .0122423     0.60   0.547    -.0166208    .0313682
       total |  -.0007271   .0005181    -1.40   0.160    -.0017425    .0002883
     avwages |   8.26e-06   6.36e-06     1.30   0.194    -4.20e-06    .0000207
  collective |  -.0394644   .0214855    -1.84   0.066    -.0815753    .0026465
 exportshare |  -.0250987   .0709382    -0.35   0.723     -.164135    .1139377
  investment |   7.70e-09   4.54e-09     1.70   0.090    -1.19e-09    1.66e-08
         rnd |  -.0133706   .0272105    -0.49   0.623    -.0667022    .0399611
        tech |  -.0206219   .0099042    -2.08   0.037    -.0400337   -.0012101
    industry |  -.0033315    .009638    -0.35   0.730    -.0222216    .0155586
             |
        year |
       2008  |   .0196429   .0173751     1.13   0.258    -.0144117    .0536975
       2009  |   .0229043   .0164129     1.40   0.163    -.0092645     .055073
       2010  |   .0192822   .0175307     1.10   0.271    -.0150773    .0536418
       2011  |   .0157518   .0203442     0.77   0.439    -.0241221    .0556257
       2012  |   .0207046   .0195638     1.06   0.290    -.0176397    .0590489
       2013  |   -.000057   .0229038    -0.00   0.998    -.0449477    .0448336
       2014  |  -.0184354   .0216876    -0.85   0.395    -.0609424    .0240715
       2015  |  -.0222338   .0219286    -1.01   0.311     -.065213    .0207455
       2016  |  -.0241677   .0242468    -1.00   0.319    -.0716906    .0233551
       2017  |  -.0246466   .0248686    -0.99   0.322    -.0733881     .024095
             |
       _cons |  -1.026956   .0853745   -12.03   0.000    -1.194287   -.8596252
------------------------------------------------------------------------------
I would really appreciate some help in understanding the fractional model.
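One thing that may help with reading the output: the probit-link coefficients above are on the index scale, not on the scale of the share itself. A minimal sketch of how they could be turned into average partial effects on the expected share with -margins-, assuming the same data and variable names as in the -glm- call above and that the three innovation variables are coded 0/1:

```stata
* Sketch only: assumes the data from this post are in memory.
* The /// line continuations require running this from a do-file.

* Refit the pooled fractional probit with the three innovation dummies
* entered as factor variables (i. prefix), so that -margins- reports
* discrete 0/1 changes rather than derivatives.
glm share_high i.investict i.product_inno i.process_inno total avwages ///
    collective exportshare investment rnd tech industry i.year, ///
    family(binomial) link(probit) vce(cluster idnum)

* Average partial effects on the predicted share; the default
* prediction after -glm- is the conditional mean.
margins, dydx(investict product_inno process_inno)
```

The -margins- results are changes in the predicted share of high-skilled employees, which should be easier to compare across specifications than the raw index coefficients.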
Literature:
Papke, L. E., and J. M. Wooldridge, “Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation Rates,” Journal of Applied Econometrics 11 (1996), 619–632.