I am analysing a balanced panel of about 2,400 firms observed over 12 years (Stata 13). My main goal is to estimate the effect of three dummy variables that proxy for technological innovation (investict, product_inno, process_inno) on either the number or the share of high-skilled employees. My control variables include investment (in absolute terms, with many zeros), the total number of employees, the average wage across all employees, the export share (exports as a share of total sales), a dummy for a collective bargaining agreement (collective), the technological state of the production equipment (tech), a dummy for whether the firm does R&D (rnd), and a few more.
Several tests, run with the help of Statalist members, have led me to conclude that my data are not normally distributed (high-skilled employees, investment, total employees and the export share are right-skewed, with many small values), and scatter plots show a non-linear relationship between the dependent variable and the regressors. Because highskill, investment and the export share contain many meaningful (not censored) zeros, I cannot use a log transformation.
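A minimal sketch of how to check what a log transformation would cost, assuming the variable names used in this post: in Stata, ln(0) evaluates to missing, so every zero would silently drop out of any estimation that uses the logged variable. The loop counts those observations per variable.

```stata
* ln(0) is missing in Stata, so logged zeros drop out of estimation
display ln(0)

* Count the observations a log transformation would discard
foreach v of varlist highskill investment exportshare {
    quietly count if `v' == 0
    display "`v': " r(N) " zero observations"
}
```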
After extensive research and the aforementioned help from Statalist members (C. Lazarro and J. Wooldridge, thanks a lot again), I have concluded that I am basically left with two options: modelling the absolute number of high-skilled employees with -xtpoisson, fe vce(robust)-, or modelling the share of high-skilled employees (highskill/total) with a fractional response model like the one used in Papke and Wooldridge (2008).
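For the fractional route, the dependent variable must lie in the closed unit interval. The Bernoulli quasi-likelihood used by Papke and Wooldridge accepts exact zeros and ones, so no transformation is needed. A minimal sketch of constructing and checking the share, assuming highskill and total are headcounts and reusing the name share_high from the -glm- output in this post:

```stata
* Share of high-skilled employees; left missing if total is zero or missing
* (in Stata, missing > 0 is true, hence the explicit !missing() check)
generate double share_high = highskill / total if total > 0 & !missing(total)

* The fractional probit/logit QMLE requires 0 <= share_high <= 1
assert inrange(share_high, 0, 1) if !missing(share_high)

* Shares exactly at the bounds are kept as they are
count if share_high == 0
count if share_high == 1
```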
First Question:
The problem with the Poisson model is that, apart from the insignificant coefficients (which may well be genuine), a misspecification test of the form
Code:
xtpoisson highskill investict product_inno process_inno lntotal avwages collective exportshare investment rnd tech i.industry i.year, fe vce(robust)
* RESET-type check: add the squared and cubed linear index (xb excludes the fixed effects)
predict xbhat, xb
generate xbhatsq = xbhat^2
generate xbhatcu = xbhat^3
xtpoisson highskill investict product_inno process_inno lntotal avwages collective exportshare investment rnd tech xbhatsq xbhatcu i.industry i.year, fe vce(robust)
test xbhatsq xbhatcu
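Given the many zeros in highskill, it may be worth checking how much of the sample -xtpoisson, fe- actually uses. Firms whose outcome is zero in every year contribute nothing to the conditional likelihood and are dropped, which Stata reports in a note. The sketch below assumes idnum is the firm identifier (as in the -xtgee- output in this post). hs_sum and firm_tag are hypothetical helper names.

```stata
* Sum of highskill over all years within each firm
* (egen's total() treats missing values as zero)
bysort idnum: egen double hs_sum = total(highskill)

* Tag one observation per firm, then count firms with only zero outcomes
egen byte firm_tag = tag(idnum)
count if firm_tag == 1 & hs_sum == 0
```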
Second Question:
To try the fractional response model, I took the Stata code Papke provides on her website. Applying its -glm- and -xtgee- commands to my data gives the following:
Code:
glm share_high investict product_inno process_inno total avwages collective exportshare investment rnd
> tech industry i.year, fa(bin) link(probit) cluster(idnum)
note: share_high has noninteger values
Iteration 0: log pseudolikelihood = -1672.9517
Iteration 1: log pseudolikelihood = -1653.9288
Iteration 2: log pseudolikelihood = -1653.8772
Iteration 3: log pseudolikelihood = -1653.8772
Generalized linear models No. of obs = 5582
Optimization : ML Residual df = 5560
Scale parameter = 1
Deviance = 1564.744217 (1/df) Deviance = .2814288
Pearson = 1922.417319 (1/df) Pearson = .3457585
Variance function: V(u) = u*(1-u/1) [Binomial]
Link function : g(u) = invnorm(u) [Probit]
AIC = .6004576
Log pseudolikelihood = -1653.877174 BIC = -46403.06
(Std. Err. adjusted for 623 clusters in idnum)
------------------------------------------------------------------------------
| Robust
share_high | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
investict | -.0319885 .0510072 -0.63 0.531 -.1319608 .0679837
product_inno | .0525227 .0703238 0.75 0.455 -.0853094 .1903547
process_inno | -.0901038 .0505448 -1.78 0.075 -.1891698 .0089622
total | .0011193 .0002498 4.48 0.000 .0006296 .001609
avwages | .0000165 .0000207 0.80 0.423 -.000024 .000057
collective | -.0788953 .0695357 -1.13 0.257 -.2151827 .0573922
exportshare | -.2787724 .1815199 -1.54 0.125 -.6345448 .0769999
investment | -1.62e-08 1.52e-08 -1.07 0.286 -4.60e-08 1.35e-08
rnd | .2635743 .0984232 2.68 0.007 .0706684 .4564803
tech | .0203529 .0391601 0.52 0.603 -.0563995 .0971053
industry | .0011611 .0112536 0.10 0.918 -.0208956 .0232178
|
year |
2008 | .0228136 .0298333 0.76 0.444 -.0356587 .0812858
2009 | .0332193 .025811 1.29 0.198 -.0173693 .0838079
2010 | .0259046 .0285992 0.91 0.365 -.0301488 .0819581
2011 | .029456 .0303086 0.97 0.331 -.0299477 .0888598
2012 | .0636053 .0303213 2.10 0.036 .0041766 .1230339
2013 | .0067182 .0336056 0.20 0.842 -.0591476 .0725841
2014 | -.0376976 .0365879 -1.03 0.303 -.1094086 .0340133
2015 | -.0118872 .0332579 -0.36 0.721 -.0770715 .0532971
2016 | -.0473058 .040181 -1.18 0.239 -.1260592 .0314476
2017 | -.0099275 .040348 -0.25 0.806 -.0890082 .0691532
|
_cons | -1.306921 .1410517 -9.27 0.000 -1.583377 -1.030465
------------------------------------------------------------------------------
. mat b = e(b)
. xtgee share_high investict product_inno process_inno total avwages collective exportshare investment rnd
> tech industry i.year, fa(bi) link(probit) corr(exch) robust from(b,skip)
Iteration 1: tolerance = .80279569
Iteration 2: tolerance = .10893995
....
GEE population-averaged model Number of obs = 5582
Group variable: idnum Number of groups = 623
Link: probit Obs per group: min = 1
Family: binomial avg = 9.0
Correlation: exchangeable max = 11
Wald chi2(21) = 35.94
Scale parameter: 1 Prob > chi2 = 0.0222
(Std. Err. adjusted for clustering on idnum)
------------------------------------------------------------------------------
| Semirobust
share_high | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
investict | .0188805 .010901 1.73 0.083 -.0024849 .040246
product_inno | .0286437 .0184118 1.56 0.120 -.0074427 .06473
process_inno | .0073737 .0122423 0.60 0.547 -.0166208 .0313682
total | -.0007271 .0005181 -1.40 0.160 -.0017425 .0002883
avwages | 8.26e-06 6.36e-06 1.30 0.194 -4.20e-06 .0000207
collective | -.0394644 .0214855 -1.84 0.066 -.0815753 .0026465
exportshare | -.0250987 .0709382 -0.35 0.723 -.164135 .1139377
investment | 7.70e-09 4.54e-09 1.70 0.090 -1.19e-09 1.66e-08
rnd | -.0133706 .0272105 -0.49 0.623 -.0667022 .0399611
tech | -.0206219 .0099042 -2.08 0.037 -.0400337 -.0012101
industry | -.0033315 .009638 -0.35 0.730 -.0222216 .0155586
|
year |
2008 | .0196429 .0173751 1.13 0.258 -.0144117 .0536975
2009 | .0229043 .0164129 1.40 0.163 -.0092645 .055073
2010 | .0192822 .0175307 1.10 0.271 -.0150773 .0536418
2011 | .0157518 .0203442 0.77 0.439 -.0241221 .0556257
2012 | .0207046 .0195638 1.06 0.290 -.0176397 .0590489
2013 | -.000057 .0229038 -0.00 0.998 -.0449477 .0448336
2014 | -.0184354 .0216876 -0.85 0.395 -.0609424 .0240715
2015 | -.0222338 .0219286 -1.01 0.311 -.065213 .0207455
2016 | -.0241677 .0242468 -1.00 0.319 -.0716906 .0233551
2017 | -.0246466 .0248686 -0.99 0.322 -.0733881 .024095
|
_cons | -1.026956 .0853745 -12.03 0.000 -1.194287 -.8596252
------------------------------------------------------------------------------
I would really appreciate some help in understanding the fractional model.
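Two points usually matter most when reading these estimates. First, probit-link coefficients are on the index scale, so their size is not the effect on the share. Average partial effects from -margins- are the comparable quantity. Second, the panel approach in Papke and Wooldridge (2008) allows for firm effects through the Chamberlain-Mundlak device, i.e. it adds firm-level time averages of the time-varying covariates. The commands above do not include these averages. The sketch below combines both points. It assumes the following, all of which should be checked against the data: the data are xtset by idnum and year, the listed regressors vary over time, and industry is a categorical code. The _bar names are hypothetical. As I understand it, the simple version of this device assumes a balanced panel, while the output above shows between 1 and 11 observations per firm. Run it from a do-file (the /// continuations do not work interactively).

```stata
* Assumed panel declaration: firm id idnum, time variable year
xtset idnum year

* Firm-level time averages of the time-varying regressors (Mundlak device)
local bars
foreach v of varlist investict product_inno process_inno total avwages ///
        collective exportshare investment rnd tech {
    bysort idnum: egen double `v'_bar = mean(`v')
    local bars `bars' `v'_bar
}

* Pooled fractional probit (Bernoulli QMLE), SEs clustered by firm;
* dummies enter as factors so -margins- reports discrete changes
glm share_high i.investict i.product_inno i.process_inno total avwages ///
    i.collective exportshare investment i.rnd tech `bars'              ///
    i.industry i.year, family(binomial) link(probit) vce(cluster idnum)

* Average partial effects of the innovation dummies on E(share_high | x)
margins, dydx(investict product_inno process_inno)

* GEE version with exchangeable working correlation, started from the
* pooled estimates as in the original -xtgee- call
matrix b = e(b)
xtgee share_high i.investict i.product_inno i.process_inno total avwages ///
    i.collective exportshare investment i.rnd tech `bars'                ///
    i.industry i.year, family(binomial) link(probit) corr(exchangeable)  ///
    vce(robust) from(b, skip)
margins, dydx(investict product_inno process_inno)
```

The average partial effects from the two estimators can then be compared directly on the share scale, unlike the raw probit coefficients.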
Literature:
Papke, L. E., and J. M. Wooldridge, “Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation Rates,” Journal of Applied Econometrics 11 (1996), 619–632.