Dear Stata experts and fellow students,

We are two students writing our thesis, in which we try to measure the impact of industrial robot density on labor productivity. We are stuck after running a 3SLS regression: we are not sure how to interpret the result, and we have found little information about 3SLS in the regular Stata help file.

We use Stata 13 on Windows 10 and have a strongly balanced panel covering 16 countries over 11 years. Our hypothesis is that higher robot density increases labor productivity within the industry sector. As we understand it, we should first run xtreg with “Productivity” as the dependent variable and “Density” as an independent variable, together with other exogenous variables. We then run the Hausman test to decide between a fixed-effects and a random-effects model. After that we ran another xtreg with “Density” and “Productivity” swapped, so that “Density” is now the dependent variable and “Productivity” an independent variable.

As expected, the coefficient on Robot Density in the Productivity regression was positive and significant. A little surprisingly, the coefficient on Productivity in the Density regression was also positive and significant, which suggests two-way causality and therefore a simultaneous equation model. We have not encountered this problem before. As we understand it, we should then use 3SLS, with the reg3 command, to work around the simultaneity bias and obtain correct estimates. Our problem is interpreting the 3SLS output: it does not display coefficients for our two endogenous variables, only for our exogenous variables. Our end goal is to isolate our main dependent variable, Productivity, in a single equation so that we can estimate productivity for given values of our independent variables.


The variables we have are as follows:

variable name   storage type   display format   variable label
country         str14          %14s             Country
year            int            %ty              Year
densities       float          %9.0g            Robots per 10 000 workers
productivity    float          %9.0g            Value added / Employed
capint          float          %9.0g            Capital intensity K/Employed
wage            float          %9.0g            Wage (industry)
human           float          %9.0g            Human capital index
engineers       float          %9.0g            Numbers of STEM graduates/year/
hightech        float          %9.0g            High tech export / GDP
randd           float          %9.0g            R&D Investments / Capita
countrynum      float          %9.0g            group(country)


We then generate the log variables below (hightech and human do not need to be logged):
gen lnDens = ln(densities)
gen lnProd = ln(productivity)
gen lnCapint = ln(capint)
gen lnEng = ln(engineers)
gen lnRandD = ln(randd)


We then estimate the model with xtreg and run the Hausman test to decide between the fixed-effects and random-effects models. Since the p-value was below 0.05, we conclude that we should use the fixed-effects model.
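For reference, the steps we followed look roughly like the sketch below. The stored-estimate names fe and re are just labels chosen for this illustration, and the xtset line assumes that countrynum and year identify the panel.

```stata
* Sketch only: declare the panel, fit both models, then compare them
xtset countrynum year

* Fixed-effects model, stored under the (arbitrary) name fe
xtreg lnProd lnDens lnCapint lnEng lnRandD hightech human i.year, fe
estimates store fe

* Random-effects model with the same regressors, stored as re
xtreg lnProd lnDens lnCapint lnEng lnRandD hightech human i.year, re
estimates store re

* Hausman test: a small p-value favours the fixed-effects model
hausman fe re
```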

The regression results are as follows:


xtreg lnProd lnDens lnCapint lnEng lnRandD hightech human i.year, fe

Fixed-effects (within) regression Number of obs = 176
Group variable: countrynum Number of groups = 16

R-sq: within = 0.7820 Obs per group: min = 11
between = 0.6468 avg = 11.0
overall = 0.6104 max = 11

F(16,144) = 32.29
corr(u_i, Xb) = 0.5456 Prob > F = 0.0000

-----------------------------------------------------------------------------
lnProd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnDens | .0537314 .0201722 2.66 0.009 .0138595 .0936033
lnCapint | .0074656 .0789217 0.09 0.925 -.148529 .1634603
lnEng | -.0164537 .0280208 -0.59 0.558 -.0718389 .0389315
lnRandD | .183102 .0496251 3.69 0.000 .0850143 .2811897
hightech | 1.523402 .5574522 2.73 0.007 .4215562 2.625248
human | -.2051006 .2740913 -0.75 0.456 -.7468626 .3366614
|
year |
2005 | .0353189 .0200592 1.76 0.080 -.0043296 .0749674
2006 | .0761116 .0232683 3.27 0.001 .0301201 .1221031
2007 | .1245565 .0272251 4.58 0.000 .0707441 .1783689
2008 | .1026373 .0314382 3.26 0.001 .0404973 .1647774
2009 | .0339682 .0348541 0.97 0.331 -.0349235 .1028599
2010 | .1303824 .0386303 3.38 0.001 .0540267 .2067381
2011 | .1542142 .0424936 3.63 0.000 .0702224 .2382059
2012 | .1660519 .0463048 3.59 0.000 .0745269 .2575768
2013 | .1885433 .0497429 3.79 0.000 .0902228 .2868639
2014 | .2160856 .0514936 4.20 0.000 .1143046 .3178666
|
_cons | 3.218955 1.03314 3.12 0.002 1.176876 5.261033
-------------+----------------------------------------------------------------
sigma_u | .41611069
sigma_e | .05468317
rho | .9830233 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(15, 144) = 182.81 Prob > F = 0.0000






To run the 3SLS regression we first define the two starting equations as follows:

Equation 1: lnDENS = x1*lnPROD + x2*lnCAPINT + x3*lnENG + x4*lnR&D + x5*lnHC + C1 + Year variable

Equation 2: lnPROD = q1*lnDENS + q2*lnENG + q3*lnR&D + q4*lnHC + C2 + Year variable

Here x1, x2, etc. are the coefficients in the Density equation, and q1, q2, etc. are the coefficients in the Productivity equation.

The reg3 results are as follows:




. *We have to drop one variable each for identification purpose
. * We dropped hightech exports for density equation, and capital intensity for productivity equation
. global DensEquation "(qDensEquation: lnProd lnCapint lnEng lnRandD human i.year)"
. global ProdEquation "(qProdEquation: lnDens lnEng lnRandD hightech human i.year)"

. reg3 $DensEquation $ProdEquation
Three-stage least-squares regression
----------------------------------------------------------------------
Equation Obs Parms RMSE "R-sq" chi2 P
----------------------------------------------------------------------
qDensEquat~n 176 14 .2378743 0.8107 755.05 0.0000
qProdEquat~n 176 14 .5927817 0.7300 477.04 0.0000
---------------------------------------------------------------------
-------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
qDensEquation |
lnCapint | .3917138 .0823276 4.76 0.000 .2303546 .5530729
lnEng | -.2388427 .0483932 -4.94 0.000 -.3336916 -.1439938
lnRandD | .5442305 .0368356 14.77 0.000 .4720341 .6164269
human | -.0514337 .0682452 -0.75 0.451 -.1851918 .0823243
year |
2005 | .0127398 .0843589 0.15 0.880 -.1526006 .1780802
2006 | .0102551 .085419 0.12 0.904 -.157163 .1776733
2007 | .0280562 .0873754 0.32 0.748 -.1431965 .1993089
2008 | -.0407075 .0886998 -0.46 0.646 -.2145559 .133141
2009 | -.1271511 .0909384 -1.40 0.162 -.3053871 .0510849
2010 | -.0334432 .0937985 -0.36 0.721 -.2172848 .1503984
2011 | -.0110745 .096896 -0.11 0.909 -.2009871 .1788381
2012 | -.0222499 .0999603 -0.22 0.824 -.2181684 .1736686
2013 | -.0133324 .1025704 -0.13 0.897 -.2143667 .1877019
2014 | .0089705 .1031653 0.09 0.931 -.1932297 .2111707
|
_cons | -2.141644 1.205448 -1.78 0.076 -4.504279 .2209902
--------------+----------------------------------------------------------------
qProdEquation |
lnEng | .0165941 .1116169 0.15 0.882 -.2021709 .2353592
lnRandD | 1.370518 .070219 19.52 0.000 1.232892 1.508145
hightech | -4.836801 1.933309 -2.50 0.012 -8.626017 -1.047586
human | -.2231039 .1536515 -1.45 0.146 -.5242553 .0780476
|
year |
2005 | .0873715 .2096541 0.42 0.677 -.3235429 .498286
2006 | .1402849 .2100033 0.67 0.504 -.271314 .5518838
2007 | .1883111 .2106528 0.89 0.371 -.2245608 .6011831
2008 | .2204112 .2107594 1.05 0.296 -.1926697 .6334921
2009 | .3110651 .211508 1.47 0.141 -.103483 .7256131
2010 | .3916329 .2124919 1.84 0.065 -.0248437 .8081094
2011 | .4508718 .2144211 2.10 0.035 .0306142 .8711294
2012 | .4993421 .2153684 2.32 0.020 .0772278 .9214565
2013 | .5604607 .2162927 2.59 0.010 .1365348 .9843865
2014 | .6025302 .2169143 2.78 0.005 .177386 1.027674
|
_cons | -3.866199 .9997115 -3.87 0.000 -5.825598 -1.906801
-------------------------------------------------------------------------------
Endogenous variables: lnProd lnDens
Exogenous variables: lnCapint lnEng lnRandD human 2005.year 2006.year
2007.year 2008.year 2009.year 2010.year 2011.year 2012.year 2013.year
2014.year hightech
------------------------------------------------------------------------------





(Note that CAPINT missing from the second equation is not an error: the variable has been automatically dropped by the program, and we don't know how to interpret that.)

We then proceeded by plugging the lnDENS equation into the lnPROD equation to isolate the productivity variable.

The problem when we try to isolate lnPROD is that we don't have x1 (the coefficient on lnPROD in Equation 1) and we don't have q1 (the coefficient on lnDENS in Equation 2). The other x and q coefficients are available from the 3SLS table, but the coefficients for lnDENS and lnPROD are not displayed. We have tried to find the answer in lecture notes, the Stata manual, etc., but have not found it yet.

What follows is what we’ve got:

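(1 - x1*q1)*lnPROD = x2*q1*lnCAPINT + x3*q1*lnENG + x4*q1*lnR&D + x5*q1*lnHC + K1 + q2*lnENG + q3*lnR&D + q4*lnHC + K2 + Year variable

where K1 = q1*C1 and K2 = C2 are the constant terms carried over from the two equations.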
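If x1 and q1 were available in the stored results, we think a combined coefficient such as x2*q1/(1 - x1*q1) could be computed with nlcom, which also reports a delta-method standard error. The sketch below is untested and rests on an assumption that does not hold for our output above: that reg3 has been fit with lnProd as a regressor in qDensEquation and lnDens as a regressor in qProdEquation. The label rf_capint is a name made up for this example.

```stata
* Sketch only: assumes the reg3 fit stores a coefficient on lnProd in
* qDensEquation (x1) and a coefficient on lnDens in qProdEquation (q1).
* x2 is the coefficient on lnCapint in qDensEquation.
* The expression is x2*q1 / (1 - x1*q1), the solved-out effect of lnCapint on lnProd.
nlcom (rf_capint: [qDensEquation]lnCapint * [qProdEquation]lnDens / (1 - [qDensEquation]lnProd * [qProdEquation]lnDens))
```

The same pattern would apply to the other exogenous variables, with the direct effect (for example q2 for lnEng) added to the numerator.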

Have we made a mistake in how we wrote the reg3 command?
Do you know how to properly isolate lnPROD so that we can calculate productivity in our forecast period?
We asked our supervisor at school (who does not have much time to answer our questions), and he briefly described the problem as follows:

“You have done correct the procedure such that the two models are identified (differ in specification). I think you have missed to add Prod to the right hand of Dens and vice versa. This explains that you do not have their causal effects estimated with SYSTEM estimation procedure which use maximum likelihood method that is iterative, if converged is better but it is based on strong assumptions compared with 3SLS. “
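
Our tentative reading of this remark is that reg3 takes the first variable after the equation name as the dependent variable, so in our command lnProd became the dependent variable of qDensEquation and lnDens the dependent variable of qProdEquation, with neither appearing on a right-hand side. If that reading is right, the specification would look something like the untested sketch below, which keeps our exclusion restrictions (no hightech in the Density equation, no lnCapint in the Productivity equation).

```stata
* Sketch only: dependent variable listed first in each equation,
* with the other endogenous variable added as a regressor
global DensEquation "(qDensEquation: lnDens lnProd lnCapint lnEng lnRandD human i.year)"
global ProdEquation "(qProdEquation: lnProd lnDens lnEng lnRandD hightech human i.year)"

* reg3 treats the two dependent variables as endogenous by default
reg3 $DensEquation $ProdEquation
```

We have not checked whether country effects should also be added here (for example as i.countrynum), since reg3 has no fixed-effects option like xtreg.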

That answer also confused us a bit, so we are hoping some friendly Stata expert can advise us on how to interpret the 3SLS result and how to isolate our main dependent variable, Productivity.
Thanks and best regards, Mikael and Claudio