I'm estimating a production function on firm-level panel data (6 years) to obtain firms' productivity. To illustrate my problem, please have a look at the following code and results:
Code:
reg log_output log_labor log_capital log_materials

      Source |       SS           df       MS      Number of obs   =    77,674
-------------+----------------------------------   F(3, 77670)     >  99999.00
       Model |  232153.056         3   77384.352   Prob > F        =    0.0000
    Residual |  14296.2403    77,670  .184063864   R-squared       =    0.9420
-------------+----------------------------------   Adj R-squared   =    0.9420
       Total |  246449.296    77,673  3.17290817   Root MSE        =    .42903

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .2786241   .0016469   169.18   0.000     .2753962    .2818521
  log_capital |   .1249548   .0011086   112.72   0.000      .122782    .1271276
log_materials |   .6145012   .0010075   609.93   0.000     .6125266    .6164759
        _cons |   2.023493   .0077761   260.22   0.000     2.008252    2.038734
-------------------------------------------------------------------------------
I can generate a dummy variable (exit_dummy) that equals 1 if a firm survives through all 6 years of the panel and 0 if it exits during those years.
exit_dummy is negatively correlated with all of the current independent variables, especially the log of capital, as you can see here:
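A minimal sketch of how a dummy with this definition could be built in Stata. The identifiers firm_id and year are hypothetical names (the post does not show the panel variables), and the sketch assumes one observation per firm-year:

```stata
* Assumption: firm_id and year are hypothetical names for the panel identifiers.
* Count the number of years in which each firm is observed.
bysort firm_id: egen n_years = count(year)

* exit_dummy = 1 if the firm is observed in all 6 years, 0 otherwise
* (this follows the definition in the text above).
generate byte exit_dummy = (n_years == 6)

* Quick check of how many firm-year observations fall in each group.
tabulate exit_dummy
```

Note that this counts observed years only, so a firm that enters the panel late would also get 0 even though it never exits. Whether that matters depends on the dataset.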
Code:
. corr exit_dummy log_labor log_capital log_materials
(obs=77,677)

             | exit_d~y log_la~r log_ca~l log_ma~s
-------------+------------------------------------
  exit_dummy |   1.0000
   log_labor |  -0.0668   1.0000
 log_capital |  -0.0918   0.6461   1.0000
log_materi~s |  -0.0732   0.5564   0.6777   1.0000

Now I add exit_dummy to the regression, and these are the results:
Code:
reg log_output log_labor log_capital log_materials exit_dummy

      Source |       SS           df       MS      Number of obs   =    77,674
-------------+----------------------------------   F(4, 77669)     >  99999.00
       Model |   232160.76         4  58040.1901   Prob > F        =    0.0000
    Residual |  14288.5358    77,669  .183967038   R-squared       =    0.9420
-------------+----------------------------------   Adj R-squared   =    0.9420
       Total |  246449.296    77,673  3.17290817   Root MSE        =    .42891

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .2785518   .0016465   169.18   0.000     .2753246     .281779
  log_capital |   .1246052   .0011096   112.30   0.000     .1224304    .1267799
log_materials |   .6144147   .0010073   609.96   0.000     .6124404     .616389
   exit_dummy |   -.054435   .0084116    -6.47   0.000    -.0709217   -.0379483
        _cons |   2.029817   .0078352   259.06   0.000      2.01446    2.045174
-------------------------------------------------------------------------------
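For reference, the two regressions above are linked by the usual omitted-variable algebra. This is a sketch rather than additional output: the symbols are introduced here for illustration, and the values of \hat{\delta}_j are only what the printed coefficients imply (up to rounding), not results of an auxiliary regression that was actually run. Here \hat{\beta}_j^{short} and \hat{\beta}_j^{long} are the coefficients on input j without and with exit_dummy, \hat{\gamma} is the coefficient on exit_dummy, and \hat{\delta}_j is the coefficient on input j in a regression of exit_dummy on the three inputs over the same sample.

```latex
\begin{align*}
\hat{\beta}_j^{\text{short}} &= \hat{\beta}_j^{\text{long}} + \hat{\gamma}\,\hat{\delta}_j,
  \qquad \hat{\gamma} = -0.054435 \\[4pt]
\text{log\_labor:}\quad
  0.2786241 - 0.2785518 &= 0.0000723
  \;\Rightarrow\; \hat{\delta}_{L} \approx \frac{0.0000723}{-0.054435} \approx -0.0013 \\
\text{log\_capital:}\quad
  0.1249548 - 0.1246052 &= 0.0003496
  \;\Rightarrow\; \hat{\delta}_{K} \approx \frac{0.0003496}{-0.054435} \approx -0.0064 \\
\text{log\_materials:}\quad
  0.6145012 - 0.6144147 &= 0.0000865
  \;\Rightarrow\; \hat{\delta}_{M} \approx \frac{0.0000865}{-0.054435} \approx -0.0016
\end{align*}
```

Read this way, the size of the change in each input coefficient is the product of two small numbers, which is consistent with the weak correlations (all below 0.1 in absolute value) between exit_dummy and the inputs.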
To be clear, I don't intend to put exit_dummy into this regression to control for selection bias. I use another method, which uses exit_dummy in a multi-stage regression, and I also obtain unexpected results after controlling for selection bias. I am using this simple example (based on my dataset) to show the same kind of unexpected result.
Can anyone please help me make sense of these unexpected results?
Thanks a lot in advance.