Hello!
This is my first post on Statalist, and I would be very grateful for any comments and suggestions.

I am doing research on the spillover effects of foreign direct investment (FDI) on total factor productivity (TFP) using firm-level data, and I am a little confused about the choice of model and the data processing. To make things clearer, I briefly describe the data and method below.

Data: I construct a balanced firm-level panel by dropping all firms whose data are unavailable in any year of the 7-year period.
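The balancing rule (keep a firm only if it has complete data in every year of the period) can be sketched in plain Python as follows. This is a minimal illustration only: the row layout, the keys `firm`, `year`, `tfp`, the years 2010 to 2016 and the function name `balance_panel` are all hypothetical.

```python
from collections import defaultdict

def balance_panel(rows, years):
    """Keep only firms with a complete (no None) record in every year of `years`."""
    required = set(years)
    complete_years = defaultdict(set)
    for r in rows:
        # A firm-year counts as available only if none of its variables is missing.
        if all(v is not None for v in r.values()):
            complete_years[r["firm"]].add(r["year"])
    keep = {f for f, ys in complete_years.items() if required <= ys}
    return [r for r in rows if r["firm"] in keep]

# Hypothetical example: firm "A" is complete in all 7 years, firm "B" misses TFP in 2012.
years = range(2010, 2017)
rows = [{"firm": "A", "year": t, "tfp": 1.0} for t in years]
rows += [{"firm": "B", "year": t, "tfp": None if t == 2012 else 1.0} for t in years]
balanced = balance_panel(rows, years)
share_dropped = 1 - len(balanced) / len(rows)
```

Computing the share of dropped observations this way is a quick check on how much of the original sample the balancing step discards.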

Method:
- Stage 1: To use TFP as the dependent variable in the main model, I calculate it from a standard Cobb-Douglas production function, $Y_{ijt} = A_{ijt} K_{ijt}^{\alpha} L_{ijt}^{\beta}$ (firm i in industry j in year t), where α and β are estimated by an OLS regression of (log) output Y on (log) capital and labor (data on material inputs are unavailable).
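Assuming Stage 1 is an OLS regression of $\ln Y$ on a constant, $\ln K$ and $\ln L$, with log TFP taken as $\ln Y - \hat\alpha \ln K - \hat\beta \ln L$, the mechanics can be sketched in dependency-free Python. The function names `solve3` and `estimate_tfp` and the synthetic data are hypothetical; in practice this is a single regression in Stata.

```python
import math

def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def estimate_tfp(Y, K, L):
    """OLS of ln Y on a constant, ln K and ln L via the normal equations.

    Returns (alpha_hat, beta_hat, ln_tfp) with ln_tfp = ln Y - alpha_hat ln K - beta_hat ln L.
    """
    X = [(1.0, math.log(cap), math.log(lab)) for cap, lab in zip(K, L)]
    y = [math.log(v) for v in Y]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]
    _const, alpha, beta = solve3(XtX, Xty)
    # The estimated constant stays inside ln_tfp; a firm fixed effect absorbs it anyway.
    ln_tfp = [yi - alpha * row[1] - beta * row[2] for row, yi in zip(X, y)]
    return alpha, beta, ln_tfp

# Hypothetical noise-free data generated with alpha = 0.3, beta = 0.6 and A = 2.
K = [1.0, 2.0, 3.0, 4.0, 5.0]
L = [2.0, 1.0, 4.0, 3.0, 6.0]
Y = [2.0 * cap ** 0.3 * lab ** 0.6 for cap, lab in zip(K, L)]
alpha_hat, beta_hat, ln_tfp = estimate_tfp(Y, K, L)
```

The sketch only shows the arithmetic. Because firms may choose inputs knowing their own productivity, OLS elasticities from real data can be biased, which is one reason the literature uses other estimators at this stage.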

- Stage 2: For the main estimation, I use a fixed-effects (FE) model with estimated TFP as the dependent variable. The independent variables are the horizontal, backward, and forward linkages with FDI firms, plus the control variables: labor quality, firm scale, market concentration, and technology gap.
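Written out, the Stage 2 specification might look like the following. The industry-year index on the linkage variables, the vector $\mathbf{Z}_{ijt}$ collecting the four controls, the firm effect $\mu_i$ and the use of log TFP are notation assumed here for illustration, not taken from the post.

```latex
\ln \widehat{TFP}_{ijt} = \gamma_1\,\mathit{Horizontal}_{jt} + \gamma_2\,\mathit{Backward}_{jt}
  + \gamma_3\,\mathit{Forward}_{jt} + \mathbf{Z}_{ijt}'\boldsymbol{\delta} + \mu_i + \varepsilon_{ijt}
```

The three spillover coefficients of interest are $\gamma_1$, $\gamma_2$ and $\gamma_3$.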

The results for the control variables are consistent with other papers, but the coefficients on the 3 spillover variables, although significant at the 1% level, have signs that differ from most previous studies (which cover different countries or different periods). As far as I know, no paper uses data for my country over the same period, so I cannot compare my results in the most direct way. Because of the endogeneity problem, I am confused and not confident in my results.

Endogeneity problems: Previous studies argue that there are endogeneity problems because firms may choose how much labor and capital to use based on last year's sales. Also, foreign firms may choose to invest in industries with higher productivity, … To deal with this, some studies find instrumental variables and use first-difference, 2SLS, or GMM estimators, while others use lagged variables as instruments. There is also a paper that uses lagged independent variables in a fixed-effects model.
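Lagging a regressor inside a panel has to be done within firm and by calendar year, so that a gap in the data yields a missing lag rather than the previous observed row (this is how Stata's `L.` operator behaves after `xtset`). A minimal plain-Python sketch, where the function name `add_within_firm_lag`, the key names and the data are hypothetical:

```python
def add_within_firm_lag(rows, var, lag=1):
    """Add '<var>_lag<lag>': the same firm's value of `var` exactly `lag` years earlier, else None."""
    lookup = {(r["firm"], r["year"]): r[var] for r in rows}
    out = []
    for r in rows:
        new = dict(r)
        # Look up by calendar year, so a missing year gives None instead of an older value.
        new[f"{var}_lag{lag}"] = lookup.get((r["firm"], r["year"] - lag))
        out.append(new)
    return out

# Hypothetical firm observed in 2010, 2011 and 2013 (2012 missing).
rows = [{"firm": "A", "year": 2010, "backward": 0.1},
        {"firm": "A", "year": 2011, "backward": 0.2},
        {"firm": "A", "year": 2013, "backward": 0.4}]
lagged = add_within_firm_lag(rows, "backward")
```

In a balanced panel every firm loses exactly its first year when a one-year lag is used.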

Since I cannot find a good IV, I use a fixed-effects model with year dummies on the sample of domestic firms only, to reduce the severity of the endogeneity problem.
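In a balanced panel, firm fixed effects plus year dummies can be computed by a two-way within transformation ($x_{it} - \bar{x}_i - \bar{x}_t + \bar{x}$) followed by OLS; with an unbalanced panel this shortcut is not exact and the year dummies have to be included explicitly (for example `i.year` in `xtreg ..., fe`). A minimal one-regressor sketch in plain Python, with hypothetical function names and synthetic data:

```python
from collections import defaultdict

def two_way_demean(rows, var):
    """x_it - firm mean - year mean + grand mean (an exact FE transform only for a BALANCED panel)."""
    by_firm, by_year = defaultdict(list), defaultdict(list)
    for r in rows:
        by_firm[r["firm"]].append(r[var])
        by_year[r["year"]].append(r[var])
    firm_mean = {f: sum(v) / len(v) for f, v in by_firm.items()}
    year_mean = {t: sum(v) / len(v) for t, v in by_year.items()}
    grand = sum(r[var] for r in rows) / len(rows)
    return [r[var] - firm_mean[r["firm"]] - year_mean[r["year"]] + grand for r in rows]

def twfe_slope(rows, y, x):
    """Slope from OLS of demeaned y on demeaned x (firm FE + year dummies, one regressor)."""
    yd, xd = two_way_demean(rows, y), two_way_demean(rows, x)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

# Hypothetical balanced panel: y = 2*x + firm effect + year effect, so the slope should be 2.
firm_eff = {"A": 5.0, "B": -1.0}
year_eff = {1: 0.0, 2: 0.5, 3: 1.5}
x_vals = {("A", 1): 1.0, ("A", 2): 4.0, ("A", 3): 2.0,
          ("B", 1): 3.0, ("B", 2): 0.0, ("B", 3): 5.0}
rows = [{"firm": f, "year": t, "x": xv, "y": 2 * xv + firm_eff[f] + year_eff[t]}
        for (f, t), xv in x_vals.items()]
slope = twfe_slope(rows, "y", "x")
```

The firm and year effects drop out exactly here because they are additive; any unobserved influence that varies by firm and year together (the source of the endogeneity concern) would not.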

My questions are as follows:

1. Is a fixed-effects model a good choice in this case? Since it does not solve the endogeneity problem, could the results be severely biased (for example, could not only the magnitude but also the sign of the coefficients change)?

2. If I use a fixed-effects model, what are its pros and cons in this case? Are there other ways to deal with endogeneity? What robustness checks should I run to show that my results are reliable? (The FE and RE results are consistent, and the Hausman test favors FE, so I chose FE.)
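For reference, the classical Hausman statistic compares the FE and RE estimates of the $k$ coefficients that both models identify:

```latex
H = \left(\hat{\boldsymbol\beta}_{FE} - \hat{\boldsymbol\beta}_{RE}\right)'
    \left[\widehat{\mathrm{Var}}\!\left(\hat{\boldsymbol\beta}_{FE}\right) - \widehat{\mathrm{Var}}\!\left(\hat{\boldsymbol\beta}_{RE}\right)\right]^{-1}
    \left(\hat{\boldsymbol\beta}_{FE} - \hat{\boldsymbol\beta}_{RE}\right)
    \;\overset{a}{\sim}\; \chi^2_k
```

Its null hypothesis is that the unobserved firm effect is uncorrelated with the regressors (so RE is consistent and, in the classical version, efficient). It says nothing about correlation between the regressors and the time-varying error term.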

3. Should I use balanced or unbalanced data? (Many observations, at least 30%, were deleted when I constructed the balanced panel; however, the sample still has more than 100,000 observations over the 7 years.)

Thank you very much for your time!
If possible, please suggest any other models you think are suitable for my data, or books/papers I should read.
