Situation:
I use Stata 14.2. I want to investigate the effect of mobile phone penetration (mobile_p100) on the Human Development Index (hdi, an index ranging from 0 to 1). I am using an unbalanced panel data set with N=120 countries and T=10 years. As my main model, I will use the GMM estimator. Following related research, I additionally want to use the least-squares dummy variable (LSDV) estimator, including the lagged dependent variable and country and time fixed effects, for comparison purposes.
Problem:
- Starting with “regress y1 x1…xn i.year, robust”, R^2 is about 0.6, which is reasonable.
- After changing this to “regress y1 L.y1 x1…xn i.year i.id, robust”, R^2 rises above 0.99. This also happens when I add only one of the two modifications, either the lagged dependent variable (L.y1) or the country fixed effects (i.id).
- Using “xtreg y1 x1 …xn i.year, fe robust” gives an R^2 of 0.80. As soon as I add the lagged dependent variable, R^2 rises above 0.99.
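The three specifications above can be sketched with the variable names from the output further down in this post. This is only a sketch: it assumes the data are in memory, that id is a numeric (value-labelled) country identifier and that year is the time variable. The scalar names are made up for illustration. The point is to collect the different R^2 measures side by side, because regress reports a single R^2 that includes whatever the country dummies explain, while xtreg, fe stores separate within, between and overall values.

```stata
* Assumption: panel data in memory, id = numeric country id, year = time variable
xtset id year

local controls mobile_p100 mobile_gdp gdp_pc_growth gfcf_share fdi_share pop_growth

* (1) Pooled OLS with year dummies only
regress hdi `controls' i.year, vce(robust)
scalar r2_pooled = e(r2)

* (2) LSDV: country dummies plus the lagged dependent variable
regress hdi L.hdi `controls' i.year i.id, vce(robust)
scalar r2_lsdv = e(r2)

* (3) Within estimator: same slope estimates as (2), but xtreg stores
*     three R-squared values; standard errors differ from (2) because
*     xtreg, fe with vce(robust) clusters on the panel variable
xtreg hdi L.hdi `controls' i.year, fe vce(robust)
scalar r2_within  = e(r2_w)
scalar r2_between = e(r2_b)
scalar r2_overall = e(r2_o)

display "pooled OLS   R2: " %6.4f r2_pooled
display "LSDV         R2: " %6.4f r2_lsdv
display "FE within    R2: " %6.4f r2_within
display "FE between   R2: " %6.4f r2_between
display "FE overall   R2: " %6.4f r2_overall
```

Run as a do-file (or in one go from the do-file editor) so that the local macro controls is still defined when the regressions execute.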
Solution tried:
- Related literature: It is not uncommon to report an R^2 of around 0.80 in this field of research, but I have found no justification for an R^2 > 0.99.
- Dataset: I double-checked the observations included in the dataset and did not find any irregularities (duplicates, unrealistic values, etc.).
- Excluding independent variables: I excluded the independent variables one at a time and ran the model again. Even when only one independent variable is left in the model, R^2 stays at around 0.99.
- Spurious regression: I suppose this is not the cause of the inflated R^2, since I have no problem with multicollinearity or with exceptionally high t-values.
- Detrend dependent variable: Helps to decrease R^2 to a reasonable level, but the results differ completely from my GMM estimation and previous research on the topic at hand.
- Multicollinearity: Does not seem to be a problem in my model, since VIF is at maximum 2.40
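One further check that might help to narrow this down is to look at how much of the variation in hdi is picked up by the country dummies alone and by the lag alone, without any other regressor. The following is a minimal sketch under the same assumptions as before (id and year identify the panel; the scalar names are hypothetical). xtsum splits the standard deviation of hdi into a between-country and a within-country part, and the two auxiliary regressions show the R^2 obtained from i.id alone and from L.hdi alone.

```stata
* Assumption: panel data in memory, id = numeric country id, year = time variable
xtset id year

* Between-country vs. within-country standard deviation of hdi
xtsum hdi

* R-squared from the country dummies alone
regress hdi i.id
scalar r2_country_only = e(r2)

* R-squared from the lagged dependent variable alone
regress hdi L.hdi
scalar r2_lag_only = e(r2)

display "R2, country dummies only: " %6.4f r2_country_only
display "R2, lagged hdi only     : " %6.4f r2_lag_only
```

If either of these values is already close to 1, the high R^2 in the full model would be coming from that component rather than from the explanatory variables of interest.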
Questions:
1) Is it possible that there is a general problem with the dependent variable which could also distort the GMM results?
2) Do you see any possibilities to overcome the problem described?
Model: With country fixed effects and lagged dependent variable
Code:
regress hdi L.hdi mobile_p100 mobile_gdp gdp_pc_growth gfcf_share fdi_share pop_growth i.year i.id, robust

Linear regression                                Number of obs     =      1,177
                                                 F(135, 1041)      =   44627.47
                                                 Prob > F          =     0.0000
                                                 R-squared         =     0.9996
                                                 Root MSE          =      .0032

-------------------------------------------------------------------------------
              |               Robust
          hdi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
          hdi |
          L1. |   .8331407   .0175466    47.48   0.000       .79871    .8675714
              |
  mobile_p100 |   .0000201   7.99e-06     2.52   0.012     4.43e-06    .0000358
   mobile_gdp |  -.0000426   .0000288    -1.48   0.138    -.0000991    .0000138
gdp_pc_growth |   .0004162   .0000522     7.98   0.000     .0003139    .0005185
   gfcf_share |   .0000549   .0000294     1.87   0.062    -2.79e-06    .0001126
    fdi_share |   .0000116   9.64e-06     1.21   0.228    -7.29e-06    .0000305
   pop_growth |   .0001226   .0002455     0.50   0.618    -.0003591    .0006044
              |
         year |
         2010 |    .000235   .0005479     0.43   0.668    -.0008401    .0013101
         2011 |   .0013773   .0005606     2.46   0.014     .0002772    .0024773
         2012 |   .0018995   .0005998     3.17   0.002     .0007226    .0030764
         2013 |   .0033575   .0006629     5.06   0.000     .0020567    .0046584
         2014 |   .0031978   .0007293     4.38   0.000     .0017668    .0046288
         2015 |   .0033857   .0007334     4.62   0.000     .0019466    .0048248
         2016 |   .0037577   .0007513     5.00   0.000     .0022834    .0052319
         2017 |    .003713   .0008163     4.55   0.000     .0021112    .0053149
         2018 |   .0033513   .0008486     3.95   0.000     .0016862    .0050165
              |
           id |
          ALB |   .0339267   .0046612     7.28   0.000     .0247802    .0430732
[...all 120 countries...]
          ZMB |   .0005514    .001493     0.37   0.712    -.0023783    .0034811
              |
        _cons |    .093336   .0089645    10.41   0.000     .0757455    .1109264
-------------------------------------------------------------------------------
Model: Without country fixed effects and without lagged dependent variable
Code:
regress hdi mobile_p100 mobile_gdp gdp_pc_growth gfcf_share fdi_share pop_growth i.year, robust

Linear regression                                Number of obs     =      1,295
                                                 F(16, 1278)       =     107.59
                                                 Prob > F          =     0.0000
                                                 R-squared         =     0.6149
                                                 Root MSE          =     .08985

-------------------------------------------------------------------------------
              |               Robust
          hdi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
  mobile_p100 |   .0015012   .0000934    16.07   0.000     .0013179    .0016844
   mobile_gdp |  -.0065454   .0006642    -9.85   0.000    -.0078484   -.0052424
gdp_pc_growth |  -.0069348   .0010797    -6.42   0.000     -.009053   -.0048166
   gfcf_share |  -.0006863   .0003695    -1.86   0.064    -.0014112    .0000386
    fdi_share |   .0003447   .0001267     2.72   0.007     .0000962    .0005932
   pop_growth |  -.0241198   .0023824   -10.12   0.000    -.0287936   -.0194459
              |
         year |
         2009 |  -.0512804   .0121719    -4.21   0.000    -.0751596   -.0274012
         2010 |  -.0304352   .0109545    -2.78   0.006     -.051926   -.0089443
         2011 |  -.0422082   .0108261    -3.90   0.000    -.0634471   -.0209694
         2012 |  -.0546703   .0110025    -4.97   0.000    -.0762553   -.0330853
         2013 |  -.0529743   .0114669    -4.62   0.000    -.0754703   -.0304783
         2014 |  -.0531553   .0116728    -4.55   0.000    -.0760553   -.0302553
         2015 |  -.0549756   .0122075    -4.50   0.000    -.0789246   -.0310266
         2016 |  -.0543252   .0120781    -4.50   0.000    -.0780203     -.03063
         2017 |  -.0474105   .0118148    -4.01   0.000    -.0705891   -.0242319
         2018 |  -.0504147    .012046    -4.19   0.000    -.0740467   -.0267827
              |
        _cons |    .709139    .016166    43.87   0.000     .6774242    .7408537
-------------------------------------------------------------------------------
Patrick
PS: A similar question was asked here: link. Unfortunately, the recommendations given there did not solve my problem.