I am very inexperienced with Stata and am currently trying to test a hypothesis using countries' GDP and consumption data. A key part of testing my hypothesis involves using the ivreg2 command to estimate a key variable. It's my first time using this command, and my first two regressions came back looking as I predicted. However, my last two ivreg2 regressions come back with a negative centred R2 and very large confidence intervals, which I did not expect, so I believe I have gone wrong somewhere. I am aware that a negative centred R2 is possible in IV estimation and does not by itself mean that I have made a mistake. Could someone look at my do-file and see whether I have gone wrong somewhere?
Using Stata/IC 16.0
Code:
// Look in the folder with the project in
cd H:\ConsumptionGdp
// Import the dataset, clear anything in memory
import delimited "H:\ConsGdp\consumptionData.csv", clear
// Install the addons needed for Instrumental variable regression
ssc install ivreg2
ssc install ranktest
// Sort by country and year
sort country year
// Removes unwanted countries
keep if country== "United Kingdom"
//Set year as time series data
tsset year
// creates variable for log of gdp
gen lngdp = log(gdp)
// creates variable for log of consumption
gen lncons = log(consumption)
// Run Dickey-Fuller tests for stationarity. The null hypothesis is that the series has a unit root. Neither test rejects the null at the 10% critical value, so both series are treated as nonstationary.
dfuller lngdp
dfuller lncons
// The data is nonstationary as you can see a trend. To deal with this, take first differences of random walk trend. For GDP:
gen dif1_lngdp = D.lngdp
gen dif2_lngdp = D2.lngdp
//Generate lag1 and lag2 of gdp
gen lag1_lngdp = L.lngdp
gen lag2_lngdp = L2.lngdp
// Check for autocorrelation in the data, using 8 lags
corrgram lngdp, lags(8)
// Regress the first difference of lngdp on its lagged level
reg D.lngdp L.lngdp, rob
// Create a graph of it so you can see it's stationary
tsline d.lngdp
// The data is nonstationary as you can see a trend. To deal with this, take first differences of random walk trend. For consumption:
gen dif1_lncons = D.lncons
gen dif2_lncons = D2.lncons
//Generate lag1 and lag2 of consumption
gen lag1_lncons = L.lncons
gen lag2_lncons = L2.lncons
// Check for autocorrelation in the consumption data, using 8 lags
corrgram lncons, lags(8)
// Regress the first difference of lncons on its lagged level
reg D.lncons L.lncons, rob
// Create a graph of it so you can see it's stationary
tsline d.lncons
** REGRESSIONS
// Lags start at t-2 to get rid of first-order serial correlation
// Simple OLS in levels with robust standard errors. This has an endogeneity problem (x causes y, but y may also cause x). Still needs changing to log values.
reg consumption gdp, rob
// Regress consumption growth, then income growth, on income growth lagged from t-2 to t-4
reg d.lncons L(2/4).d.lngdp, rob
reg d.lngdp L(2/4).d.lngdp, rob
// Regress consumption growth, then income growth, on income growth lagged from t-2 to t-6
reg d.lncons L(2/6).d.lngdp, rob
reg d.lngdp L(2/6).d.lngdp, rob
// Regress consumption growth, then income growth, on consumption growth lagged from t-2 to t-4
reg d.lncons L(2/4).d.lncons, rob
reg d.lngdp L(2/4).d.lncons, rob
// Regress consumption growth, then income growth, on consumption growth lagged from t-2 to t-6
reg d.lncons L(2/6).d.lncons, rob
reg d.lngdp L(2/6).d.lncons, rob
// Use ivreg2 to estimate the lambda values: the error term may be correlated with the change in income, so OLS cannot be used
ivreg2 consumption (gdp = L(2/4).gdp), rob
//
ivreg2 d.lncons (d.lngdp = L(2/4).d.lngdp), rob
//
ivreg2 d.lncons (d.lngdp = L(2/6).d.lngdp), rob
//
ivreg2 d.lngdp (d.lncons = L(2/4).d.lncons),rob
//
ivreg2 d.lngdp (d.lncons = L(2/6).d.lncons), rob
Code:
. ivreg2 d.lngdp (d.lncons = L(2/4).d.lncons),rob
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
                                                      Number of obs =       44
                                                      F(  1,    42) =     0.11
                                                      Prob > F      =   0.7387
Total (centered) SS     =  .0160808974                Centered R2   =  -0.4563
Total (uncentered) SS   =  .0367406497                Uncentered R2 =   0.3626
Residual SS             =  .0234181371                Root MSE      =   .02307
------------------------------------------------------------------------------
             |               Robust
     D.lngdp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lncons |
         D1. |  -.2960821   .8615568    -0.34   0.731    -1.984702    1.392538
             |
       _cons |   .0283756   .0193331     1.47   0.142    -.0095165    .0662678
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):              1.971
                                                   Chi-sq(3) P-val =    0.5785
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):                0.831
                         (Kleibergen-Paap rk Wald F statistic):          0.636
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    13.91
                                         10% maximal IV relative bias     9.08
                                         20% maximal IV relative bias     6.46
                                         30% maximal IV relative bias     5.39
                                         10% maximal IV size             22.30
                                         15% maximal IV size             12.83
                                         20% maximal IV size              9.54
                                         25% maximal IV size              7.80
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.866
                                                   Chi-sq(2) P-val =    0.6486
------------------------------------------------------------------------------
Instrumented:         D.lncons
Excluded instruments: L2D.lncons L3D.lncons L4D.lncons
------------------------------------------------------------------------------
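As far as I can tell, the negative centred R2 in this output is just arithmetic on the reported sums of squares: R2 is computed as 1 - RSS/TSS, and here the residual SS is larger than the centred total SS. A minimal sketch of that check, using only the figures printed above (the display calls are my own addition for illustration, not part of the do-file I ran):

```stata
* Centred R2 = 1 - Residual SS / Total (centered) SS
display 1 - .0234181371/.0160808974

* Uncentred R2 = 1 - Residual SS / Total (uncentered) SS
display 1 - .0234181371/.0367406497
```

These should reproduce the reported -0.4563 and 0.3626 up to rounding, which suggests the negative value itself is not a computational error. It does not tell me whether the model or the instruments are specified correctly.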