I am very inexperienced with Stata and I am currently trying to test a hypothesis using countries' GDP and consumption data. A key part of testing my hypothesis involves using the ivreg2 command to estimate a key variable. It's my first time using this command, and when I ran my first two regressions, they came back looking as I predicted. However, my last two regressions using ivreg2 come back with a negative centred R2 and very large confidence intervals, which I did not expect, so I believe I have gone wrong somewhere. I am aware that a negative centred R2 is possible and doesn't by itself mean that I have gone wrong. So I was just wondering if someone could look at my do-file and see whether I have made a mistake somewhere.
Using Stata/IC 16.0
// Look in the folder with the project in
cd H:\ConsumptionGdp

// Import the dataset, clear anything in memory
import delimited "H:\ConsGdp\consumptionData.csv", clear

// Install the add-ons needed for instrumental variable regression
ssc install ivreg2
ssc install ranktest

// Sort by country and year
sort country year

// Remove unwanted countries
keep if country == "United Kingdom"

// Set year as the time variable
tsset year

// Create variable for log of GDP
gen lngdp = log(gdp)

// Create variable for log of consumption
gen lncons = log(consumption)

// Dickey-Fuller test for stationarity. The null hypothesis is a unit root,
// so failing to reject it at the 10% critical value indicates that a series
// is nonstationary. Both series are treated as nonstationary here.
dfuller lngdp
dfuller lncons

// The data is nonstationary as you can see a trend. To deal with this,
// take first differences of the random walk trend.
// For GDP:
gen dif1_lngdp = D.lngdp
gen dif2_lngdp = D2.lngdp

// Generate lag 1 and lag 2 of GDP
gen lag1_lngdp = L.lngdp
gen lag2_lngdp = L2.lngdp

// Check for autocorrelation in the data, using 8 lags
corrgram lngdp, lags(8)

// Regress the difference of lngdp on the lag of lngdp
reg D.lngdp L.lngdp, rob

// Regress the difference of lngdp on the lag of lngdp (repeated)
reg D.lngdp L.lngdp, rob

// Create a graph of it so you can see it's stationary
tsline d.lngdp

// The data is nonstationary as you can see a trend. To deal with this,
// take first differences of the random walk trend.
// For consumption:
gen dif1_lncons = D.lncons
gen dif2_lncons = D2.lncons

// Generate lag 1 and lag 2 of consumption
gen lag1_lncons = L.lncons
gen lag2_lncons = L2.lncons

// Check for autocorrelation in the data, using 8 lags
corrgram lncons, lags(8)

// Regress the difference of lncons on the lag of lncons
reg D.lncons L.lncons, rob

// Regress the difference of lncons on the lag of lncons (repeated)
reg D.lncons L.lncons, rob

// Create a graph of it so you can see it's stationary
tsline d.lncons

** REGRESSIONS
// Lagged at least twice to get rid of first-order serial correlation

// Regress a simple OLS model, robust. This has an endogeneity problem:
// x causes y, but y may cause x. Need to change to log values.
reg consumption gdp, rob

// Regress consumption growth, then income growth, on income growth lagged from t-2 to t-4
reg d.lncons L(2/4).d.lngdp, rob
reg d.lngdp L(2/4).d.lngdp, rob

// Regress consumption growth, then income growth, on income growth lagged from t-2 to t-6
reg d.lncons L(2/6).d.lngdp, rob
reg d.lngdp L(2/6).d.lngdp, rob

// Regress consumption growth, then income growth, on consumption growth lagged from t-2 to t-4
reg d.lncons L(2/4).d.lncons, rob
reg d.lngdp L(2/4).d.lncons, rob

// Regress consumption growth, then income growth, on consumption growth lagged from t-2 to t-6
reg d.lncons L(2/6).d.lncons, rob
reg d.lngdp L(2/6).d.lncons, rob

// Use ivreg2 to estimate the lambda values, as the error term may be
// correlated with the change in income, so OLS cannot be used
ivreg2 consumption (gdp = L(2/4).gdp), rob

ivreg2 d.lncons (d.lngdp = L(2/4).d.lngdp), rob

ivreg2 d.lncons (d.lngdp = L(2/6).d.lngdp), rob

ivreg2 d.lngdp (d.lncons = L(2/4).d.lncons), rob

ivreg2 d.lngdp (d.lncons = L(2/6).d.lncons), rob
. ivreg2 d.lngdp (d.lncons = L(2/4).d.lncons),rob

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =       44
                                                      F(  1,    42) =     0.11
                                                      Prob > F      =   0.7387
Total (centered) SS     =  .0160808974                Centered R2   =  -0.4563
Total (uncentered) SS   =  .0367406497                Uncentered R2 =   0.3626
Residual SS             =  .0234181371                Root MSE      =   .02307

------------------------------------------------------------------------------
             |               Robust
     D.lngdp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lncons |
         D1. |  -.2960821   .8615568    -0.34   0.731    -1.984702    1.392538
             |
       _cons |   .0283756   .0193331     1.47   0.142    -.0095165    .0662678
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):              1.971
                                                   Chi-sq(3) P-val =    0.5785
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):                0.831
                         (Kleibergen-Paap rk Wald F statistic):          0.636
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    13.91
                                         10% maximal IV relative bias     9.08
                                         20% maximal IV relative bias     6.46
                                         30% maximal IV relative bias     5.39
                                         10% maximal IV size             22.30
                                         15% maximal IV size             12.83
                                         20% maximal IV size              9.54
                                         25% maximal IV size              7.80
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.866
                                                   Chi-sq(2) P-val =    0.6486
------------------------------------------------------------------------------
Instrumented:         D.lncons
Excluded instruments: L2D.lncons L3D.lncons L4D.lncons
------------------------------------------------------------------------------
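As a sanity check on the output above, the reported R2 values and the 95% confidence interval can be reproduced by hand from the numbers in the table. The following is only a minimal sketch meant to be run from a do-file: the scalar names (iv_rss, iv_tss_c, iv_tss_u, iv_b, iv_se) are made up for illustration, and every number is copied from the ivreg2 output rather than computed from the data.

```stata
* Values copied from the ivreg2 output above (scalar names are hypothetical)
scalar iv_rss   = .0234181371   // Residual SS
scalar iv_tss_c = .0160808974   // Total (centered) SS
scalar iv_tss_u = .0367406497   // Total (uncentered) SS

* R2 = 1 - RSS/TSS, so the centered R2 is negative whenever RSS > centered TSS
display "Centered R2   = " %7.4f (1 - iv_rss/iv_tss_c)
display "Uncentered R2 = " %7.4f (1 - iv_rss/iv_tss_u)

* 95% interval = coefficient +/- z * robust standard error
scalar iv_b  = -.2960821        // coefficient on D.lncons
scalar iv_se = .8615568         // its robust standard error
display "Lower bound = " %9.6f (iv_b - invnormal(0.975)*iv_se)
display "Upper bound = " %9.6f (iv_b + invnormal(0.975)*iv_se)
```

These lines should reproduce the values in the table. The residual SS exceeds the centered total SS, which is exactly what makes the centered R2 negative, and the wide interval follows directly from the standard error being roughly three times the size of the coefficient.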