Should I use vce(robust) for my OLS regression?

Hi everyone, this is my first time posting, so I hope that I include all the relevant information. I am using regress in Stata17.0 for a multivariable regression, and unsure if I should include the option vce(robust). There are 61 observations used in the regression. In case this is useful, below is a sample of the dataset using dataex.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float cigar_mceq_satisfaction byte cigar_intensity double(cigar_bp cigar_omax cigar_finalpmax cigar_alpha)
2.3333333  2 1.9999999999999998                1.2                 .6  .05395
        7 10                  3                  9                 .9 .007364
 6.333333  4 1.9999999999999998                  4                  1  .01633
 5.666667  5                  5                  6                2.5 .008666
        3  1                  3 1.9999999999999998 1.9999999999999998   .0472
end

The significance of one variable--cigar_omax--changes when I use vce(robust):

. regress cigar_mceq_sat cigar_intensity cigar_bp cigar_omax cigar_finalpmax cigar_alpha

Source | SS df MS Number of obs = 61
-------------+---------------------------------- F(5, 55) = 2.31
Model | 18.5321839 5 3.70643678 Prob > F = 0.0564
Residual | 88.2802022 55 1.60509459 R-squared = 0.1735
-------------+---------------------------------- Adj R-squared = 0.0984
Total | 106.812386 60 1.78020644 Root MSE = 1.2669

---------------------------------------------------------------------------------
cigar_mceq_sa~n | Coefficient Std. err. t P>|t| [95% conf. interval]
----------------+----------------------------------------------------------------
cigar_intensity | .0043516 .01773 0.25 0.807 -.0311802 .0398834
cigar_bp | .0013295 .0462788 0.03 0.977 -.0914152 .0940743
cigar_omax | .0630785 .0338445 1.86 0.068 -.0047474 .1309043
cigar_finalpmax | .0305393 .1443719 0.21 0.833 -.2587884 .3198671
cigar_alpha | .7171226 9.763465 0.07 0.942 -18.8493 20.28354
_cons | 3.78496 .4116146 9.20 0.000 2.960066 4.609854
---------------------------------------------------------------------------------

.
. regress cigar_mceq_sat cigar_intensity cigar_bp cigar_omax cigar_finalpmax cigar_alpha, vce(r
> obust)

Linear regression Number of obs = 61
F(5, 55) = 4.64
Prob > F = 0.0013
R-squared = 0.1735
Root MSE = 1.2669

---------------------------------------------------------------------------------
| Robust
cigar_mceq_sa~n | Coefficient std. err. t P>|t| [95% conf. interval]
----------------+----------------------------------------------------------------
cigar_intensity | .0043516 .0143781 0.30 0.763 -.0244629 .033166
cigar_bp | .0013295 .0516327 0.03 0.980 -.1021447 .1048038
cigar_omax | .0630785 .0182502 3.46 0.001 .0265043 .0996526
cigar_finalpmax | .0305393 .1549317 0.20 0.844 -.2799508 .3410294
cigar_alpha | .7171226 5.977279 0.12 0.905 -11.26161 12.69586
_cons | 3.78496 .3438619 11.01 0.000 3.095845 4.474074
---------------------------------------------------------------------------------

I thought that this could be an indication of heteroskedasticity. I ran estat hettest and imtest after I performed the first regression (without vce(robust)), but both tests show that I fail to reject the null hypothesis:

. estat hettest

Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of cigar_mceq_satisfaction

H0: Constant variance

chi2(1) = 0.22
Prob > chi2 = 0.6358

. estat imtest

Cameron & Trivedi's decomposition of IM-test

--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 10.29 20 0.9626
Skewness | 1.89 5 0.8642
Kurtosis | 0.11 1 0.7446
---------------------+----------------------------
Total | 12.29 26 0.9894
--------------------------------------------------

I also used rvfplot. Visually I believe there does appear to be heteroskedasticity. In particular one ID (26) is notable.

Code:

rvfplot, yline(0) mlabel(pid)

Array

If I drop that observation from the model, cigar_omax is not statistically significant, with or without vce(robust):

. regress cigar_mceq_sat cigar_intensity cigar_bp cigar_omax cigar_finalpmax cigar_alpha if pid!=26

Source | SS df MS Number of obs = 60
-------------+---------------------------------- F(5, 54) = 1.44
Model | 11.7738544 5 2.35477088 Prob > F = 0.2246
Residual | 88.2465161 54 1.63419474 R-squared = 0.1177
-------------+---------------------------------- Adj R-squared = 0.0360
Total | 100.02037 59 1.69526052 Root MSE = 1.2784

---------------------------------------------------------------------------------
cigar_mceq_sa~n | Coefficient Std. err. t P>|t| [95% conf. interval]
----------------+----------------------------------------------------------------
cigar_intensity | .0047858 .0181438 0.26 0.793 -.0315904 .041162
cigar_bp | .0038414 .0498663 0.08 0.939 -.0961345 .1038173
cigar_omax | .0567864 .0555594 1.02 0.311 -.0546035 .1681762
cigar_finalpmax | .029057 .1460401 0.20 0.843 -.2637358 .3218499
cigar_alpha | .0352273 10.93667 0.00 0.997 -21.89148 21.96194
_cons | 3.825782 .5033311 7.60 0.000 2.816664 4.8349
---------------------------------------------------------------------------------

. regress cigar_mceq_sat cigar_intensity cigar_bp cigar_omax cigar_finalpmax cigar_alpha if pid!=26, vce(robust)

Linear regression Number of obs = 60
F(5, 54) = 1.68
Prob > F = 0.1543
R-squared = 0.1177
Root MSE = 1.2784

---------------------------------------------------------------------------------
| Robust
cigar_mceq_sa~n | Coefficient std. err. t P>|t| [95% conf. interval]
----------------+----------------------------------------------------------------
cigar_intensity | .0047858 .0140686 0.34 0.735 -.0234201 .0329916
cigar_bp | .0038414 .0557472 0.07 0.945 -.107925 .1156078
cigar_omax | .0567864 .0462053 1.23 0.224 -.0358497 .1494224
cigar_finalpmax | .029057 .1563093 0.19 0.853 -.2843242 .3424383
cigar_alpha | .0352273 7.080134 0.00 0.996 -14.15959 14.23004
_cons | 3.825782 .4503772 8.49 0.000 2.92283 4.728734
---------------------------------------------------------------------------------

I previously examined my data set for points of high leverage, residuals, and influence. This observation does have high leverage, but very low residuals, so it did not appear to have considerable influence. Also, when I calculated dfbetas for each predictor, ID 26 had a high influence for cigar_omax but it wasn't the observation with the highest influence.

. quietly: regress cigar_mceq_sat cigar_intensity cigar_bp cigar_omax cigar_finalpmax cigar_alpha
.
. for var cigar_intensity cigar_bp cigar_omax cigar_finalpmax cigar_alpha: predict DFn_X, dfbeta(X)

-> predict DFn_cigar_intensity, dfbeta(cigar_intensity)
(4 missing values generated)

-> predict DFn_cigar_bp, dfbeta(cigar_bp)
(4 missing values generated)

-> predict DFn_cigar_omax, dfbeta(cigar_omax)
(4 missing values generated)

-> predict DFn_cigar_finalpmax, dfbeta(cigar_finalpmax)
(4 missing values generated)

-> predict DFn_cigar_alpha, dfbeta(cigar_alpha)
(4 missing values generated)

.
. quietly lv DFn_cigar_omax

. sort DFn_cigar_omax

. list pid DFn_cigar_omax cigar_omax if ( (DFn_cigar_omax >=( r(u_F) + 1.5*(r(u_F) - r(l_F)))) | (DFn_cigar_omax <=( r(l_F) - 1.5*(r(u_F) - r(l_F)))) ) & DFn_cigar_omax!=.

+----------------------------+
| pid DFn_~omax c~r_omax |
|----------------------------|
1. | 34 -.3107229 18 |
2. | 56 -.2782602 20 |
3. | 11 -.1311674 3 |
4. | 7 -.113518 3 |
56. | 72 .1330477 15 |
|----------------------------|
57. | 12 .1347013 .5 |
58. | 62 .1422614 15 |
59. | 17 .1662419 10 |
60. | 26 .1842496 45 |
61. | 55 .2723851 20 |
+----------------------------+

. lvr2plot, mlabel(pid)

Array

So I am not sure what is a better course of action. I do not know why vce(robust) would change a statistical significance test if there was no heterskedasticity. I am also not sure why dropping that observation changes the test so much when it does not appear to have high influence.

I appreciate any thoughts or advice. Thank you!

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Should I use vce(robust) for my OLS regression?
Should I use vce(robust) for my OLS regression?

0 Response to Should I use vce(robust) for my OLS regression?

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Should I use vce(robust) for my OLS regression? Should I use vce(robust) for my OLS regression?

Related Posts with Should I use vce(robust) for my OLS regression?

0 Response to Should I use vce(robust) for my OLS regression?

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Should I use vce(robust) for my OLS regression?
Should I use vce(robust) for my OLS regression?