I have panel data in which my independent variables (Index1 to Index4) are highly collinear. In that case, rather than dropping one or more of the collinear variables, is it legitimate to transform the variables so that we can retain them? I will demonstrate my data and results with an example.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(Index1 Index2 Index3 Index4) long id int year float dep_var
5.46687 . . . 1 1999 0
3.5714 53.3333 49.2386 37.4359 1 2000 .0469986
3.77717 . . . 1 2001 0
3.97991 55.102 35.3535 34.1837 1 2002 0
4.09675 56.6326 44.9495 43.3674 1 2003 0
3.94243 55.665 34.6342 44.335 1 2004 0
3.94921 51.4706 33.1707 50 1 2005 0
4.05847 57.0732 37.0732 48.0392 1 2006 0
3.92085 59.2233 33.4951 50.9709 1 2007 0
4.64972 58.7379 36.4078 50.9709 1 2008 0
4.8054 57.8947 36.8421 45.933 1 2009 0
4.70902 58.3732 33.3333 44.4976 1 2010 0
4.83402 58.7678 37.9147 44.5498 1 2011 0
4.82298 57.8199 40.2844 44.0758 1 2012 0
4.66564 54.9763 44.5498 44.0758 1 2013 0
4.55899 65.3846 45.6731 43.75 1 2014 0
4.52303 68.2692 48.5577 44.2308 1 2015 0
4.86224 66.8269 49.0385 44.2308 1 2016 0
5.33097 68.2692 46.1539 48.5577 1 2017 0
5.62695 69.2308 46.1539 47.1154 1 2018 0
5.89539 71.6346 45.1923 42.7885 1 2019 0
3.5714 53.3333 49.2386 37.4359 2 2000 .
3.77717 . . . 2 2001 0
3.97991 55.102 35.3535 34.1837 2 2002 0
4.09675 56.6326 44.9495 43.3674 2 2003 0
3.94243 55.665 34.6342 44.335 2 2004 .
3.94921 51.4706 33.1707 50 2 2005 .
4.05847 57.0732 37.0732 48.0392 2 2006 .5771455
3.92085 59.2233 33.4951 50.9709 2 2007 .
4.64972 58.7379 36.4078 50.9709 2 2008 .
4.8054 57.8947 36.8421 45.933 2 2009 0
4.70902 58.3732 33.3333 44.4976 2 2010 0
4.83402 58.7678 37.9147 44.5498 2 2011 0
4.82298 57.8199 40.2844 44.0758 2 2012 0
4.66564 54.9763 44.5498 44.0758 2 2013 0
4.55899 65.3846 45.6731 43.75 2 2014 0
4.52303 68.2692 48.5577 44.2308 2 2015 0
4.86224 66.8269 49.0385 44.2308 2 2016 0
5.33097 68.2692 46.1539 48.5577 2 2017 0
5.62695 69.2308 46.1539 47.1154 2 2018 .
5.89539 71.6346 45.1923 42.7885 2 2019 0
5.46687 . . . 3 1999 0
3.5714 53.3333 49.2386 37.4359 3 2000 .
3.77717 . . . 3 2001 .
3.97991 55.102 35.3535 34.1837 3 2002 .
4.09675 56.6326 44.9495 43.3674 3 2003 .
3.94243 55.665 34.6342 44.335 3 2004 .
3.94921 51.4706 33.1707 50 3 2005 .
4.05847 57.0732 37.0732 48.0392 3 2006 .
3.92085 59.2233 33.4951 50.9709 3 2007 0
4.64972 58.7379 36.4078 50.9709 3 2008 0
4.8054 57.8947 36.8421 45.933 3 2009 .
4.70902 58.3732 33.3333 44.4976 3 2010 0
4.83402 58.7678 37.9147 44.5498 3 2011 0
4.82298 57.8199 40.2844 44.0758 3 2012 0
4.66564 54.9763 44.5498 44.0758 3 2013 .
4.55899 65.3846 45.6731 43.75 3 2014 0
4.52303 68.2692 48.5577 44.2308 3 2015 .
4.86224 66.8269 49.0385 44.2308 3 2016 0
5.33097 68.2692 46.1539 48.5577 3 2017 0
5.62695 69.2308 46.1539 47.1154 3 2018 0
5.89539 71.6346 45.1923 42.7885 3 2019 0
5.46687 . . . 4 1999 0
3.5714 53.3333 49.2386 37.4359 4 2000 0
3.77717 . . . 4 2001 0
3.97991 55.102 35.3535 34.1837 4 2002 0
4.09675 56.6326 44.9495 43.3674 4 2003 .
3.94243 55.665 34.6342 44.335 4 2004 0
3.94921 51.4706 33.1707 50 4 2005 0
4.05847 57.0732 37.0732 48.0392 4 2006 0
3.92085 59.2233 33.4951 50.9709 4 2007 0
4.64972 58.7379 36.4078 50.9709 4 2008 0
4.8054 57.8947 36.8421 45.933 4 2009 0
4.70902 58.3732 33.3333 44.4976 4 2010 0
4.83402 58.7678 37.9147 44.5498 4 2011 0
4.82298 57.8199 40.2844 44.0758 4 2012 0
4.66564 54.9763 44.5498 44.0758 4 2013 0
4.55899 65.3846 45.6731 43.75 4 2014 0
4.52303 68.2692 48.5577 44.2308 4 2015 0
4.86224 66.8269 49.0385 44.2308 4 2016 0
5.33097 68.2692 46.1539 48.5577 4 2017 0
5.62695 69.2308 46.1539 47.1154 4 2018 0
5.89539 71.6346 45.1923 42.7885 4 2019 0
5.46687 . . . 5 1999 .
3.5714 53.3333 49.2386 37.4359 5 2000 .
3.77717 . . . 5 2001 0
3.97991 55.102 35.3535 34.1837 5 2002 0
4.09675 56.6326 44.9495 43.3674 5 2003 0
3.94243 55.665 34.6342 44.335 5 2004 .
3.94921 51.4706 33.1707 50 5 2005 0
4.05847 57.0732 37.0732 48.0392 5 2006 .
3.92085 59.2233 33.4951 50.9709 5 2007 0
4.64972 58.7379 36.4078 50.9709 5 2008 .
4.8054 57.8947 36.8421 45.933 5 2009 0
4.70902 58.3732 33.3333 44.4976 5 2010 0
4.83402 58.7678 37.9147 44.5498 5 2011 0
4.82298 57.8199 40.2844 44.0758 5 2012 0
4.66564 54.9763 44.5498 44.0758 5 2013 0
4.55899 65.3846 45.6731 43.75 5 2014 .
4.52303 68.2692 48.5577 44.2308 5 2015 0
end
label values id id
label def id 1 "000002.SZ", modify
label def id 2 "000004.SZ", modify
label def id 3 "000005.SZ", modify
label def id 4 "000006.SZ", modify
label def id 5 "000007.SZ", modify
Code:
pwcorr dep_var Index1 Index2 Index3 Index4 , sig star(.01)
| dep_var Index1 Index2 Index3 Index4
-------------+---------------------------------------------
dep_var | 1.0000
|
|
Index1 | -0.1242 1.0000
| 0.2819
|
Index2 | -0.0867 0.7584* 1.0000
| 0.4757 0.0000
|
Index3 | -0.0658 0.3183* 0.5552* 1.0000
| 0.5884 0.0021 0.0000
|
Index4 | 0.0787 0.1896 0.1301 -0.3035* 1.0000
| 0.5172 0.0719 0.2190 0.0034
|
Code:
reg dep_var Index1 Index2 Index3 i.id i.year
note: 2017.year omitted because of collinearity.
note: 2018.year omitted because of collinearity.
note: 2019.year omitted because of collinearity.
Source | SS df MS Number of obs = 70
-------------+---------------------------------- F(22, 47) = 1.28
Model | .123540378 22 .005615472 Prob > F = 0.2345
Residual | .206200354 47 .004387242 R-squared = 0.3747
-------------+---------------------------------- Adj R-squared = 0.0819
Total | .329740732 69 .004778851 Root MSE = .06624
------------------------------------------------------------------------------
dep_var | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
Index1 | .0607583 .2623863 0.23 0.818 -.4670948 .5886114
Index2 | -.0082303 .0411951 -0.20 0.843 -.0911041 .0746435
Index3 | .0068583 .1109061 0.06 0.951 -.216256 .2299725
|
id |
000004.SZ | .0420866 .0246522 1.71 0.094 -.0075072 .0916804
000005.SZ | .0090231 .0272016 0.33 0.742 -.0456995 .0637458
000006.SZ | -.0035913 .0218879 -0.16 0.870 -.047624 .0404413
000007.SZ | .0108503 .0272195 0.40 0.692 -.0439084 .0656089
|
year |
2002 | .0473329 1.50762 0.03 0.975 -2.985606 3.080272
2003 | -.0182899 .4098085 -0.04 0.965 -.8427182 .8061384
2004 | .073309 1.573491 0.05 0.963 -3.092147 3.238765
2005 | .0441975 1.844819 0.02 0.981 -3.667099 3.755494
2006 | .2388757 1.268526 0.19 0.851 -2.31307 2.790822
2007 | .1058522 1.615725 0.07 0.948 -3.144567 3.356271
2008 | .0398561 1.309828 0.03 0.976 -2.595177 2.674889
2009 | .0099532 1.289834 0.01 0.994 -2.584859 2.604765
2010 | .0444741 1.660176 0.03 0.979 -3.295369 3.384318
2011 | .0087066 1.14843 0.01 0.994 -2.301636 2.319049
2012 | -.0146761 .9171183 -0.02 0.987 -1.85968 1.830328
2013 | -.0584361 .5495863 -0.11 0.916 -1.164061 1.047189
2014 | .0264604 .1801508 0.15 0.884 -.3359562 .388877
2015 | .0321464 .3668656 0.09 0.931 -.705892 .7701847
2016 | -.0031748 .3119827 -0.01 0.992 -.630803 .6244534
2017 | 0 (omitted)
2018 | 0 (omitted)
2019 | 0 (omitted)
|
_cons | -.0904379 6.801528 -0.01 0.989 -13.77335 13.59247
------------------------------------------------------------------------------
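A possible side check, not part of the original output: in the -dataex- example, Index1 to Index4 appear to take the same value for every id within a given year. If that also holds in the full data, each index is a linear combination of the year dummies, which would be consistent with the three year dummies omitted above. A minimal sketch of that check, assuming the example data are in memory (the varies_* variables are hypothetical helper names, not part of the original data):

```stata
* Assumes the -dataex- example above is in memory.
* varies_* are hypothetical helper variables introduced only for this check.
foreach v of varlist Index1 Index2 Index3 Index4 {
    * Sort each index within year; if the first and last values differ,
    * the index varies across ids in that year. Missing values sort last,
    * so a year with both missing and nonmissing values is also flagged.
    bysort year (`v'): generate byte varies_`v' = (`v'[1] != `v'[_N])
}
* A maximum of 0 for a flag means that index is constant within every year.
summarize varies_*
```

If every flag turns out to be 0, the indexes would carry only time-series information, and their coefficients could not be separated from a full set of year effects in this specification.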
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
Index1 | 349.95 0.002858
Index2 | 886.38 0.001128
Index3 | 6173.44 0.000162
id |
2 | 1.47 0.681963
3 | 1.45 0.691748
4 | 1.46 0.684869
5 | 1.45 0.690839
year |
2002 | 1953.88 0.000512
2003 | 109.92 0.009098
2004 | 1096.42 0.000912
2005 | 2227.48 0.000449
2006 | 1053.19 0.000949
2007 | 2244.14 0.000446
2008 | 1122.88 0.000891
2009 | 1430.15 0.000699
2010 | 2916.77 0.000343
2011 | 1395.73 0.000716
2012 | 890.11 0.001123
2013 | 259.65 0.003851
2014 | 27.90 0.035844
2015 | 115.70 0.008643
2016 | 83.67 0.011952
-------------+----------------------
Mean VIF | 1106.51

So my question is: rather than dropping variables, can we do something else to deal with multicollinearity?
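For reference when reading the VIF table: the variance inflation factor of regressor j comes from the R-squared of regressing that regressor on all the other regressors, and the 1/VIF column is the corresponding tolerance. A worked example using the Index3 row (the symbol R_j^2 is my notation, and the figures are rounded as in the table):

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\frac{1}{\mathrm{VIF}_j} = 1 - R_j^2 .
\]
\[
\text{Index3: } \frac{1}{\mathrm{VIF}} = 0.000162
\;\Rightarrow\;
R_{\text{Index3}}^2 = 1 - 0.000162 = 0.999838 .
\]
```

Read this way, about 99.98% of the variation in Index3 in the estimation sample is explained by the other regressors, which include the id and year dummies.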