I have a few questions about the modelling choices I face with unbalanced panel data. I know that similar questions have already been asked here, but I am still struggling with these choices. I am using Stata/MP 13 on a MacBook Pro.
I am working with a dataset of 181 countries (panels) and 44 years, 1975 to 2018 (n = 7486). When I run xtset, Stata reports that the panel is unbalanced: as the xtdescribe output below shows, 26 of the 181 countries are not observed in every year. In addition, my independent variables have randomly missing values, which unbalances the estimation sample further. I now face several options and do not know how to choose among them.
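To make the second source of imbalance concrete, here is a small Python sketch (Python rather than Stata; `pattern` and `interior_gaps` are hypothetical helper names). The year ranges are the complete-case years for country_id 4 in the -dataex- extract posted with this question, i.e. the years in which v2x_libdem, oppfrac and govfrac are all non-missing. The gap count is this sketch's own definition (interior runs of unusable years) and may not match exactly how xtpcse counts its "gaps in sample".

```python
# Complete-case years for country_id 4, read off the -dataex- extract:
# 1976-1980, 1988-1996 and 2001-2017 have all three variables non-missing.
country4 = (list(range(1976, 1981))
            + list(range(1988, 1997))
            + list(range(2001, 2018)))


def pattern(years_present, first, last):
    """xtdescribe-style pattern string: '1' if the year is usable, '.' otherwise."""
    present = set(years_present)
    return "".join("1" if y in present else "." for y in range(first, last + 1))


def interior_gaps(years_present):
    """Return (number of missing runs, number of missing years) strictly
    between a panel's first and last usable year."""
    ys = sorted(set(years_present))
    runs = 0
    missing = 0
    for prev, cur in zip(ys, ys[1:]):
        if cur - prev > 1:  # a hole between two usable years
            runs += 1
            missing += cur - prev - 1
    return runs, missing


print(pattern(country4, 1975, 2018))
print(len(country4), interior_gaps(country4))  # 31 (2, 11)
```

Country 4 spans all 44 years in the panel structure but contributes only 31 usable years to a regression on oppfrac and govfrac, and those years are split by two interior gaps of 7 and 4 years.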
1. I have read that panel-corrected standard errors (PCSEs) are recommended for time-series cross-section data because they remain accurate when the errors are heteroskedastic across panels and contemporaneously correlated (Beck & Katz 1995)*. However, when I run my model with xtpcse, I get the following error: "Number of gaps in sample: 70. No time periods are common to all panels, cannot estimate disturbance covariance matrix using casewise inclusion." I understand what this means, but I don't know what to do about it. Adding the pairwise option lets the model run, but I don't know what problems this may introduce into the calculations. I have also repeated the pairwise estimation after dropping all countries with fewer than 5 observations, and I am still unsure what the problems with this approach might be. If the pairwise approach is acceptable, what is the minimum number of observations per panel, and do these observations need to be consecutive (e.g. 2001, 2002, 2003, 2004) rather than spread out (e.g. 2000, 2005, 2007, 2010)?
*Beck, N., & Katz, J. N. (1995). What to do (and not to do) with time-series cross-section data. American Political Science Review, 89(3), 634-647.
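The difference between casewise and pairwise inclusion can be sketched in Python. This illustrates the idea under stated assumptions and is not Stata's implementation: residuals are stored per panel as {year: residual}, each cross-product sum is divided by the number of years the two panels share (an assumption of this sketch), and the residuals are made up. Casewise inclusion keeps only the years common to all panels and fails when there are none, which is the situation behind r(459). Pairwise inclusion uses whatever years each pair shares and sets the covariance to 0 when a pair shares none, echoing the note Stata prints. `casewise_cov` and `pairwise_cov` are hypothetical names.

```python
def casewise_cov(resid):
    """Contemporaneous covariances using only years shared by ALL panels."""
    panels = list(resid)
    common = set.intersection(*(set(resid[p]) for p in panels))
    if not common:
        raise ValueError("no time periods are common to all panels")
    t = len(common)
    return {(i, j): sum(resid[i][y] * resid[j][y] for y in common) / t
            for i in panels for j in panels}


def pairwise_cov(resid):
    """Contemporaneous covariances using the years each PAIR of panels shares.
    A pair with no shared years gets a covariance of 0 (assumption mirroring
    the note in the xtpcse log)."""
    panels = list(resid)
    out = {}
    for i in panels:
        for j in panels:
            common = set(resid[i]) & set(resid[j])
            if common:
                out[(i, j)] = sum(resid[i][y] * resid[j][y] for y in common) / len(common)
            else:
                out[(i, j)] = 0.0
    return out


# Made-up residuals: A and C never overlap, so no year is common to all panels.
resid = {
    "A": {2000: 1.0, 2001: -1.0, 2002: 0.5},
    "B": {2001: 2.0, 2002: 1.0, 2003: -2.0},
    "C": {2003: 1.0, 2004: -1.0},
}

pw = pairwise_cov(resid)
print(pw[("A", "C")])  # 0.0: no shared years, covariance set to zero
# Implied correlation between B and C (about -1.15, outside [-1, 1]):
print(pw[("B", "C")] / (pw[("B", "B")] * pw[("C", "C")]) ** 0.5)
```

Two properties of the pairwise result are worth noting. First, because each element is built from a different subset of years, the assembled matrix need not be a valid covariance matrix: here the implied correlation between B and C is about -1.15, which no true covariance matrix can produce. Second, no lags enter the calculation, so in this sketch each element depends only on which years two panels share, not on whether those years are consecutive.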
2. The second option I have tried is xtreg. I am familiar with xtreg and the choice between fixed-effects and random-effects models, but I am not sure whether the unbalanced dataset causes problems here as well. So my question is: which approach is better, xtpcse or xtreg, and why?
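On the xtreg side, the within (fixed-effects) transformation that underlies xtreg, fe accommodates unbalanced panels by construction: each panel is demeaned using only its own observations, whatever their number. A minimal pure-Python sketch for a single regressor follows; the function name `within_slope` and the data are made up for illustration.

```python
from collections import defaultdict


def within_slope(rows):
    """Fixed-effects (within) slope for one regressor.
    rows: iterable of (panel_id, x, y); panels may have different lengths."""
    by_panel = defaultdict(list)
    for pid, x, y in rows:
        by_panel[pid].append((x, y))
    sxy = sxx = 0.0
    for obs in by_panel.values():
        # demean with this panel's own means, over its own observations only
        xbar = sum(x for x, _ in obs) / len(obs)
        ybar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - xbar) * (y - ybar)
            sxx += (x - xbar) ** 2
    return sxy / sxx


# Made-up data: y = 2*x + panel effect; panel A has 3 observations, panel B has 5.
rows = ([("A", x, 2 * x + 10) for x in (1.0, 2.0, 4.0)]
        + [("B", x, 2 * x - 3) for x in (0.0, 1.0, 2.0, 3.0, 5.0)])
print(within_slope(rows))  # approximately 2.0
```

The slope of 2 is recovered despite the different panel lengths. A panel observed only once has zero deviations from its own means, so it contributes nothing to the within slope.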
I paste the results below. To minimize confusion, I am only running simplified models here:
- My dependent variable is the level of democracy: v2x_libdem
- My independent variables measure fragmentation of the government and of the opposition: govfrac and oppfrac
- The other two variables in the dataset are the panel and time identifiers, country_id and year.
- I am also providing an example of the dataset generated by -dataex-.
Code:
. xtset country_id year
       panel variable:  country_id (unbalanced)
        time variable:  year, 1975 to 2018
                delta:  1 unit

. xtdescribe

country_id:  3, 4, ..., 236                                  n =        181
      year:  1975, 1976, ..., 2018                           T =         44
             Delta(year) = 1 unit
             Span(year)  = 44 periods
             (country_id*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         8      28      44        44        44      44      44

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+----------------------------------------------
      155     85.64   85.64 |  11111111111111111111111111111111111111111111
       14      7.73   93.37 |  ...............11111111111111111111111111111
        3      1.66   95.03 |  ................1111111111111111111111111111
        2      1.10   96.13 |  1111111111111111............................
        1      0.55   96.69 |  ....................................11111111
        1      0.55   97.24 |  ................................111111111111
        1      0.55   97.79 |  ........................11111111111111111111
        1      0.55   98.34 |  .......................111111111111111111111
        1      0.55   98.90 |  ..................11111111111111111111111111
        2      1.10  100.00 | (other patterns)
 ---------------------------+----------------------------------------------
      181    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(country_id year oppfrac govfrac v2x_libdem)
3 1975 . . .109
3 1976 . . .11
3 1977 .64 0 .118
3 1978 .64 0 .123
3 1979 .64 0 .125
3 1980 .76 0 .127
3 1981 .76 0 .133
3 1982 .76 0 .145
3 1983 .65 0 .147
3 1984 .65 0 .147
3 1985 .65 0 .148
3 1986 .8 0 .152
3 1987 .81 0 .152
3 1988 .81 0 .176
3 1989 .49 0 .186
3 1990 .49 0 .198
3 1991 .49 0 .217
3 1992 .49 0 .231
3 1993 .49 0 .248
3 1994 .49 0 .273
3 1995 .52 0 .293
3 1996 .52 0 .305
3 1997 .52 0 .36
3 1998 .55 0 .394
3 1999 .55 0 .41
3 2000 .55 0 .461
3 2001 .4 .13 .495
3 2002 .4 .13 .513
3 2003 .4 .13 .531
3 2004 .46 .18 .546
3 2005 .46 .18 .546
3 2006 .46 .18 .517
3 2007 .68 0 .485
3 2008 .68 0 .505
3 2009 .68 0 .496
3 2010 .52 0 .489
3 2011 .52 0 .489
3 2012 .52 0 .453
3 2013 .59 .21 .458
3 2014 .59 .21 .429
3 2015 .59 .21 .465
3 2016 .73 .31 .487
3 2017 .73 .31 .491
3 2018 . . .527
4 1975 . . .566
4 1976 .3 0 .671
4 1977 .3 0 .657
4 1978 0 0 .686
4 1979 0 0 .686
4 1980 0 0 .17
4 1981 . . .151
4 1982 . . .153
4 1983 . . .15
4 1984 . . .15
4 1985 . . .201
4 1986 . . .201
4 1987 . . .213
4 1988 .57 .72 .516
4 1989 .57 .72 .564
4 1990 .57 .72 .199
4 1991 .57 .72 .472
4 1992 .51 .71 .655
4 1993 .51 .71 .669
4 1994 .51 .71 .681
4 1995 .51 .71 .681
4 1996 .51 .71 .695
4 1997 . .66 .698
4 1998 . .66 .698
4 1999 . .66 .698
4 2000 . .66 .686
4 2001 .67 0 .691
4 2002 .67 0 .686
4 2003 .67 0 .698
4 2004 .67 0 .689
4 2005 .67 0 .674
4 2006 .53 .3 .696
4 2007 .53 .3 .667
4 2008 .53 .3 .688
4 2009 .53 .3 .688
4 2010 .53 .3 .671
4 2011 .13 .54 .699
4 2012 .13 .54 .694
4 2013 .13 .54 .67
4 2014 .13 .54 .67
4 2015 .13 .54 .669
4 2016 0 .37 .645
4 2017 0 .37 .633
4 2018 . . .635
5 1975 .62 .19 .865
5 1976 .62 .19 .867
5 1977 .18 .63 .868
5 1978 .18 .63 .868
5 1979 .65 0 .864
5 1980 .2 .65 .858
5 1981 .2 .65 .858
5 1982 .52 .47 .861
5 1983 .66 0 .868
5 1984 .66 0 .868
5 1985 .66 0 .865
5 1986 .65 .19 .867
end

. xtpcse v2x_libdem oppfrac govfrac

Number of gaps in sample:  70
no time periods are common to all panels, cannot estimate disturbance
covariance matrix using casewise inclusion
r(459);

. xtpcse v2x_libdem oppfrac govfrac, pairwise

Number of gaps in sample:  70
(note: at least one disturbance covariance assumed 0, no common time
periods between panels)

Linear regression, correlated panels corrected standard errors (PCSEs)

Group variable:   country_id                    Number of obs     =      4475
Time variable:    year                          Number of groups  =       154
Panels:           correlated (unbalanced)       Obs per group: min =        4
Autocorrelation:  no autocorrelation                           avg =  29.05844
Sigma computed by pairwise selection                           max =        43
Estimated covariances      =     11935          R-squared         =    0.0607
Estimated autocorrelations =         0          Wald chi2(2)      =    175.55
Estimated coefficients     =         3          Prob > chi2       =    0.0000

------------------------------------------------------------------------------
             |           Panel-corrected
  v2x_libdem |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     oppfrac |  -.0563738   .0149227    -3.78   0.000    -.0856218   -.0271259
     govfrac |   .2400054   .0193359    12.41   0.000     .2021078    .2779031
       _cons |   .4531467   .0091391    49.58   0.000     .4352344    .4710591
------------------------------------------------------------------------------

. xtreg v2x_libdem oppfrac govfrac

Random-effects GLS regression                   Number of obs      =      4475
Group variable: country_id                      Number of groups   =       154

R-sq:  within  = 0.0284                         Obs per group: min =         4
       between = 0.0516                                        avg =      29.1
       overall = 0.0403                                        max =        43

                                                Wald chi2(2)       =    130.66
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
  v2x_libdem |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     oppfrac |   .0309818   .0064597     4.80   0.000      .018321    .0436427
     govfrac |    .075405    .007288    10.35   0.000     .0611207    .0896892
       _cons |   .3987555   .0195239    20.42   0.000     .3604893    .4370216
-------------+----------------------------------------------------------------
     sigma_u |  .23772996
     sigma_e |  .09036332
         rho |  .87375703   (fraction of variance due to u_i)
------------------------------------------------------------------------------
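Two numbers in these logs can be checked by hand. In the xtreg output, rho is the share of the composite error variance due to the panel effect, sigma_u^2 / (sigma_u^2 + sigma_e^2). In the xtpcse output, the 11935 estimated covariances match N(N+1)/2 for N = 154 panels, i.e. one variance or covariance per pair of countries (a country paired with itself included). A short Python check using the values printed in the logs (rounded as printed, so the last digits of rho may differ slightly):

```python
# Values copied from the xtreg, re output
sigma_u = 0.23772996  # s.d. of the panel-level effect u_i
sigma_e = 0.09036332  # s.d. of the idiosyncratic error e_it

# rho: fraction of the composite error variance due to u_i
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)
print(round(rho, 6))  # 0.873757

# Correlated-panels xtpcse: distinct elements of an N x N symmetric matrix
n_panels = 154
n_cov = n_panels * (n_panels + 1) // 2
print(n_cov)  # 11935
```

The second number puts the pairwise estimation in perspective: 11,935 distinct elements of the disturbance covariance matrix are estimated from 4,475 observations, and with pairwise selection each element uses only the years its two countries share (or is set to zero when they share none).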