Hello everyone:

I have a few questions about the modeling choices I need to make with unbalanced panel data. I know that similar questions have already been asked here, but I am still struggling with them. I am using Stata/MP 13 on a MacBook Pro.

I am working with a large dataset of 181 cases (countries) and 44 time periods (N = 7,486 observations). When I run the xtset command, I am told that the panel is unbalanced. The data are also unbalanced because my independent variables have randomly missing values. I am now faced with several options and do not know how to choose among them.

1. I have read that panel-corrected standard errors are recommended for panel data because they are more reliable (Beck & Katz 1995)*. However, when I run my model with the xtpcse command I get the following error: "Number of gaps in sample: 70. no time periods are common to all panels, cannot estimate disturbance covariance matrix using casewise inclusion." I understand what this means, but I don't know what to do about it. Adding the pairwise option lets the model run, but I don't know what problems this may cause in the calculations. I have also repeated the pairwise approach after removing all cases with fewer than 5 observations, but I am still unsure what problems remain with this approach. If the pairwise approach is acceptable, what is the minimum number of observations necessary, and do these observations need to be consecutive, e.g. 2001, 2002, 2003, 2004 as opposed to 2000, 2005, 2007, 2010?
*Beck, N., & Katz, J. N. (1995). What to do (and not to do) with time-series cross-section data. American Political Science Review, 89(3), 634-647.
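
For concreteness, this is roughly how the restricted pairwise run described above can be set up. It is only a sketch: touse and n_used are variable names made up for illustration, and the cutoff of 5 is the arbitrary threshold mentioned above, not a recommended minimum.

```stata
* Declare the panel structure
xtset country_id year

* Flag observations with no missing values on any model variable
* (touse is a hypothetical name)
mark touse
markout touse v2x_libdem oppfrac govfrac

* Count usable observations per country (n_used is a hypothetical name)
bysort country_id (year): egen n_used = total(touse)

* Inspect the participation pattern of the usable observations,
* which shows where the gaps fall within each panel
xtdescribe if touse

* Pairwise PCSE model restricted to countries with at least 5 usable observations
xtpcse v2x_libdem oppfrac govfrac if n_used >= 5, pairwise
```

The xtdescribe call is included because it shows whether the remaining observations for a country are consecutive years or scattered ones.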

2. The second option I have tried is the xtreg command. I am familiar with xtreg and the choice between fixed-effects and random-effects models, but I am not sure whether the unbalanced dataset causes problems here as well. My question is: which approach is better, xtpcse or xtreg, and why?
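
For reference, a minimal sketch of the fixed- versus random-effects comparison I have in mind with xtreg. The stored-estimate names fe_model and re_model are made up for illustration, and the Hausman test is just one common way to compare the two, under its usual assumptions.

```stata
* Fixed-effects (within) estimator
xtreg v2x_libdem oppfrac govfrac, fe
estimates store fe_model

* Random-effects GLS estimator (the xtreg default)
xtreg v2x_libdem oppfrac govfrac, re
estimates store re_model

* Hausman test: consistent estimator (FE) listed first, efficient one (RE) second
hausman fe_model re_model
```

Both models are fitted with conventional standard errors here because hausman is not meant to be used with robust or clustered variance estimates.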

I am pasting the results in the code box below. I am only running simplified models here so as to minimize confusion:
  • My dependent variable is the level of democracy: v2x_libdem
  • My independent variables are the fragmentation of the government and of the opposition: govfrac oppfrac
  • The other two variables in the dataset are country_id and year.
  • I am also providing an example of the dataset generated by -dataex-.
Any assistance would be greatly appreciated!

Code:

. xtset  country_id year
       panel variable:  country_id (unbalanced)
        time variable:  year, 1975 to 2018
                delta:  1 unit


. xtdescribe

country_id:  3, 4, ..., 236                                  n =        181
    year:  1975, 1976, ..., 2018                             T =         44
           Delta(year) = 1 unit
           Span(year)  = 44 periods
           (country_id*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         8      28      44        44        44      44      44

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+----------------------------------------------
      155     85.64   85.64 |  11111111111111111111111111111111111111111111
       14      7.73   93.37 |  ...............11111111111111111111111111111
        3      1.66   95.03 |  ................1111111111111111111111111111
        2      1.10   96.13 |  1111111111111111............................
        1      0.55   96.69 |  ....................................11111111
        1      0.55   97.24 |  ................................111111111111
        1      0.55   97.79 |  ........................11111111111111111111
        1      0.55   98.34 |  .......................111111111111111111111
        1      0.55   98.90 |  ..................11111111111111111111111111
        2      1.10  100.00 | (other patterns)
 ---------------------------+----------------------------------------------
      181    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(country_id year oppfrac govfrac v2x_libdem)
3 1975   .   . .109
3 1976   .   .  .11
3 1977 .64   0 .118
3 1978 .64   0 .123
3 1979 .64   0 .125
3 1980 .76   0 .127
3 1981 .76   0 .133
3 1982 .76   0 .145
3 1983 .65   0 .147
3 1984 .65   0 .147
3 1985 .65   0 .148
3 1986  .8   0 .152
3 1987 .81   0 .152
3 1988 .81   0 .176
3 1989 .49   0 .186
3 1990 .49   0 .198
3 1991 .49   0 .217
3 1992 .49   0 .231
3 1993 .49   0 .248
3 1994 .49   0 .273
3 1995 .52   0 .293
3 1996 .52   0 .305
3 1997 .52   0  .36
3 1998 .55   0 .394
3 1999 .55   0  .41
3 2000 .55   0 .461
3 2001  .4 .13 .495
3 2002  .4 .13 .513
3 2003  .4 .13 .531
3 2004 .46 .18 .546
3 2005 .46 .18 .546
3 2006 .46 .18 .517
3 2007 .68   0 .485
3 2008 .68   0 .505
3 2009 .68   0 .496
3 2010 .52   0 .489
3 2011 .52   0 .489
3 2012 .52   0 .453
3 2013 .59 .21 .458
3 2014 .59 .21 .429
3 2015 .59 .21 .465
3 2016 .73 .31 .487
3 2017 .73 .31 .491
3 2018   .   . .527
4 1975   .   . .566
4 1976  .3   0 .671
4 1977  .3   0 .657
4 1978   0   0 .686
4 1979   0   0 .686
4 1980   0   0  .17
4 1981   .   . .151
4 1982   .   . .153
4 1983   .   .  .15
4 1984   .   .  .15
4 1985   .   . .201
4 1986   .   . .201
4 1987   .   . .213
4 1988 .57 .72 .516
4 1989 .57 .72 .564
4 1990 .57 .72 .199
4 1991 .57 .72 .472
4 1992 .51 .71 .655
4 1993 .51 .71 .669
4 1994 .51 .71 .681
4 1995 .51 .71 .681
4 1996 .51 .71 .695
4 1997   . .66 .698
4 1998   . .66 .698
4 1999   . .66 .698
4 2000   . .66 .686
4 2001 .67   0 .691
4 2002 .67   0 .686
4 2003 .67   0 .698
4 2004 .67   0 .689
4 2005 .67   0 .674
4 2006 .53  .3 .696
4 2007 .53  .3 .667
4 2008 .53  .3 .688
4 2009 .53  .3 .688
4 2010 .53  .3 .671
4 2011 .13 .54 .699
4 2012 .13 .54 .694
4 2013 .13 .54  .67
4 2014 .13 .54  .67
4 2015 .13 .54 .669
4 2016   0 .37 .645
4 2017   0 .37 .633
4 2018   .   . .635
5 1975 .62 .19 .865
5 1976 .62 .19 .867
5 1977 .18 .63 .868
5 1978 .18 .63 .868
5 1979 .65   0 .864
5 1980  .2 .65 .858
5 1981  .2 .65 .858
5 1982 .52 .47 .861
5 1983 .66   0 .868
5 1984 .66   0 .868
5 1985 .66   0 .865
5 1986 .65 .19 .867
end

. xtpcse  v2x_libdem oppfrac govfrac

Number of gaps in sample:  70
no time periods are common to all panels, cannot estimate disturbance
covariance matrix using casewise inclusion
r(459);


. xtpcse  v2x_libdem oppfrac govfrac, pairwise

Number of gaps in sample:  70
(note: at least one disturbance covariance assumed 0, no common time periods
       between panels)

Linear regression, correlated panels corrected standard errors (PCSEs)

Group variable:   country_id                    Number of obs      =      4475
Time variable:    year                          Number of groups   =       154
Panels:           correlated (unbalanced)       Obs per group: min =         4
Autocorrelation:  no autocorrelation                           avg =  29.05844
Sigma computed by pairwise selection                           max =        43
Estimated covariances      =     11935          R-squared          =    0.0607
Estimated autocorrelations =         0          Wald chi2(2)       =    175.55
Estimated coefficients     =         3          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |           Panel-corrected
  v2x_libdem |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     oppfrac |  -.0563738   .0149227    -3.78   0.000    -.0856218   -.0271259
     govfrac |   .2400054   .0193359    12.41   0.000     .2021078    .2779031
       _cons |   .4531467   .0091391    49.58   0.000     .4352344    .4710591
------------------------------------------------------------------------------


. xtreg  v2x_libdem oppfrac govfrac

Random-effects GLS regression                   Number of obs      =      4475
Group variable: country_id                      Number of groups   =       154

R-sq:  within  = 0.0284                         Obs per group: min =         4
       between = 0.0516                                        avg =      29.1
       overall = 0.0403                                        max =        43

                                                Wald chi2(2)       =    130.66
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
  v2x_libdem |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     oppfrac |   .0309818   .0064597     4.80   0.000      .018321    .0436427
     govfrac |    .075405    .007288    10.35   0.000     .0611207    .0896892
       _cons |   .3987555   .0195239    20.42   0.000     .3604893    .4370216
-------------+----------------------------------------------------------------
     sigma_u |  .23772996
     sigma_e |  .09036332
         rho |  .87375703   (fraction of variance due to u_i)
------------------------------------------------------------------------------