Hi Statalist,
I have a dataset with a large number of cross-sectional units (large N) and a small T. My data cover the years 2000-2016.
My DV is a proportion. Last week I tried to work carefully through some of Jeffrey Wooldridge's work on fractional response models, especially with regard to the large-N, small-T case (e.g. Papke and Wooldridge 2008; Professor Wooldridge's presentation: https://www.stata.com/meeting/chicag...wooldridge.pdf).
I'm not sure whether what I'm doing is correct. My unit of analysis is the directed-dyad year (State A-State B year 1, State B-State A year 1). I made a variable called "dyad_dir" that numbers each directed dyad separately (e.g. dyad "1" for State A-State B, dyad "2" for State B-State A, dyad "3" for State A-State C, dyad "4" for State C-State A, etc.). I set the panel as dyad_dir year.
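For reference, a minimal sketch of building such a directed-dyad identifier and declaring the panel, on made-up data. The state identifier names stateA and stateB are hypothetical. egen's group() numbers each ordered (stateA, stateB) pair, so A-B and B-A get different ids, although the numbering will not follow the 1, 2, 3, ... scheme described above.

```stata
* Made-up data: 3 states give 6 directed dyads (stateA/stateB are hypothetical names)
clear
input byte(stateA stateB)
1 2
2 1
1 3
3 1
2 3
3 2
end
* group() numbers each ordered pair, so (1,2) and (2,1) get different ids
egen dyad_dir = group(stateA stateB)
* one row per directed dyad per year, 2000-2016
expand 17
bysort dyad_dir: generate int year = 1999 + _n
xtset dyad_dir year
```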



My two main independent variables (IVs), x1 and x2, are binary (0/1). One of them covers a more limited period, 2000-2009.
I also have many other control variables.

So far, xtset reports the panel as strongly balanced. Given the small T, I assume I still need to include time averages and control for the number of time periods available for each cross-sectional unit, and this is where my uncertainty lies.
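For reference, the correlated random effects (CRE) fractional probit in Papke and Wooldridge (2008) can be summarized roughly as follows for a balanced panel: the time averages of the time-varying regressors enter as extra regressors, and the estimated coefficients are scaled versions of the structural ones. With an unbalanced estimation sample, the usual extension (an assumption here, to be checked against Wooldridge's treatment of unbalanced panels) replaces T by each unit's number of used periods T_i and lets the intercept and xi depend on T_i, e.g. through T_i dummies.

```latex
\begin{aligned}
E(y_{it} \mid \mathbf{x}_{i1},\dots,\mathbf{x}_{iT}, c_i) &= \Phi(\psi_t + \mathbf{x}_{it}\boldsymbol{\beta} + c_i),\\
c_i \mid \mathbf{x}_{i1},\dots,\mathbf{x}_{iT} &\sim N(\bar{\mathbf{x}}_i\boldsymbol{\xi},\ \sigma_a^2),
\qquad \bar{\mathbf{x}}_i = T^{-1}\textstyle\sum_{r=1}^{T}\mathbf{x}_{ir},\\
\Rightarrow\ E(y_{it} \mid \mathbf{x}_{i1},\dots,\mathbf{x}_{iT}) &= \Phi(\psi_{ta} + \mathbf{x}_{it}\boldsymbol{\beta}_a + \bar{\mathbf{x}}_i\boldsymbol{\xi}_a),
\qquad (\psi_{ta},\boldsymbol{\beta}_a,\boldsymbol{\xi}_a) = (1+\sigma_a^2)^{-1/2}(\psi_t,\boldsymbol{\beta},\boldsymbol{\xi}),\\
\widehat{\mathrm{APE}}_j(\mathbf{x}_t) &= \hat\beta_{aj}\, N^{-1}\textstyle\sum_{i=1}^{N}\phi(\hat\psi_{ta} + \mathbf{x}_t\hat{\boldsymbol{\beta}}_a + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}}_a).
\end{aligned}
```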

I created time dummies for all years (2000-2016), y00-y16, but when I include them in the regression, some years are omitted while others are not.
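One common reason for this pattern, sketched on simulated data: a year drops out of the estimation sample when y or any regressor is missing in that year (in the posted data, x1 is missing from 2010 on), so its dummy is zero for every observation used and Stata omits it. With a constant in the model, one further year must also be dropped as the base. The sketch flags the estimation sample with mark/markout and lists the year indicators that i.year would create in it. The simulated values and the name touse are assumptions for illustration.

```stata
* Simulated panel: 50 directed dyads observed 2000-2016
clear
set seed 12345
set obs 50
generate dyad_dir = _n
expand 17
bysort dyad_dir: generate int year = 1999 + _n
xtset dyad_dir year
* x1 observed only for 2000-2009, as in the posted data
generate x1 = runiform() < 0.5 if year <= 2009
generate x2 = runiform() < 0.5
generate y  = normal(0.5*x1 - 0.3*x2 + rnormal())
* touse = 1 only for rows with nonmissing y, x1 and x2
mark touse
markout touse y x1 x2
* years 2010-2016 never appear in the estimation sample
tabulate year if touse
* year indicators that i.year would create in that sample (first one is the base)
fvexpand i.year if touse
display "`r(varlist)'"
```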

I created time averages for the main independent variables (I am not sure whether I also need them for the control variables).
Following the Stata meeting files cited above, I did the following, but I'm not sure it's correct:
egen x1b = mean(x1), by(dyad_dir)
egen x2b = mean(x2), by(dyad_dir)
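A minimal sketch of computing the averages over the observations the regression actually uses, on simulated data. In the Papke-Wooldridge CRE setup the averages are taken for all time-varying regressors, controls included (x3 stands in for a control here), and with missing values it seems natural to average over each dyad's estimation-sample periods only. As written above, x1b and x2b average over every nonmissing value of x1 and x2 separately, which in the posted data covers different years (x1 through 2009, x2 through 2011). The names x1bar, x2bar, x3bar and touse are hypothetical.

```stata
* Simulated panel: 40 directed dyads observed 2000-2016
clear
set seed 2024
set obs 40
generate dyad_dir = _n
expand 17
bysort dyad_dir: generate int year = 1999 + _n
xtset dyad_dir year
* x1 observed 2000-2009 and x2 2000-2011, mimicking the posted data
generate x1 = runiform() < 0.5 if year <= 2009
generate x2 = runiform() < 0.5 if year <= 2011
generate x3 = rnormal()
generate y  = normal(0.4*x1 + 0.2*x2 + 0.1*x3 + rnormal())
* estimation sample: rows with y and every regressor nonmissing
mark touse
markout touse y x1 x2 x3
* dyad-level averages over the estimation-sample rows only
foreach v of varlist x1 x2 x3 {
    egen `v'bar = mean(`v') if touse, by(dyad_dir)
}
```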

I also created a variable to control for the number of years available, but it is omitted because of collinearity. I tried varying the specification (e.g. dropping the year dummies or the time averages), and it was still omitted.
egen tobs = sum(1), by(dyad_dir)
gen tobs17 = (tobs == 17)
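A likely reason for the collinearity, given the strongly balanced panel described above: sum(1) by dyad_dir counts every row, so every dyad gets tobs = 17 and tobs17 equals 1 for all observations, which is perfectly collinear with the constant. The sketch below, on simulated data with y missing at random, instead counts the periods each dyad contributes to the estimation sample (Ti) and turns them into indicators. Ti, Tdum* and touse are hypothetical names.

```stata
* Simulated panel: 30 directed dyads observed 2000-2016
clear
set seed 99
set obs 30
generate dyad_dir = _n
expand 17
bysort dyad_dir: generate int year = 1999 + _n
xtset dyad_dir year
generate x1 = runiform() < 0.5 if year <= 2009
generate y  = normal(0.5*x1 + rnormal())
* knock out about 15% of y at random so dyads contribute different numbers of years
replace y = . if runiform() < 0.15
* counting all rows: every dyad has 17, so tobs17 is 1 everywhere
bysort dyad_dir: generate tobs = _N
generate tobs17 = (tobs == 17)
* counting only rows the regression can use
mark touse
markout touse y x1
egen Ti = total(touse), by(dyad_dir)
* one indicator per observed value of Ti (i.Ti in a model does the same,
* with one value as the base)
tabulate Ti if touse, generate(Tdum)
```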

A sample of the data, from dataex:
Code:
clear
input float(year y dyad_dir) byte x1 float(x2 x1b x2b x3 x4) byte x5 float(x6 x7 x8 x9) byte x10 float(x11 x12 y00 y01 y02 y03 y04 y05 y06 y07 y08 y09 y10 y11 y12 y13 y14 y15 y16 tobs17)
2000          . 1 0 1 0 1 1 0 2         1 1 1 .031967163 1 0 1.8011683 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2001         .5 1 0 1 0 1 1 0 2         1 1 1 .024324417 1 0 1.8102077 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2002          0 1 0 1 0 1 1 0 .       1.5 1 1 .012064934 1 0 1.6868168 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2003   .3636364 1 0 1 0 1 1 0 .         1 1 1 .023251534 1 0 1.5599297 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2004   .3333333 1 0 1 0 1 1 0 .         1 1 1   .0307827 1 0  1.437317 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
2005   .5555556 1 0 1 0 1 1 0 .         1 1 1 .032816887 1 0 1.2836504 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
2006   .3076923 1 0 1 0 1 1 0 .         1 1 1  .03156376 1 0  2.826926 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
2007         .4 1 0 1 0 1 1 0 .       1.5 1 1  .02896404 1 0  .9335653 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
2008  .14285713 1 0 1 0 1 1 0 .       1.5 1 1 .017453194 1 0  .9192817 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
2009          0 1 0 1 0 1 1 0 .         1 1 1  .02192688 1 0  .8979353 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
2010      .6875 1 . 1 0 1 1 0 .         1 1 1  .01936817 1 0  .8552898 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1
2011   .5555556 1 . 1 0 1 1 0 .         1 1 1  .00677681 1 0  .8495679 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
2012   .4782609 1 . . 0 1 1 0 .         1 1 1 .015763283 1 0   .834486 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
2013   .4117647 1 . . 0 1 1 0 .         1 1 1 .012332916 1 0  .8337547 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
2014          0 1 . . 0 1 1 0 .         1 1 1 .012856483 1 0   .838679 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
2015          0 1 . . 0 1 1 0 .         1 1 1 .031881332 1 0   .850991 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
2016          . 1 . . 0 1 1 0 .         1 1 1 .037225723 1 0  .8440136 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
2000          . 2 0 1 0 1 1 0 .         1 1 1   .3551378 1 0 1.8011683 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2001          0 2 0 1 0 1 1 0 .         1 1 1  .34648895 1 0 1.8102077 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2002          0 2 0 1 0 1 1 0 .         2 1 1   .3478985 1 0 1.6868168 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2003          0 2 0 1 0 1 1 0 .         2 1 1   .4008026 1 0 1.5599297 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2004          0 2 0 1 0 1 1 0 .       1.5 1 1   .4411621 1 0  1.437317 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
2005          0 2 0 1 0 1 1 0 .       1.5 1 1   .4520903 1 0 1.2836504 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
2006         .3 2 0 1 0 1 1 0 .       1.5 1 1   .4636974 1 0  2.826926 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
2007          0 2 0 1 0 1 1 0 .       2.5 1 1   .4766197 1 0  .9335653 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
2008   .3333333 2 0 1 0 1 1 0 .         2 1 1   .5061283 1 0  .9192817 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
2009          1 2 0 1 0 1 1 0 .         2 1 1   .5294323 1 0  .8979353 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
2010       .125 2 . 1 0 1 1 0 .       1.5 1 1  .54753304 1 0  .8552898 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1
2011          0 2 . 1 0 1 1 0 .         2 1 1   .5656557 1 0  .8495679 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
2012  .06666666 2 . . 0 1 1 0 .         2 1 1   .5645571 1 0   .834486 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
2013   .4615385 2 . . 0 1 1 0 .         2 1 1   .5920744 1 0  .8337547 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
2014  .14285713 2 . . 0 1 1 0 .         2 1 1    .624382 1 0   .838679 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
2015          0 2 . . 0 1 1 0 .       2.5 1 1   .6467857 1 0   .850991 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
2016         .2 2 . . 0 1 1 0 .         2 1 1   .6825209 1 0  .8440136 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
end
The regression I ran in Stata is:

xtgee y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 tobs17, family(binomial 1) link(probit) corr(exchangeable) vce(robust)

I'm not sure whether I constructed the time averages correctly, or whether I controlled for the number of time periods correctly. In other words, I'm not sure I handled the large-N, small-T setting correctly in the GEE.
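For comparison, a sketch of a CRE specification along the lines of the Papke and Wooldridge setup, on simulated data: the time averages and year dummies (via i.year) enter the xtgee command alongside the original regressors, all restricted to the same estimation sample, followed by average partial effects with margins. A pooled fractional probit (fracreg) with dyad-clustered standard errors is shown as a benchmark. Both estimate the same conditional mean, and GEE may be more efficient if the exchangeable working correlation is roughly right. The data-generating process and the names c, x1bar, x2bar, x3bar and touse are assumptions for illustration, and x3 stands in for the full set of controls. With an unbalanced estimation sample, i.Ti indicators would be added as well.

```stata
* Simulated panel with a dyad effect c correlated with the regressors
clear
set seed 7
set obs 200
generate dyad_dir = _n
generate c = rnormal()
expand 17
bysort dyad_dir: generate int year = 1999 + _n
xtset dyad_dir year
generate x1 = (runiform() + 0.2*c > 0.6) if year <= 2009
generate x2 = (runiform() + 0.2*c > 0.5)
generate x3 = rnormal() + 0.3*c
generate y  = normal(0.5*x1 - 0.3*x2 + 0.2*x3 + c + 0.5*rnormal())
* common estimation sample and averages computed over it
mark touse
markout touse y x1 x2 x3
foreach v of varlist x1 x2 x3 {
    egen `v'bar = mean(`v') if touse, by(dyad_dir)
}
* CRE probit GEE: averages and year dummies enter as extra regressors
xtgee y x1 x2 x3 x1bar x2bar x3bar i.year if touse, ///
    family(binomial 1) link(probit) corr(exchangeable) vce(robust)
* average partial effects, holding the averages at their observed values
margins, dydx(x1 x2)
* pooled fractional probit benchmark with dyad-clustered standard errors
fracreg probit y x1 x2 x3 x1bar x2bar x3bar i.year if touse, vce(cluster dyad_dir)
margins, dydx(x1 x2)
```

Since x1 and x2 are binary, entering them as i.x1 i.x2 would make margins report discrete changes instead of derivatives.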

Thanks in advance for any help.