Dear all,
I hope you are all doing well.

I am trying to investigate the impact of distance (distance_ij) between origin i and destination j on flow_ij by considering the attributes of origin_i and destination_j.

I have 4 origins and 13 destinations (provinces) observed over 13 years. A balanced panel would have 4 x 13 x 13 = 676 observations.
However, the data are unbalanced: one of the origins has no observations for the first 8 years (only the last 5 years are available), so the total number of observations is 572.
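
The observation count and the unbalanced structure can be checked with a short sketch. It assumes the variable names from the data sample (ORG, DEST, year) and that the origin with missing years lacks all 13 destinations in each of those 8 years. The identifier `pair` is a hypothetical name introduced here for the origin-destination pair.

```stata
* Balanced size minus the missing cells (1 origin x 13 destinations x 8 years).
* This displays 572.
display 4*13*13 - 1*13*8

* Build one panel unit per origin-destination pair (pair is a hypothetical name)
capture drop pair
egen pair = group(ORG DEST)

* Declare the panel structure and summarize its participation pattern
xtset pair year
xtdescribe
```

If the assumption holds, `xtdescribe` should report some pairs with 13 yearly observations and others with only 5.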

- Dependent variable:
flow_ij
- Independent variables (in natural logarithm):
1. lnorigin_i
2. lndestination_j
3. lndistance_ij

flow_ij is a non-negative count, so I describe it with a discrete probability distribution (Poisson).

See the data sample below (16 of the 572 observations are listed):
Code:
input str3 ORG str11 DEST int year float(flow_ij lnorigin_i lndestination_j lndistance_ij)
"DAM" "bah" 2006      0 6.848005  2.549445  7.170888
"DAM" "bah" 2007      0 6.991177  2.484907  7.170888
"DAM" "bah" 2008      0 7.128496   2.61007  7.170888
"DAM" "bah" 2009      0 7.112328 2.5700645  7.170888
"DAM" "bah" 2010      0 7.195187  2.648536  7.170888
"DAM" "bah" 2011      0 7.307873  2.721295  7.170888
"DAM" "bah" 2012      0 7.195187 2.8053784  7.170888
"DAM" "bah" 2013      0 7.130899  2.789118  7.170888
"DAM" "bah" 2014      0 7.466228  2.439444  7.170888
"DAM" "bah" 2015      0 7.577634  2.484907  7.170888
"DAM" "bah" 2016   3800 7.484369 2.5700645  7.170888
"DAM" "bah" 2017      0 7.366445  2.462514  7.170888
"DAM" "bah" 2018   1200 7.340187   2.55787  7.170888
"DAM" "jof" 2006   5400 6.848005   2.61007  7.120444
"DAM" "jof" 2007   6500 6.991177  2.721295  7.120444
"DAM" "jof" 2008   5400 7.128496  2.703596  7.120444


I would like to estimate a log-linear model (Poisson regression) for panel data, using individual fixed effects.
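
One way such a model could be written is with Stata's conditional fixed-effects Poisson estimator, `xtpoisson, fe`. This is a minimal sketch under assumptions, not the specification used in this post. It assumes the "individual" is the origin-destination pair and uses the variable names from the data sample; `pair` is a hypothetical identifier. With pair-level fixed effects, lndistance_ij is constant within each pair and cannot be estimated, so it is left out.

```stata
* Hypothetical panel identifier: one unit per origin-destination pair
capture drop pair
egen pair = group(ORG DEST)
xtset pair year

* Conditional fixed-effects Poisson with robust standard errors.
* lndistance_ij does not vary within a pair, so the pair effects absorb it.
* Pairs whose flow_ij is zero in every year are dropped from estimation.
xtpoisson flow_ij lnorigin_i lndestination_j, fe vce(robust)
```

Keeping lndistance_ij in the model requires fixed effects at a coarser level than the pair, such as the destination dummies in the glm command.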


I fitted the model with the following glm command:


Code:
glm flow_ij lnorigin_i lndistance_ij i.DEST_j, family(poisson) link(log) irls
In the code above, the destination fixed effects are included (i.DEST_j).

The results I got are as follows:
Code:
. glm flow_ij lnorigin_i lndistance_ij i.DEST_j, family(poisson) link(log) irls

Iteration 1:   deviance =  2.70e+07
Iteration 2:   deviance =  1.94e+07
Iteration 3:   deviance =  1.86e+07
Iteration 4:   deviance =  1.86e+07
Iteration 5:   deviance =  1.86e+07
Iteration 6:   deviance =  1.86e+07
Iteration 7:   deviance =  1.86e+07

Generalized linear models                         Number of obs   =        572
Optimization     : MQL Fisher scoring             Residual df     =        557
                   (IRLS EIM)                     Scale parameter =          1
Deviance         =  18551677.22                   (1/df) Deviance =   33306.42
Pearson          =  18688077.98                   (1/df) Pearson  =   33551.31

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  BIC             =   1.85e+07

-------------------------------------------------------------------------------
              |                 EIM
      flow_ij |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
   lnorigin_i |   .7029036   .0002109  3333.29   0.000     .7024903    .7033169
lndistance_ij |   -.577117   .0001526 -3781.61   0.000    -.5774162   -.5768179
              |
       DEST_j |
      A       |   .9244987   .0025078   368.65   0.000     .9195836    .9294139
      B       |   2.726537   .0020362  1339.02   0.000     2.722546    2.730528
      C       |   2.711206   .0020558  1318.79   0.000     2.707176    2.715235
      D       |   2.523979    .002069  1219.90   0.000     2.519923    2.528034
      E       |   2.400839   .0020661  1162.00   0.000      2.39679    2.404889
      F       |   1.637443   .0022097   741.04   0.000     1.633112    1.641774
      G       |   1.845645   .0021684   851.16   0.000     1.841395    1.849895
      H       |   2.113996   .0020751  1018.77   0.000     2.109929    2.118063
      I       |   .5495162   .0027211   201.94   0.000     .5441829    .5548495
      J       |   .7086403   .0025717   275.55   0.000     .7035999    .7136807
      K       |   4.139993   .0019932  2077.03   0.000     4.136086      4.1439
      L       |   1.706293   .0022313   764.71   0.000      1.70192    1.710666
              |
        _cons |   7.160802   .0027506  2603.35   0.000     7.155411    7.166193
-------------------------------------------------------------------------------
My questions are:
1. Is glm suitable for unbalanced panel data?
2. Is the fixed effect specified correctly?
3. Do I need to include year fixed effects (i.year)? Why?
4. Would clustering the standard errors on origins benefit the model?

I am using glm in Stata 15.1.

Thank you for your time and effort, and my apologies for the long post,
Hussain Sulaimani