Hope you are all doing well,
I am trying to investigate the impact of the distance (distance_ij) between origin i and destination j on flow_ij, while accounting for the attributes of origin_i and destination_j.
I have 4 origins and 13 destination provinces over 13 years (2006 to 2018).
However, the panel is unbalanced: one of the origins has no observations for the first 8 years (only the last 5 years are available). The total number of observations is therefore 3 x 13 x 13 + 1 x 13 x 5 = 507 + 65 = 572.
- Dependent variable:
flow_ij
- Independent variables (in natural logarithm):
1. lnorigin_i
2. lndestination_j
3. lndistance_ij
flow_ij is a non-negative count, so its probability is described by a discrete probability distribution.
See the data sample below (16 of the 572 observations are listed):
Code:
input str3 ORG str11 DEST int year float(flow_ij lnorigin_i lndestination_j lndistance_ij)
"DAM" "bah" 2006    0 6.848005  2.549445 7.170888
"DAM" "bah" 2007    0 6.991177  2.484907 7.170888
"DAM" "bah" 2008    0 7.128496   2.61007 7.170888
"DAM" "bah" 2009    0 7.112328 2.5700645 7.170888
"DAM" "bah" 2010    0 7.195187  2.648536 7.170888
"DAM" "bah" 2011    0 7.307873  2.721295 7.170888
"DAM" "bah" 2012    0 7.195187 2.8053784 7.170888
"DAM" "bah" 2013    0 7.130899  2.789118 7.170888
"DAM" "bah" 2014    0 7.466228  2.439444 7.170888
"DAM" "bah" 2015    0 7.577634  2.484907 7.170888
"DAM" "bah" 2016 3800 7.484369 2.5700645 7.170888
"DAM" "bah" 2017    0 7.366445  2.462514 7.170888
"DAM" "bah" 2018 1200 7.340187   2.55787 7.170888
"DAM" "jof" 2006 5400 6.848005   2.61007 7.120444
"DAM" "jof" 2007 6500 6.991177  2.721295 7.120444
"DAM" "jof" 2008 5400 7.128496  2.703596 7.120444
end
I would like to estimate a log-linear model (Poisson regression) for panel data, using individual fixed effects.
I used the following glm command:
Code:
glm flow_ij lnorigin_i lndistance_ij i.DEST_j, family(poisson) link(log) irls
The destination fixed effects are included through the factor variable i.DEST_j.
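Written out, my understanding is that the glm command above fits the following conditional mean. The symbols \beta_0, \beta_1, \beta_2 and \delta_j, and the year subscript t, are my own notation and do not appear in the Stata output; \delta_j stands for the separate intercept that i.DEST_j gives each level of DEST_j (zero for the omitted base level).

```latex
% Sketch of the model implied by:
%   glm flow_ij lnorigin_i lndistance_ij i.DEST_j, family(poisson) link(log)
% Assumed notation: t indexes the year; \delta_j is the DEST_j dummy coefficient.
\[
  \mathrm{E}\!\left[\mathit{flow}_{ijt} \mid \mathit{lnorigin}_{it},\,
                    \mathit{lndistance}_{ij},\, \mathit{DEST}_j\right]
  = \exp\!\left(\beta_0
      + \beta_1\,\mathit{lnorigin}_{it}
      + \beta_2\,\mathit{lndistance}_{ij}
      + \delta_j\right)
\]
```

Because both regressors enter in logs and the link is the log, \beta_1 and \beta_2 can be read as elasticities of the expected flow with respect to the origin attribute and the distance.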
The results I got are as follows:
Code:
. glm flow_ij lnorigin_i lndistance_ij i.DEST_j, family(poisson) link(log) irls

Iteration 1:   deviance =  2.70e+07
Iteration 2:   deviance =  1.94e+07
Iteration 3:   deviance =  1.86e+07
Iteration 4:   deviance =  1.86e+07
Iteration 5:   deviance =  1.86e+07
Iteration 6:   deviance =  1.86e+07
Iteration 7:   deviance =  1.86e+07

Generalized linear models                         Number of obs   =        572
Optimization     : MQL Fisher scoring             Residual df     =        557
                   (IRLS EIM)                     Scale parameter =          1
Deviance         =  18551677.22                   (1/df) Deviance =   33306.42
Pearson          =  18688077.98                   (1/df) Pearson  =   33551.31

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  BIC             =   1.85e+07

-------------------------------------------------------------------------------
              |                 EIM
      flow_ij |      Coef.   Std. Err.        z    P>|z|    [95% Conf. Interval]
--------------+----------------------------------------------------------------
   lnorigin_i |   .7029036   .0002109   3333.29   0.000    .7024903    .7033169
lndistance_ij |   -.577117   .0001526  -3781.61   0.000   -.5774162   -.5768179
              |
       DEST_j |
            A |   .9244987   .0025078    368.65   0.000    .9195836    .9294139
            B |   2.726537   .0020362   1339.02   0.000    2.722546    2.730528
            C |   2.711206   .0020558   1318.79   0.000    2.707176    2.715235
            D |   2.523979    .002069   1219.90   0.000    2.519923    2.528034
            E |   2.400839   .0020661   1162.00   0.000     2.39679    2.404889
            F |   1.637443   .0022097    741.04   0.000    1.633112    1.641774
            G |   1.845645   .0021684    851.16   0.000    1.841395    1.849895
            H |   2.113996   .0020751   1018.77   0.000    2.109929    2.118063
            I |   .5495162   .0027211    201.94   0.000    .5441829    .5548495
            J |   .7086403   .0025717    275.55   0.000    .7035999    .7136807
            K |   4.139993   .0019932   2077.03   0.000    4.136086      4.1439
           L. |   1.706293   .0022313    764.71   0.000     1.70192    1.710666
              |
        _cons |   7.160802   .0027506   2603.35   0.000    7.155411    7.166193
-------------------------------------------------------------------------------
My questions are:
1. Is glm suitable for unbalanced panel data?
2. Is the fixed effect specified correctly?
3. Do I need to include year fixed effects (i.year)? Why?
4. Would clustering the standard errors on origin benefit the model?
I am using glm in Stata 15.1.
Thank you for your time and effort, and my apologies for the long post,
Hussain Sulaimani