I am a final-year undergraduate student working on a dissertation titled 'The Effect of Epidemic on International Tourism Flows: The Role of Public Healthcare Spending'. The study uses data on bilateral tourist arrivals from 191 origin countries to 180 destination countries, forming 15,276 country pairs, from 1995 to 2015. The unbalanced panel dataset contains 206,171 observations after excluding missing values. I am interested in the moderating effect of public healthcare spending on the relationship between international tourism flows and past epidemic outbreaks. Accordingly, Y = lflow, X = epidemic_d, and the interaction term = epihgdp. Below is the description of the variables in the study:
Code:
1. lflow: log of bilateral tourist arrivals between origin and destination
2. flow: bilateral tourist arrivals between origin and destination
3. lgdp_o: log of GDP per capita at origin (normalised by 10000)
4. lgdp_d: log of GDP per capita at destination (normalised by 10000)
5. ldistw: log of distance between origin and destination
6. lpop_o: log of population in origin country
7. lpop_d: log of population in destination country
8. lRP_od: log of relative price between origin and destination country
9. epidemic_d: share of population affected by epidemic in destination country
10. epidemic_lagged_d: share of population affected by epidemic in destination country (one year lagged)
11. healthgdp_d: public healthcare expenditure (% of GDP)
12. healthgdp_lagged_d: public healthcare expenditure (% of GDP) (one year lagged)
13. epihgdp: interaction term between epidemic_d and healthgdp_d
14. epihgdp_lagged_d: interaction term between epidemic_lagged_d and healthgdp_lagged_d
The regression methods I am going to use are FE and PPML.
For the FE estimation, I estimated three specifications. The code is as follows:
Code:
eststo:xi:xtreg lflow epidemic_d lgdp_o lgdp_d lpop_o lpop_d lRP_od i.year , fe robust
Code:
eststo:xi:xtreg lflow epidemic_d healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od i.year , fe robust
Code:
eststo:xi:xtreg lflow epidemic_d healthgdp_d epihgdp lgdp_o lgdp_d lpop_o lpop_d lRP_od i.year , fe robust
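To make the moderating effect explicit, the third specification can be written as below. This is only a sketch in my own notation: the beta, gamma, alpha and lambda symbols are labels I am assuming here, with alpha_od the country-pair fixed effect, lambda_t the year effect, and X_odt collecting the controls (lgdp_o, lgdp_d, lpop_o, lpop_d, lRP_od).

```latex
\begin{aligned}
\text{lflow}_{odt} &= \beta_1\,\text{epidemic}_{dt}
  + \beta_2\,\text{healthgdp}_{dt}
  + \beta_3\,(\text{epidemic}_{dt}\times\text{healthgdp}_{dt})
  + \gamma' X_{odt} + \alpha_{od} + \lambda_t + \varepsilon_{odt},\\
\frac{\partial\,\text{lflow}_{odt}}{\partial\,\text{epidemic}_{dt}}
  &= \beta_1 + \beta_3\,\text{healthgdp}_{dt}.
\end{aligned}
```

Under this reading, the effect of an epidemic on arrivals depends on the level of healthcare spending, and the coefficient on epihgdp (beta_3) is the moderating effect I am interested in.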
Code:
Fixed-effects (within) regression Number of obs = 152,289
Group variable: pairid Number of groups = 13,283
R-squared: Obs per group:
Within = 0.2618 min = 1
Between = 0.3826 avg = 11.5
Overall = 0.3539 max = 16
F(23,13282) = 418.61
corr(u_i, Xb) = 0.3785 Prob > F = 0.0000
(Std. err. adjusted for 13,283 clusters in pairid)
------------------------------------------------------------------------------
| Robust
lflow | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
epidemic_d | -20.10204 6.929843 -2.90 0.004 -33.68552 -6.518561
healthgdp_d | -.0513586 .0044858 -11.45 0.000 -.0601514 -.0425659
epihgdp | 3.198859 1.375551 2.33 0.020 .5025833 5.895135
lgdp_o | .3809364 .0215977 17.64 0.000 .3386018 .423271
lgdp_d | .4053506 .0199294 20.34 0.000 .3662861 .4444152
lpop_o | .1450008 .072187 2.01 0.045 .0035041 .2864976
lpop_d | .1948498 .0729338 2.67 0.008 .0518891 .3378106
lRP_od | .0843827 .0126933 6.65 0.000 .059502 .1092633
_Iyear_1996 | 0 (omitted)
_Iyear_1997 | 0 (omitted)
_Iyear_1998 | 0 (omitted)
_Iyear_1999 | 0 (omitted)
_Iyear_2000 | -.3337469 .0305551 -10.92 0.000 -.3936393 -.2738544
_Iyear_2001 | -.3341663 .0294902 -11.33 0.000 -.3919713 -.2763612
_Iyear_2002 | -.3272019 .0280088 -11.68 0.000 -.3821031 -.2723007
_Iyear_2003 | -.3627477 .0250352 -14.49 0.000 -.4118203 -.3136752
_Iyear_2004 | -.3262934 .0222754 -14.65 0.000 -.3699562 -.2826305
_Iyear_2005 | -.3073352 .0198753 -15.46 0.000 -.3462935 -.2683768
_Iyear_2006 | -.2940875 .0174447 -16.86 0.000 -.3282816 -.2598934
_Iyear_2007 | -.3162966 .0154593 -20.46 0.000 -.346599 -.2859942
_Iyear_2008 | -.337098 .0135715 -24.84 0.000 -.3637001 -.3104958
_Iyear_2009 | -.2683101 .012169 -22.05 0.000 -.292163 -.2444572
_Iyear_2010 | -.2578123 .0108087 -23.85 0.000 -.2789988 -.2366258
_Iyear_2011 | -.2565084 .0099581 -25.76 0.000 -.2760276 -.2369891
_Iyear_2012 | -.189837 .0086714 -21.89 0.000 -.2068342 -.1728397
_Iyear_2013 | -.1696128 .007937 -21.37 0.000 -.1851704 -.1540552
_Iyear_2014 | -.1530084 .0067724 -22.59 0.000 -.1662833 -.1397336
_Iyear_2015 | 0 (omitted)
_cons | -.1052812 .3171258 -0.33 0.740 -.726893 .5163306
-------------+----------------------------------------------------------------
sigma_u | 2.8881156
sigma_e | .6123985
rho | .95697322 (fraction of variance due to u_i)
------------------------------------------------------------------------------
I repeated the three specifications above with PPML as a robustness check on the FE results. Most of the variables become insignificant. The code is as follows:
Code:
eststo:xi:ppmlhdfe flow epidemic_d lgdp_o lgdp_d lpop_o lpop_d lRP_od, a(year pairid) nolog
Code:
eststo:xi:ppmlhdfe flow epidemic_d healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od, a(year pairid ) nolog
Code:
eststo:xi:ppmlhdfe flow epidemic_d healthgdp_d epihgdp lgdp_o lgdp_d lpop_o lpop_d lRP_od , a(year pairid) nolog
Code:
HDFE PPML regression No. of obs = 208,926
Absorbing 2 HDFE groups Residual df = 195,620
Wald chi2(8) = 202.95
Deviance = 7.77752e+17 Prob > chi2 = 0.0000
Log pseudolikelihood = -3.88876e+17 Pseudo R2 = 0.9982
------------------------------------------------------------------------------
| Robust
flow | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
epidemic_d | -194.7957 659.987 -0.30 0.768 -1488.347 1098.755
healthgdp_d | -.3241675 .0399636 -8.11 0.000 -.4024947 -.2458403
epihgdp | 2.260448 108.162 0.02 0.983 -209.7333 214.2542
lgdp_o | .2913518 .1665717 1.75 0.080 -.0351228 .6178263
lgdp_d | .0382627 .1125117 0.34 0.734 -.1822561 .2587816
lpop_o | 5.494733 .7255132 7.57 0.000 4.072754 6.916713
lpop_d | -1.594065 .7285146 -2.19 0.029 -3.021927 -.1662026
lRP_od | .3781714 .1781541 2.12 0.034 .0289959 .727347
_cons | 35.84048 3.988092 8.99 0.000 28.02396 43.65699
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
year | 16 0 16 |
pairid | 13283 1 13282 |
-----------------------------------------------------+
My questions are:
1. I know that statistical significance cannot be used to judge whether a regression is 'good' or 'bad'; rather, it gives the researcher some information. In my case, is the loss of significance likely caused by a mistake on my part, or is the regression trying to tell me something? What might be the reason behind it?
2. I understand that PPML is suited to handling the sample selection bias caused by zero observations. Indeed, the bilateral tourism data in this study have a large number of missing values. I replaced the missing values with 0 using the following command:
Code:
replace flow=0 if flow==.
3. If yes, what test or verification should I conduct next to justify this? Are there any articles you would recommend I refer to?
4. If no, what should I do next? I have already checked my data and they appear to be correct.
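Two checks I could run to narrow this down are sketched below. The sketch assumes that the raw data are reloaded so that flow is still missing where unreported, that the variables are named as in the list above, and that ppmlhdfe is installed. The variable flow_was_missing and the stored-estimate names are hypothetical helpers I introduce for illustration. The first part records which zeros were originally missing. The second part re-estimates the third PPML specification with standard errors clustered on pairid (the FE output reports clustering on pairid, while my PPML output only says 'Robust'), once on all observations and once on the originally reported flows only.

```stata
* Assumes the raw data (flow still missing where unreported) are in memory.
* flow_was_missing is a hypothetical helper variable introduced for this check.
generate byte flow_was_missing = missing(flow)
replace flow = 0 if flow_was_missing

* How many zeros come from the recoding rather than from reported data?
tabulate flow_was_missing if flow == 0

* PPML on the full sample, clustering on the country pair as in the FE runs
* (the /// line continuation requires running this from a do-file)
ppmlhdfe flow epidemic_d healthgdp_d epihgdp lgdp_o lgdp_d lpop_o lpop_d lRP_od, ///
    absorb(year pairid) vce(cluster pairid) nolog
estimates store ppml_all

* Same model, restricted to flows that were actually reported
ppmlhdfe flow epidemic_d healthgdp_d epihgdp lgdp_o lgdp_d lpop_o lpop_d lRP_od ///
    if !flow_was_missing, absorb(year pairid) vce(cluster pairid) nolog
estimates store ppml_reported

* Side-by-side comparison of coefficients and standard errors
estimates table ppml_all ppml_reported, b(%9.4f) se(%9.4f)
```

If the coefficients or standard errors differ a lot between the two runs, that would suggest the recoded zeros are driving the PPML results; if they are similar, the difference from FE more likely comes from the estimator itself.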
Thank you everyone for your input!
Best regards,
Jacyln Hu.