Dear Statalist,

I am a final-year undergraduate student working on a dissertation titled 'The Effect of Epidemic on International Tourism Flows: The Role of Public Healthcare Spending'. The study uses data on bilateral tourist arrivals from 191 origin countries to 180 destination countries, forming 15,276 country pairs, from 1995 to 2015. The unbalanced panel contains 206,171 observations after excluding missing values. I am interested in the moderating effect of public healthcare spending on the relationship between international tourism flows and past epidemic outbreaks. My dependent variable is lflow, my main regressor is epidemic_d, and the interaction term is epihgdp. Below is the description of the variables in the study:

Code:
   1. lflow                log of bilateral tourist arrivals between origin and destination
   2. flow                 bilateral tourist arrivals between origin and destination
   3. lgdp_o               log of GDP per capita at origin (normalised by 10000)
   4. lgdp_d               log of GDP per capita at destination (normalised by 10000)
   5. ldistw               log of distance between origin and destination
   6. lpop_o               log of population in origin country
   7. lpop_d               log of population in destination country
   8. lRP_od               log of relative price between origin and destination country
   9. epidemic_d           Share of population affected by epidemic in destination country
  10. epidemic_lagged_d    Share of population affected by epidemic in destination country (one year lagged)
  11. healthgdp_d          Public healthcare expenditure (% of GDP)
  12. healthgdp_lagged_d   Public healthcare expenditure (% of GDP) (one year lagged)
  13. epihgdp              Interaction term between epidemic_d and healthgdp_d
  14. epihgdp_lagged_d     Interaction term between epidemic_lagged_d and healthgdp_lagged_d
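
For clarity, the interaction terms are meant to be simple products of their two components. A minimal sketch of how this could be verified, assuming epihgdp was built as epidemic_d times healthgdp_d (epihgdp_check is a hypothetical temporary name, not a variable in my dataset):

```stata
* Rebuild the interaction under the assumption that it is a plain product
generate double epihgdp_check = epidemic_d * healthgdp_d

* Compare with the stored interaction term wherever both are non-missing
assert reldif(epihgdp, epihgdp_check) < 1e-6 if !missing(epihgdp, epihgdp_check)

* Remove the temporary variable again
drop epihgdp_check
```

If the assert fails, the stored interaction term was constructed differently from what the variable description suggests.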

The regression methods I am going to use are FE and PPML.

For the FE estimation, I estimated three specifications. The code is as follows:
Code:
eststo:xi:xtreg lflow epidemic_d lgdp_o lgdp_d lpop_o lpop_d lRP_od   i.year , fe robust
Code:
eststo:xi:xtreg lflow epidemic_d healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od   i.year , fe robust
Code:
eststo:xi:xtreg lflow epidemic_d healthgdp_d epihgdp  lgdp_o lgdp_d lpop_o lpop_d lRP_od   i.year , fe robust
The signs and significance levels in the FE model look fine to me. The regression result for the last specification (with epihgdp) is as follows:
Code:
Fixed-effects (within) regression               Number of obs     =    152,289
Group variable: pairid                          Number of groups  =     13,283

R-squared:                                      Obs per group:
     Within  = 0.2618                                         min =          1
     Between = 0.3826                                         avg =       11.5
     Overall = 0.3539                                         max =         16

                                                F(23,13282)       =     418.61
corr(u_i, Xb) = 0.3785                          Prob > F          =     0.0000

                            (Std. err. adjusted for 13,283 clusters in pairid)
------------------------------------------------------------------------------
             |               Robust
       lflow | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  epidemic_d |  -20.10204   6.929843    -2.90   0.004    -33.68552   -6.518561
 healthgdp_d |  -.0513586   .0044858   -11.45   0.000    -.0601514   -.0425659
     epihgdp |   3.198859   1.375551     2.33   0.020     .5025833    5.895135
      lgdp_o |   .3809364   .0215977    17.64   0.000     .3386018     .423271
      lgdp_d |   .4053506   .0199294    20.34   0.000     .3662861    .4444152
      lpop_o |   .1450008    .072187     2.01   0.045     .0035041    .2864976
      lpop_d |   .1948498   .0729338     2.67   0.008     .0518891    .3378106
      lRP_od |   .0843827   .0126933     6.65   0.000      .059502    .1092633
 _Iyear_1996 |          0  (omitted)
 _Iyear_1997 |          0  (omitted)
 _Iyear_1998 |          0  (omitted)
 _Iyear_1999 |          0  (omitted)
 _Iyear_2000 |  -.3337469   .0305551   -10.92   0.000    -.3936393   -.2738544
 _Iyear_2001 |  -.3341663   .0294902   -11.33   0.000    -.3919713   -.2763612
 _Iyear_2002 |  -.3272019   .0280088   -11.68   0.000    -.3821031   -.2723007
 _Iyear_2003 |  -.3627477   .0250352   -14.49   0.000    -.4118203   -.3136752
 _Iyear_2004 |  -.3262934   .0222754   -14.65   0.000    -.3699562   -.2826305
 _Iyear_2005 |  -.3073352   .0198753   -15.46   0.000    -.3462935   -.2683768
 _Iyear_2006 |  -.2940875   .0174447   -16.86   0.000    -.3282816   -.2598934
 _Iyear_2007 |  -.3162966   .0154593   -20.46   0.000     -.346599   -.2859942
 _Iyear_2008 |   -.337098   .0135715   -24.84   0.000    -.3637001   -.3104958
 _Iyear_2009 |  -.2683101    .012169   -22.05   0.000     -.292163   -.2444572
 _Iyear_2010 |  -.2578123   .0108087   -23.85   0.000    -.2789988   -.2366258
 _Iyear_2011 |  -.2565084   .0099581   -25.76   0.000    -.2760276   -.2369891
 _Iyear_2012 |   -.189837   .0086714   -21.89   0.000    -.2068342   -.1728397
 _Iyear_2013 |  -.1696128    .007937   -21.37   0.000    -.1851704   -.1540552
 _Iyear_2014 |  -.1530084   .0067724   -22.59   0.000    -.1662833   -.1397336
 _Iyear_2015 |          0  (omitted)
       _cons |  -.1052812   .3171258    -0.33   0.740     -.726893    .5163306
-------------+----------------------------------------------------------------
     sigma_u |  2.8881156
     sigma_e |   .6123985
         rho |  .95697322   (fraction of variance due to u_i)
------------------------------------------------------------------------------
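
As a sketch of how I read the moderating effect from this specification (my own illustration, not part of the output above): the implied effect of epidemic_d is _b[epidemic_d] + _b[epihgdp]*healthgdp_d, so it can be evaluated at a chosen level of healthcare spending with lincom, and the level at which it changes sign can be obtained with nlcom. The value 4 (% of GDP) is an arbitrary illustrative level, and the sketch assumes the data are already xtset on pairid and year.

```stata
* Re-run the third FE specification (factor-variable i.year, so xi is not needed)
xtreg lflow epidemic_d healthgdp_d epihgdp lgdp_o lgdp_d lpop_o lpop_d lRP_od i.year, fe vce(robust)

* Effect of epidemic_d when healthgdp_d = 4 (illustrative value, an assumption)
lincom epidemic_d + 4*epihgdp

* Level of healthgdp_d at which the epidemic effect changes sign
nlcom (turning_point: -_b[epidemic_d]/_b[epihgdp])
```

With the coefficients reported above, this turning point is roughly 20.10/3.20, i.e. about 6.3% of GDP.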
The problem, however, is with the PPML estimation.
I repeated the three specifications above with PPML as a robustness check for the FE results. The majority of the variables become insignificant. The code is as follows:
Code:
eststo:xi:ppmlhdfe flow epidemic_d lgdp_o lgdp_d lpop_o lpop_d lRP_od, a(year pairid) nolog
Code:
eststo:xi:ppmlhdfe flow epidemic_d healthgdp_d lgdp_o lgdp_d lpop_o lpop_d lRP_od, a(year pairid ) nolog
Code:
eststo:xi:ppmlhdfe flow epidemic_d healthgdp_d epihgdp lgdp_o lgdp_d lpop_o lpop_d lRP_od , a(year pairid) nolog
The result for the last specification (with epihgdp) is as follows:
Code:
HDFE PPML regression                              No. of obs      =    208,926
Absorbing 2 HDFE groups                           Residual df     =    195,620
                                                  Wald chi2(8)    =     202.95
Deviance             =  7.77752e+17               Prob > chi2     =     0.0000
Log pseudolikelihood = -3.88876e+17               Pseudo R2       =     0.9982
------------------------------------------------------------------------------
             |               Robust
        flow | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  epidemic_d |  -194.7957    659.987    -0.30   0.768    -1488.347    1098.755
 healthgdp_d |  -.3241675   .0399636    -8.11   0.000    -.4024947   -.2458403
     epihgdp |   2.260448    108.162     0.02   0.983    -209.7333    214.2542
      lgdp_o |   .2913518   .1665717     1.75   0.080    -.0351228    .6178263
      lgdp_d |   .0382627   .1125117     0.34   0.734    -.1822561    .2587816
      lpop_o |   5.494733   .7255132     7.57   0.000     4.072754    6.916713
      lpop_d |  -1.594065   .7285146    -2.19   0.029    -3.021927   -.1662026
      lRP_od |   .3781714   .1781541     2.12   0.034     .0289959     .727347
       _cons |   35.84048   3.988092     8.99   0.000     28.02396    43.65699
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        year |        16           0          16     |
      pairid |     13283           1       13282     |
-----------------------------------------------------+

My questions are:
1. I know that the level of significance cannot be used to judge whether a regression is 'good' or 'bad'; rather, it conveys some information to the researcher. In my case, is the loss of significance possibly caused by a mistake on my part, or is the regression trying to tell me something? What might be the reason behind it?

2. I understand that PPML is well suited to handling the sample selection bias caused by zero observations. Indeed, the bilateral tourism data in this study have a large number of missing values. I replaced the missing values with 0 using the following command:
Code:
replace flow=0 if flow==.
So, does the insignificance in PPML indicate sensitivity to the zero observations?

3. If yes, what test or verification should I conduct next to confirm this? Are there any articles you would recommend I refer to?

4. If no, what should I do next? I have already checked my data and it appears to be correct.
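
A minimal sketch of how the sensitivity to the imputed zeros could be checked, assuming I go back to the data as they were before the replacement in question 2. The flag name was_missing is hypothetical, and clustering on pairid is my assumption, chosen to mirror the FE estimates.

```stata
* Flag the observations whose flow is missing BEFORE replacing them with 0
generate byte was_missing = missing(flow)
replace flow = 0 if was_missing

* How many of the PPML observations are imputed zeros?
tabulate was_missing

* PPML on the full sample, including the imputed zeros
ppmlhdfe flow epidemic_d healthgdp_d epihgdp lgdp_o lgdp_d lpop_o lpop_d lRP_od, absorb(year pairid) vce(cluster pairid) nolog

* Same model on the originally observed flows only
ppmlhdfe flow epidemic_d healthgdp_d epihgdp lgdp_o lgdp_d lpop_o lpop_d lRP_od if !was_missing, absorb(year pairid) vce(cluster pairid) nolog
```

If the two sets of estimates differ sharply, that would suggest the PPML results are driven by the zeros I imputed.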


Thank you everyone for your input!

Best regards,
Jacyln Hu.