Dear Stata community,

I am working with a gravity model and what to cluster my observations depending on if they are in the highest, middle or lowest third of observations in the respective year, but I am open for different suggestions. More on that after I explain my data and method. I use the ppmlhdfe command written by Sergio Correia, Paulo GuimarĂ£es, Thomas Zylkin. The tool can be installed with the following commands:
Code:
ssc install ftools
ssc install reghdfe
ssc install ppmlhdfe
I am looking only at data where China is the import or export partner:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int year str3 iso3_o byte rta float(BRI_mem_o ln_exports_to_china ln_imports_from_china ln_total_china_trade) str6 country_pair float(ln_gdp_2_pop ln_preimp ln_preexpo ln_pretotal)
1990 "ABW" 0 0 . . . "ABWABW"         . . . .
1991 "ABW" 0 0 . . . "ABWABW"         . . . .
1992 "ABW" 0 0 . . . "ABWABW"         . . . .
1993 "ABW" 0 0 . . . "ABWABW"         . . . .
1994 "ABW" 0 0 . . . "ABWABW"  19.52183 . . .
1995 "ABW" 0 0 . . . "ABWABW" 19.415113 . . .
1996 "ABW" 0 0 . . . "ABWABW"  19.43265 . . .
1997 "ABW" 0 0 . . . "ABWABW" 19.588173 . . .
1998 "ABW" 0 0 . . . "ABWABW" 19.712955 . . .
1999 "ABW" 0 0 . . . "ABWABW"  19.74156 . . .
2000 "ABW" 0 0 . . . "ABWABW"  19.86799 . . .
2001 "ABW" 0 0 . . . "ABWABW"  19.87303 . . .
2002 "ABW" 0 0 . . . "ABWABW" 19.849876 . . .
2003 "ABW" 0 0 . . . "ABWABW" 19.888773 . . .
2004 "ABW" 0 0 . . . "ABWABW"  20.04846 . . .
2005 "ABW" 0 0 . . . "ABWABW"  20.11266 . . .
2006 "ABW" 0 0 . . . "ABWABW" 20.172903 . . .
2007 "ABW" 0 0 . . . "ABWABW"  20.32564 . . .
2008 "ABW" 0 0 . . . "ABWABW"  20.44747 . . .
2009 "ABW" 0 0 . . . "ABWABW" 20.224247 . . .
2010 "ABW" 0 0 . . . "ABWABW"  20.19557 . . .
2011 "ABW" 0 0 . . . "ABWABW" 20.281445 . . .
2012 "ABW" 0 0 . . . "ABWABW"         . . . .
2013 "ABW" 0 0 . . . "ABWABW"         . . . .
2014 "ABW" 0 0 . . . "ABWABW"         . . . .
2015 "ABW" 0 0 . . . "ABWABW"         . . . .
2016 "ABW" 0 0 . . . "ABWABW"         . . . .
2017 "ABW" 0 0 . . . "ABWABW"  20.55063 . . .
2018 "ABW" 0 0 . . . "ABWABW"         . . . .
2019 "ABW" 0 0 . . . "ABWABW"         . . . .
1990 "ABW" 0 0 . . . "ABWAFG"         . . . .
1991 "ABW" 0 0 . . . "ABWAFG"         . . . .
1992 "ABW" 0 0 . . . "ABWAFG"         . . . .
1993 "ABW" 0 0 . . . "ABWAFG"         . . . .
1994 "ABW" 0 0 . . . "ABWAFG"         . . . .
1995 "ABW" 0 0 . . . "ABWAFG"         . . . .
1996 "ABW" 0 0 . . . "ABWAFG"         . . . .
1997 "ABW" 0 0 . . . "ABWAFG"         . . . .
1998 "ABW" 0 0 . . . "ABWAFG"         . . . .
1999 "ABW" 0 0 . . . "ABWAFG"         . . . .
2000 "ABW" 0 0 . . . "ABWAFG"         . . . .
2001 "ABW" 0 0 . . . "ABWAFG"  14.68416 . . .
2002 "ABW" 0 0 . . . "ABWAFG" 15.150466 . . .
2003 "ABW" 0 0 . . . "ABWAFG" 15.234106 . . .
2004 "ABW" 0 0 . . . "ABWAFG" 15.418114 . . .
2005 "ABW" 0 0 . . . "ABWAFG" 15.587377 . . .
2006 "ABW" 0 0 . . . "ABWAFG" 15.704497 . . .
2007 "ABW" 0 0 . . . "ABWAFG" 16.085983 . . .
2008 "ABW" 0 0 . . . "ABWAFG"  16.15592 . . .
2009 "ABW" 0 0 . . . "ABWAFG" 16.222836 . . .
2010 "ABW" 0 0 . . . "ABWAFG" 16.427858 . . .
2011 "ABW" 0 0 . . . "ABWAFG" 16.560684 . . .
2012 "ABW" 0 0 . . . "ABWAFG"         . . . .
2013 "ABW" 0 0 . . . "ABWAFG"         . . . .
2014 "ABW" 0 0 . . . "ABWAFG"         . . . .
2015 "ABW" 0 0 . . . "ABWAFG"         . . . .
2016 "ABW" 0 0 . . . "ABWAFG"         . . . .
2017 "ABW" 0 0 . . . "ABWAFG" 16.528923 . . .
2018 "ABW" 0 0 . . . "ABWAFG"         . . . .
2019 "ABW" 0 0 . . . "ABWAFG"         . . . .
1990 "ABW" 0 0 . . . "ABWAGO"         . . . .
1991 "ABW" 0 0 . . . "ABWAGO"         . . . .
1992 "ABW" 0 0 . . . "ABWAGO"         . . . .
1993 "ABW" 0 0 . . . "ABWAGO"         . . . .
1994 "ABW" 0 0 . . . "ABWAGO"   15.6064 . . .
1995 "ABW" 0 0 . . . "ABWAGO" 15.739015 . . .
1996 "ABW" 0 0 . . . "ABWAGO" 16.120626 . . .
1997 "ABW" 0 0 . . . "ABWAGO" 16.187563 . . .
1998 "ABW" 0 0 . . . "ABWAGO"  16.05207 . . .
1999 "ABW" 0 0 . . . "ABWAGO" 15.991986 . . .
2000 "ABW" 0 0 . . . "ABWAGO" 16.419592 . . .
2001 "ABW" 0 0 . . . "ABWAGO"  16.36816 . . .
2002 "ABW" 0 0 . . . "ABWAGO" 16.657751 . . .
2003 "ABW" 0 0 . . . "ABWAGO"  16.76887 . . .
2004 "ABW" 0 0 . . . "ABWAGO" 17.138466 . . .
2005 "ABW" 0 0 . . . "ABWAGO" 17.498556 . . .
2006 "ABW" 0 0 . . . "ABWAGO" 17.886463 . . .
2007 "ABW" 0 0 . . . "ABWAGO" 18.298084 . . .
2008 "ABW" 0 0 . . . "ABWAGO" 18.656734 . . .
2009 "ABW" 0 0 . . . "ABWAGO" 18.403341 . . .
2010 "ABW" 0 0 . . . "ABWAGO" 18.445055 . . .
2011 "ABW" 0 0 . . . "ABWAGO" 18.689266 . . .
2012 "ABW" 0 0 . . . "ABWAGO"         . . . .
2013 "ABW" 0 0 . . . "ABWAGO"         . . . .
2014 "ABW" 0 0 . . . "ABWAGO"         . . . .
2015 "ABW" 0 0 . . . "ABWAGO"         . . . .
2016 "ABW" 0 0 . . . "ABWAGO"         . . . .
2017 "ABW" 0 0 . . . "ABWAGO" 18.593037 . . .
2018 "ABW" 0 0 . . . "ABWAGO"         . . . .
2019 "ABW" 0 0 . . . "ABWAGO"         . . . .
1990 "ABW" 1 0 . . . "ABWAIA"         . . . .
1991 "ABW" 1 0 . . . "ABWAIA"         . . . .
1992 "ABW" 1 0 . . . "ABWAIA"         . . . .
1993 "ABW" 1 0 . . . "ABWAIA"         . . . .
1994 "ABW" 1 0 . . . "ABWAIA"         . . . .
1995 "ABW" 1 0 . . . "ABWAIA"         . . . .
1996 "ABW" 1 0 . . . "ABWAIA"         . . . .
1997 "ABW" 1 0 . . . "ABWAIA"         . . . .
1998 "ABW" 1 0 . . . "ABWAIA"         . . . .
1999 "ABW" 1 0 . . . "ABWAIA"         . . . .
end
The variable ln_gdp_2_pop controls for the market size the two trading partners form together, it is the product of both partners gdp over the product of both populations (log linearized). ln_preimp is the average value of imports of the previous three years (my data goes further back than 1990, thats why this value is always avaiable if the country existed in 1987, log linearized too). rta is a dummy for regional trade agreements with China and BRI_mem_o is the variable of intrest. country_pair is a unique identifyer for each country pair. Using this as fixed effects controls for all time invariant country pair specific variables like distance, contiguity etc.
I then estimate the follwing ppml regressions (one for imports, exports and total trade each) with hdfe:

Code:
ppmlhdfe imports_from_china ln_gdp_2_pop ln_preimp BRI_mem_o rta if year> 1990, absorb(year country_pair) vce(robust)

ppmlhdfe exports_to_china ln_gdp_2_pop ln_preexpo BRI_mem_o rta if year> 1990, absorb(year country_pair) vce(robust)

ppmlhdfe total_china_trade ln_gdp_2_pop ln_pretotal BRI_mem_o rta if year> 1990, absorb(year country_pair) vce(robust)
I obtain the follwing results for total trade for example:
Code:
(dropped 3 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (2.1798e-07)
Iteration 1:   deviance = 2.1018e+10  eps = .         iters = 4    tol = 1.0e-04  min(eta) =  -4.02  P   
Iteration 2:   deviance = 5.7823e+09  eps = 2.63e+00  iters = 3    tol = 1.0e-04  min(eta) =  -6.11      
Iteration 3:   deviance = 1.6919e+09  eps = 2.42e+00  iters = 3    tol = 1.0e-04  min(eta) =  -8.86      
Iteration 4:   deviance = 8.5597e+08  eps = 9.77e-01  iters = 3    tol = 1.0e-04  min(eta) = -10.70      
Iteration 5:   deviance = 7.2615e+08  eps = 1.79e-01  iters = 3    tol = 1.0e-04  min(eta) = -11.49      
Iteration 6:   deviance = 6.9979e+08  eps = 3.77e-02  iters = 3    tol = 1.0e-04  min(eta) = -12.25      
Iteration 7:   deviance = 6.9484e+08  eps = 7.13e-03  iters = 2    tol = 1.0e-04  min(eta) = -12.74      
Iteration 8:   deviance = 6.9421e+08  eps = 9.11e-04  iters = 2    tol = 1.0e-04  min(eta) = -13.17      
Iteration 9:   deviance = 6.9417e+08  eps = 5.78e-05  iters = 2    tol = 1.0e-04  min(eta) = -13.31      
Iteration 10:  deviance = 6.9417e+08  eps = 9.69e-07  iters = 2    tol = 1.0e-05  min(eta) = -13.32      
Iteration 11:  deviance = 6.9417e+08  eps = 1.74e-09  iters = 2    tol = 1.0e-06  min(eta) = -13.32   S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
Converged in 11 iterations and 29 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =      5,145
Absorbing 2 HDFE groups                           Residual df     =      4,909
                                                  Wald chi2(4)    =    3346.53
Deviance             =  694167322.6               Prob > chi2     =     0.0000
Log pseudolikelihood = -347121413.4               Pseudo R2       =     0.9971
------------------------------------------------------------------------------
             |               Robust
total_chin~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ln_gdp_2_pop |   .2087208   .0355928     5.86   0.000     .1389601    .2784815
 ln_pretotal |   .7336988   .0189645    38.69   0.000      .696529    .7708685
   BRI_mem_o |  -.0624149   .0221315    -2.82   0.005    -.1057918    -.019038
         rta |    -.03272   .0221649    -1.48   0.140    -.0761623    .0107223
       _cons |   1.186367    .494268     2.40   0.016     .2176198    2.155115
------------------------------------------------------------------------------

Absorbed degrees of freedom:
------------------------------------------------------+
  Absorbed FE | Categories  - Redundant  = Num. Coefs |
--------------+---------------------------------------|
         year |        29           0          29     |
 country_pair |       204           1         203     |
------------------------------------------------------+
I suspect that I need to cluster my error terms to reduce the effect of heteroskedasticity (I know ppml is already doing that). So I want to assign each import, export and total trade value to a group depending if they are in the lowest, middle or higest third of observations in the respective year. Then I could cluster for these three groups, because I read that the error term is affected by how large the trade flow is in gravity models.
The reason I suspect heteroskedasticity is that the estimation is not working consistent for different subsamples of countries and that the rta variable is changing signs and has very different levels of significance. But as mentioned before I am open for any other suggestion what else I could change.

Kind regards
Michael