Dear Statalisters,

I want to measure the influence of trade agreements on the composition of the traded goods according to their complexity.
There's a Product Complexity Index
Code:
. sum PCIMean

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     PCIMean | 74,196,336    .0990187    .9274192      -3.61       2.86
My dataset has around 100 million observations, this is what it currently looks like:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(exporter importer year) str4 sitc_product_code float PCIMean double(PCIValue export_value) float(expr_is_AdvEcon impr_is_AdvEcon expr_is_LA impr_is_LA agreements_count)
4  8 2002 "8994"  .04566666     .       0 0 0 0 0 0
4  8 2004 "8510"    -1.1266     .       0 0 0 0 0 0
4  8 2007 "8946"   .9043334     .       0 0 0 0 0 0
4  8 2009 "2690"     -.8125     .       0 0 0 0 0 0
4  8 2009 "6596"    -.17275     .    1267 0 0 0 0 0
4  8 2009 "9310"          .     .       0 0 0 0 0 0
4  8 2010 "2690"      -.886     .     301 0 0 0 0 0
4  8 2010 "7611"          .     .     179 0 0 0 0 0
4  8 2010 "7638"   .6273333     .     143 0 0 0 0 0
4  8 2010 "8459"    -1.4354     .     182 0 0 0 0 0
4  8 2010 "8471"    -.74844     .     803 0 0 0 0 0
4  8 2010 "8942"   .4011667     .     143 0 0 0 0 0
4  8 2011 "2690"      -.839     .       0 0 0 0 0 0
4  8 2011 "5989"   .7885333     .       0 0 0 0 0 0
4  8 2011 "9310"          .     .       0 0 0 0 0 0
4  8 2012 "2690"      -.894     .     114 0 0 0 0 0
4  8 2012 "8459" -1.3744445     .    2596 0 0 0 0 0
4  8 2013 "2690"      -.774     .    1709 0 0 0 0 0
4  8 2013 "8482"      -.524 -.524    2804 0 0 0 0 0
4  8 2014 "9310"          .     .   10344 0 0 0 0 0
4  8 2015 "9310"          .     .    6594 0 0 0 0 0
4  8 2017 "9310"          .     .       0 0 0 0 0 0
4 12 1993 "7491"          .     .    1184 0 0 0 0 0
4 12 1994 "7492"          .     .    8598 0 0 0 0 0
4 12 1995 "7415"       .241  .241   34853 0 0 0 0 0
4 12 1996 "0752" -1.9233333     .  215315 0 0 0 0 0
4 12 1996 "2922"      -2.05 -2.05   16834 0 0 0 0 0
4 12 1997 "0752" -1.8066666     .  200432 0 0 0 0 0
4 12 1997 "2924"      -2.03 -2.03    2483 0 0 0 0 0
4 12 1998 "0752" -1.8183334     .  428651 0 0 0 0 0
4 12 1998 "8745"   .8836667     .    2406 0 0 0 0 0
4 12 1999 "0752" -1.7016667     .  387053 0 0 0 0 0
4 12 1999 "5530"     .02104     .     733 0 0 0 0 0
4 12 1999 "8942"  .04066667     .     785 0 0 0 0 0
4 12 2000 "0752" -1.7633333     .   20919 0 0 0 0 0
4 12 2000 "2924"      -2.08 -2.08    1948 0 0 0 0 0
4 12 2000 "5530"      -.054     .     524 0 0 0 0 0
4 12 2000 "7493"       1.56  1.56     180 0 0 0 0 0
4 12 2001 "0752"      -1.61     .   46905 0 0 0 0 0
4 12 2001 "2929"    -1.5942     .    2490 0 0 0 0 0
4 12 2002 "0111"    -.39245     . 2157053 0 0 0 0 0
4 12 2002 "0577"       -1.8     .    4495 0 0 0 0 0
4 12 2002 "2924"      -1.88 -1.88    3246 0 0 0 0 0
4 12 2002 "2929"     -1.474     .    1097 0 0 0 0 0
4 12 2002 "6531"    -.24085     .   14422 0 0 0 0 0
4 12 2003 "0577"      -1.75     .    3932 0 0 0 0 0
4 12 2003 "2924"      -1.98 -1.98    1181 0 0 0 0 0
4 12 2003 "7432"          .     .    2100 0 0 0 0 0
4 12 2003 "7721"      .9265     .     375 0 0 0 0 0
4 12 2004 "0752"      -1.55     .    6719 0 0 0 0 0
4 12 2004 "2924"      -1.99 -1.99    2194 0 0 0 0 0
4 12 2004 "5989"    .876625     .     118 0 0 0 0 0
4 12 2004 "7283"       .375  .375     240 0 0 0 0 0
4 12 2005 "0577"     -1.965     .    2729 0 0 0 0 0
4 12 2005 "0752" -1.6366667     .   14501 0 0 0 0 0
4 12 2005 "2929"    -1.4346     .     473 0 0 0 0 0
4 12 2005 "5989"   .9223125     .     111 0 0 0 0 0
4 12 2005 "7139"       1.24  1.24   25761 0 0 0 0 0
4 12 2006 "0752" -1.6083333     .    1203 0 0 0 0 0
4 12 2006 "2223"          .     .       0 0 0 0 0 0
4 12 2006 "2922"      -2.17 -2.17     198 0 0 0 0 0
4 12 2006 "2924"      -1.85 -1.85    1308 0 0 0 0 0
4 12 2006 "8748"       1.54  1.54    7618 0 0 0 0 0
4 12 2007 "0548"      -.805     .    1357 0 0 0 0 0
4 12 2007 "0575"      -.776 -.776   38490 0 0 0 0 0
4 12 2007 "0752" -1.5766667     .   26229 0 0 0 0 0
4 12 2007 "2924"      -1.91 -1.91    2765 0 0 0 0 0
4 12 2007 "5541"      -1.06 -1.06       0 0 0 0 0 0
4 12 2007 "5989"     .96475     .     498 0 0 0 0 0
4 12 2007 "7849"       1.46  1.46     846 0 0 0 0 0
4 12 2007 "8219"       .341  .341    2252 0 0 0 0 0
4 12 2008 "0548"     -.8725     .    1226 0 0 0 0 0
4 12 2008 "0577"      -1.85     .    2277 0 0 0 0 0
4 12 2008 "0752" -1.5716667     .   34192 0 0 0 0 0
4 12 2008 "2924"      -1.84 -1.84    9742 0 0 0 0 0
4 12 2008 "2927"      -1.53     .     108 0 0 0 0 0
4 12 2008 "7611"          .     .     395 0 0 0 0 0
4 12 2009 "0575"      -.944 -.944    1043 0 0 0 0 0
4 12 2009 "0579" -1.0316666     .     511 0 0 0 0 0
4 12 2009 "2924"      -1.86 -1.86    5874 0 0 0 0 0
4 12 2009 "7239"       .894  .894 1133360 0 0 0 0 0
4 12 2009 "8462"          .     .     519 0 0 0 0 0
4 12 2009 "8484"     -.7776     .    1430 0 0 0 0 0
4 12 2009 "8510"     -.7622     .    1358 0 0 0 0 0
4 12 2010 "2924"      -1.85 -1.85   17774 0 0 0 0 0
4 12 2010 "6991"       .766     .   24386 0 0 0 0 0
4 12 2010 "7641"          .     .   49340 0 0 0 0 0
4 12 2010 "8219"        .28   .28    6505 0 0 0 0 0
4 12 2012 "0752"      -1.57     .    1126 0 0 0 0 0
4 12 2014 "0575"      -1.07 -1.07   28449 0 0 0 0 0
4 12 2014 "0752"      -1.61     .     303 0 0 0 0 0
4 12 2014 "5514"       .304  .304    1245 0 0 0 0 0
4 12 2014 "6652"       .332  .332    8383 0 0 0 0 0
4 12 2014 "6974"       .143  .143    7034 0 0 0 0 0
4 12 2014 "6991"       .704     .     310 0 0 0 0 0
4 12 2015 "2924"      -2.03 -2.03    1121 0 0 0 0 0
4 12 2016 "0577"      -2.04     .   24997 0 0 0 0 0
4 12 2016 "2922"      -2.56 -2.56    1432 0 0 0 0 0
4 12 2016 "7436"          .     .     155 0 0 0 0 0
4 12 2016 "7492"       1.54  1.54     454 0 0 0 0 0
end
label var exporter "Country Code"
label var importer "Country Code"
label var PCIMean "Mean of several HS-Code-PCI-Values belonging to this SITC-Code"
label var PCIValue "PCI Value"
label var expr_is_AdvEcon "exporter is an Advanced Economy according to IMF WEO 2019"
label var impr_is_AdvEcon "importer is an Advanced Economy according to IMF WEO 2019"
label var expr_is_LA "exporter is country of Latinamerica or Carribean"
label var impr_is_LA "importer is country of Latinamerica or Carribean"
My hypothesis is, that trade agreements between Latin American Countries and Advanced Economies will lead
A) to a growth in the complex products (high PCI-Value) share of total exports from the Advanced Economy to the Latin American Country
B) to a growth in the simple products (low PCI-Value) export of total exports from the Latin American Country to the Advanced Economy.

I have one solution, which I do not like though:
I could implement a PCI-Value threshold (≈ neutral product) and than calculate the share of complex product in the total exports. Or the share of the 25 % most complex products / least complex products in the total exports.
But I feel like loosing lots of information with this approach. Is there a better way?

Thank you so much for taking your time!

Johannes