Dear Statalisters,

This post is to announce that I've released complexity, a package to compute complexity indexes, on SSC.

complexity reveals how sophisticated a specialization pattern is across individuals. It stems from the Economic Complexity Index (ECI) and Product Complexity Index (PCI), initially developed by Hidalgo and Hausmann (2009) and later by the Observatory of Economic Complexity (OEC). The objective of this package is not only to reproduce the ECI based on trade flows, but also to generalize the intuition to any specialization pattern.

complexity requires as input a matrix of Revealed Comparative Advantage (RCA), which indicates the relative specialisation of individuals (e.g. countries) among a set of nodes (e.g. products traded, in the case of the product space). By convention, the individuals are in the rows of the RCA matrix and the nodes in the columns (the -, transpose- option exists in case the matrix is initially transposed).
This matrix can be defined as a Mata matrix, a Stata matrix or a Stata dataset (in which case no variables other than the RCA values are tolerated). The RCA matrix can contain either the raw RCA values, in [0;+infinity[, or their binary version (0/1, with 1 if RCA>=1); raw values are rescaled to the binary version.
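
To make the two forms of the input concrete, here is a minimal Mata sketch on a made-up flow matrix T (3 individuals in rows, 4 nodes in columns; the numbers are purely illustrative). It builds the raw RCA matrix with the same formula as in the numerical example of this post, then the binary version. The name M for the binary matrix is my own choice for this illustration.
Code:
mata:
// Hypothetical flows: 3 individuals (rows) x 4 nodes (columns)
T = (10, 0, 5, 5 \ 2, 8, 0, 10 \ 1, 1, 1, 1)
// RCA = (share of the node in the individual's total) / (share of the node in the overall total)
RCA = (T :/ rowsum(T)) :/ (colsum(T) :/ sum(T))
// Binary version: 1 if RCA >= 1, 0 otherwise
M = (RCA :>= 1)
// Display both matrices
RCA
M
end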

complexity doesn't follow the "method of reflections" first suggested in Hidalgo & Hausmann (2009), but the later algebraic solution suggested by Tacchella et al. (2012), now used by the Observatory of Economic Complexity (see the OEC webpage for details).
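
For readers who want the intuition, here is a rough Mata sketch of the usual eigenvector formulation of the ECI, applied to a made-up binary matrix M. It is only an illustration under my own assumptions and not necessarily what the package does internally: the index is taken as the standardized eigenvector associated with the second-largest eigenvalue of a row-stochastic matrix built from M, with the sign chosen so that it correlates positively with diversity. All names (M, kc, kp, Mtilde, v, ECI) are illustrative.
Code:
mata:
// Hypothetical binary matrix: 4 individuals (rows) x 4 nodes (columns)
M = (1,1,1,1 \ 1,1,1,0 \ 1,1,0,0 \ 1,0,0,0)
kc = rowsum(M)                       // diversity of each individual (n x 1)
kp = colsum(M)                       // ubiquity of each node (1 x p)
// Mtilde[c,c'] = sum over nodes of M[c,p]*M[c',p] / (kc[c]*kp[p])
Mtilde = (M :/ kc) * (M :/ kp)'
// Right eigenvectors (columns of X) and eigenvalues (L)
eigensystem(Mtilde, X=., L=.)
// Rank eigenvalues from largest to smallest and keep the second eigenvector
p = order(-Re(L)', 1)
v = Re(X[., p[2]])
// Sign convention (assumption): positive correlation with diversity
C = correlation((v, kc))
if (C[2,1] < 0) v = -v
// Standardize to mean 0 and standard deviation 1
ECI = (v :- mean(v)) :/ sqrt(variance(v))
ECI
end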

complexity returns one of two Stata variables: Complexity_i or Complexity_n, which measure individual and node complexity respectively (e.g. ECI / PCI). The output is chosen through the -, projection()- option, which takes either indiv (default) or nodes.
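
As a quick illustration, and assuming an RCA matrix named RCA already exists in Mata (as in the numerical example of this post), the calls might look like the sketch below. These are alternatives rather than a sequence to run in one go, and the exact option spelling should be checked against the help file.
Code:
*Individual complexity (default projection), stored in Complexity_i
complexity, mat(RCA)
*Alternatively: node complexity, stored in Complexity_n
complexity, mat(RCA) projection(nodes)
*Alternatively: when the matrix has nodes in rows and individuals in columns
complexity, mat(RCA) transpose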

complexity requires the moremata package to run (available on SSC); please install it before use.

The help file is available for understanding the syntax of the command.

Enjoy computing your own complexity indexes! Some numerical examples follow below.

Best Regards,
Charlie Joyez


Numerical example: Compute Economic Complexity Index

From UNCTADstat I downloaded the 2018 exports of all countries in the 3-digit SITC Rev. 3 classification.
Following the OEC thresholds, I excluded small countries (exporting less than 1 billion USD) and minor products (whose traded volume is below 10 billion USD).
You'll find this dataset attached (in .xls, as I could not attach the .dta file).

Save this xls file to ~yourpath\RCA_complexity2018_UNCTAD.xls
Code:
clear
import excel "~yourpath\RCA_complexity2018_UNCTAD.xls", sheet("Sheet1") firstrow
*From Trade flows to RCA matrix
mkmat v*,mat(Trade)
mata T=st_matrix("Trade")
mata RCA=(T:/rowsum(T)):/(colsum(T):/sum(T))

*Install moremata (required) and complexity, then compute
ssc install moremata
ssc install complexity
complexity, mat(RCA)

*Sort by decreasing complexity, then browse results
gsort -Complexity_i
browse pays Complexity_i
This ranking is close to the OEC's (8 common countries in the top 10), as is the distribution of the index. The differences come from the use of a different year (2018 vs 2017 in the latest OEC ranking), different trade classifications, and a slightly different set of countries.


PS: sorry, the country names are in French, but they are not hard to guess.
PPS: Here are the alternative syntaxes to load the RCA matrix when it is not stored in Mata.

1) Using a Stata matrix instead of Mata
Code:
clear
import excel "~yourpath\RCA_complexity2018_UNCTAD.xls", sheet("Sheet1") firstrow
*From Trade flows to RCA matrix
mkmat v*,mat(Trade)
mata T=st_matrix("Trade")
mata RCA=(T:/rowsum(T)):/(colsum(T):/sum(T))
mata st_matrix("Stata_RCA",RCA)

complexity, mat(Stata_RCA) source(matrix)
2) Using a .dta file instead
Code:
clear
import excel "~yourpath\RCA_complexity2018_UNCTAD.xls", sheet("Sheet1") firstrow
*From Trade flows to RCA matrix
mkmat v*,mat(Trade)
mata T=st_matrix("Trade")
mata RCA=(T:/rowsum(T)):/(colsum(T):/sum(T))
mata st_matrix("Stata_RCA",RCA)
preserve
clear
svmat Stata_RCA
save "~yourpath\RCA_matrix.dta",replace
restore

complexity, mat("~yourpath\RCA_matrix.dta") source(dta)