Hi all,

I am working with a panel dataset regarding financial products. The dataset has about 200,000 observations and 50 variables. I struggle with finding correlations due to the form of my dataset.

The dataset (a simplification) looks as follows:
Date Return (monthly) ProductID CompanyID Currency Characteristic 1 Characteristic 2 Charcteristic 3
31jan2005 1.3843 1 A USD 1 0 0
28feb2005 -.47 1 A USD 1 0 0
30mar2005 2.213 1 A USD 1 0 0
30jun2001 0.0321 2 C EUR 1 1 1
30jul2001 0.8911 2 C EUR 1 1 1
31jan2005 1.383 3 A EUR 1 0 0
28feb2005 -.047 3 A EUR 1 0 0
30mar2005 2.213 3 A EUR 1 0 0
31dec2013 . 4 C CAD 1 1 1
30jan2014 3.42 4 C CAD 1 1 1
28feb2016 5.0134 5 A USD 0 1 0
30mar2016 0.13 5 A USD 0 1 0
31apr2016 -30.1 5 A USD 0 1 0
The goal of my research is to determine performance differences between certain characteristics, but this requires cleaning this data first. As you can see, the products linked to productID 1 and 3 have a correlation of almost 1. You can see that the only differences between these two products is the reported currency. This is caused by the company (ProductID 1 & 3 are from the same company) reporting the same product but for two currencies, for whatever reason they have.

Now, this duplication is well-known in this database and I aim to use the adjustment from another research: exclude a product when its returns has a correlation of 0.99 or more with the returns of another product. I reckon with about 3000 unique ProductIDs and 2500 CompanyIDs, the most efficient way [read: not blow up my pc] is to calculate correlations of products if one company reports more than one product. In the case of CompanyID A, run a STATA command that calculates the correlation between the returns of:
ProductID 1 and ProductID 3
ProductID 1 and ProductID 5
ProductID 3 and ProductID 5

I am stuck on this issue as I cannot find other posts that solve this issue and my source that cleans this database does not provide a .do file (if STATA was even used). So here comes my question that I have been grinding for the last hours: With what STATA command, or code, do I calculate the correlations between all products by CompanyID?