I am working with a panel dataset regarding financial products. The dataset has about 200,000 observations and 50 variables. I struggle with finding correlations due to the form of my dataset.
The dataset (a simplification) looks as follows:
Date | Return (monthly) | ProductID | CompanyID | Currency | Characteristic 1 | Characteristic 2 | Charcteristic 3 |
31jan2005 | 1.3843 | 1 | A | USD | 1 | 0 | 0 |
28feb2005 | -.47 | 1 | A | USD | 1 | 0 | 0 |
30mar2005 | 2.213 | 1 | A | USD | 1 | 0 | 0 |
30jun2001 | 0.0321 | 2 | C | EUR | 1 | 1 | 1 |
30jul2001 | 0.8911 | 2 | C | EUR | 1 | 1 | 1 |
31jan2005 | 1.383 | 3 | A | EUR | 1 | 0 | 0 |
28feb2005 | -.047 | 3 | A | EUR | 1 | 0 | 0 |
30mar2005 | 2.213 | 3 | A | EUR | 1 | 0 | 0 |
31dec2013 | . | 4 | C | CAD | 1 | 1 | 1 |
30jan2014 | 3.42 | 4 | C | CAD | 1 | 1 | 1 |
28feb2016 | 5.0134 | 5 | A | USD | 0 | 1 | 0 |
30mar2016 | 0.13 | 5 | A | USD | 0 | 1 | 0 |
31apr2016 | -30.1 | 5 | A | USD | 0 | 1 | 0 |
Now, this duplication is well-known in this database and I aim to use the adjustment from another research: exclude a product when its returns has a correlation of 0.99 or more with the returns of another product. I reckon with about 3000 unique ProductIDs and 2500 CompanyIDs, the most efficient way [read: not blow up my pc] is to calculate correlations of products if one company reports more than one product. In the case of CompanyID A, run a STATA command that calculates the correlation between the returns of:
ProductID 1 and ProductID 3
ProductID 1 and ProductID 5
ProductID 3 and ProductID 5
I am stuck on this issue as I cannot find other posts that solve this issue and my source that cleans this database does not provide a .do file (if STATA was even used). So here comes my question that I have been grinding for the last hours: With what STATA command, or code, do I calculate the correlations between all products by CompanyID?
0 Response to Correlation of one variable between players in panel data
Post a Comment