Hi

My dataset consists of daily Google search volume indices (SVI) for around 3000 firms from 2004 to 2017, so each firm's series has around 5000 daily observations. I want to check whether the SVI series of these 3000 firms are correlated, so I need to run a Principal Component Analysis (PCA) to check for any commonality in SVI.

My data are currently in long panel form, with firm id as the panel variable and daily date as the time variable: around 3000 panels with approximately 5000 daily observations each.
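In Stata terms, the panel is declared like this (a minimal sketch of my setup):

Code:
* declare the long panel: firm_id is the panel id, date the daily time variable
xtset firm_id date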

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(firm_id date) byte svi
1 16071  3
1 16072 62
1 16073 53
1 16074 54
1 16075 65
1 16076 78
1 16077 88
1 16078 76
1 16079 73
1 16080 48
1 16081 56
1 16082 86
1 16083 79
1 16084 85
1 16085 95
1 16086 73
1 16087 69
1 16088 67
1 16089 78
1 16090 89
end
format %td date
label var firm_id "Firm Identifier" 
label var date "Daily Date" 
label var svi "Google SVI"

I know that I have to reshape the data so that each firm becomes a separate variable. By doing this, I will have 3000 variables (one for each firm's SVI series), plus the daily date variable, and the panel variable (i.e. Firm Identifier) will no longer exist.
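For concreteness, here is a minimal sketch of that reshape step (the names svi1, svi2, ... are just what -reshape- generates automatically):

Code:
* reshape from long to wide: one svi variable per firm, one row per date
reshape wide svi, i(date) j(firm_id)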

My question is whether Stata can handle a PCA of 3000 variables. I am using Stata/SE 12.1 on a machine with 8 GB of RAM. How can such a large covariance matrix be handled?
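For reference, the command I intend to run would look something like this (a sketch; the components() choice is illustrative, and -set matsize- would need to be raised first, since -pca- must build a 3000 x 3000 matrix):

Code:
* raise the matrix size limit above 3000 (Stata/SE allows up to 11000)
set matsize 4000
* PCA across all firm-level SVI series; keep the first 10 components (illustrative)
pca svi*, components(10)

By my arithmetic, a 3000 x 3000 matrix of doubles is only about 70 MB, so memory per se may not be the binding constraint, but I am unsure about other limits in Stata 12.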