My dataset consists of daily Google search volume indices (SVI) for around 3,000 firms from 2004 to 2017, so each firm is a time series of roughly 5,000 daily observations. I want to check whether the SVI series of these 3,000 firms are correlated with one another, so I need to run a Principal Component Analysis to test for commonality in SVI.
My data are currently in long panel form, with firm_id as the panel variable and the daily date as the time variable. Thus, there are around 3,000 panels with approximately 5,000 daily observations each.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(firm_id date) byte svi
1 16071  3
1 16072 62
1 16073 53
1 16074 54
1 16075 65
1 16076 78
1 16077 88
1 16078 76
1 16079 73
1 16080 48
1 16081 56
1 16082 86
1 16083 79
1 16084 85
1 16085 95
1 16086 73
1 16087 69
1 16088 67
1 16089 78
1 16090 89
end
format %td date
label var firm_id "Firm Identifier"
label var date "Daily Date"
label var svi "Google SVI"
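For the PCA, the data would first have to be in wide form, with one SVI variable per firm. A minimal sketch of that step, assuming firm_id takes integer values 1 through 3000 (-reshape- then generates variables svi1 through svi3000):

* Reshape from long panel to wide: one row per date, one SVI variable per firm
reshape wide svi, i(date) j(firm_id)
* Each firm's series is now a separate variable: svi1, svi2, ..., svi3000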
My question is whether Stata can handle a PCA with 3,000 variables. I am using Stata/SE 12.1 on a machine with 8 GB of RAM. How can such a large covariance matrix be handled?
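In case it is useful, a minimal sketch of what the call might look like after the reshape, assuming the wide variables are named svi1-svi3000. Note that -pca- needs matsize to be at least as large as the number of variables, and in Stata/SE 12 matsize can be raised to at most 11,000:

* Raise the matrix size limit so a 3000 x 3000 correlation matrix fits
set matsize 11000

* PCA on all firm SVI series; retain, say, the first 10 components
pca svi1-svi3000, components(10)

* Inspect the eigenvalues of the leading components
screeplot, yline(1)

The 3,000 x 3,000 correlation matrix itself only takes about 3000 x 3000 x 8 bytes, roughly 72 MB, in double precision, so the matsize limit is likely a bigger constraint than the 8 GB of RAM. Also keep in mind that -pca- drops any observation (date) with a missing value in any of the 3,000 variables by default.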