I am working with panel data for funds and am looking for a way to calculate standard errors (SEs) of a single variable (return) on a given day t. These SEs need to be computed within groups defined by cluster_variable (which refers to different investment styles in this case). That is, I want the SEs to be calculated only across the observations that share the same cluster_variable on day t, and not across the whole sample on that day. As you can see, the cluster_variable is static over time for each fund.
Here is a short example.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(fund t) double return byte cluster_variable
1 1  .1 1
2 1  .2 1
3 1 .08 2
4 1  .9 2
5 1  .7 2
1 2  .4 1
2 2  .5 1
3 2 .03 2
4 2  .2 2
5 2  .4 2
end
I have considered computing the SDs and then counting the observations (obs) within each cluster_variable and day to obtain the SEs, following SE = SD/sqrt(obs). So I started with: egen SD = sd(return), by(cluster_variable t) which generates the following.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(fund t) double return byte cluster_variable float SD
1 1  .1 1 .07071068
2 1  .2 1 .07071068
3 1 .08 2  .4275512
4 1  .9 2  .4275512
5 1  .7 2  .4275512
1 2  .4 1 .07071068
2 2  .5 1 .07071068
3 2 .03 2  .1852026
4 2  .2 2  .1852026
5 2  .4 2  .1852026
end
Can anyone suggest a more elegant way to derive the desired SEs, or show me how to count the number of observations that share the same cluster_variable on a given day t?
The counting result (obs) should look like this in a new variable:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(fund t) double return byte cluster_variable float(SD obs)
1 1  .1 1 .07071068 2
2 1  .2 1 .07071068 2
3 1 .08 2  .4275512 3
4 1  .9 2  .4275512 3
5 1  .7 2  .4275512 3
1 2  .4 1 .07071068 2
2 2  .5 1 .07071068 2
3 2 .03 2  .1852026 3
4 2  .2 2  .1852026 3
5 2  .4 2  .1852026 3
end
The data above are a simplified example. The real dataset has more than 1,000 funds and around 12 distinct values of cluster_variable.
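For reference, the SD/sqrt(obs) idea described above could be sketched as follows. This is only a minimal sketch on the example data, not a tested solution for the full dataset: it assumes obs should count non-missing values of return within each cluster_variable and day, and the variable name SE is introduced here purely for illustration.

```stata
* Sketch only: rebuild the example data so the block runs on its own
clear
input byte(fund t) double return byte cluster_variable
1 1  .1 1
2 1  .2 1
3 1 .08 2
4 1  .9 2
5 1  .7 2
1 2  .4 1
2 2  .5 1
3 2 .03 2
4 2  .2 2
5 2  .4 2
end

* Standard deviation of return within each style-day cell
egen SD = sd(return), by(cluster_variable t)

* Number of non-missing returns in the same cell
* (assumption: missing returns should not be counted)
egen obs = count(return), by(cluster_variable t)

* Standard error of the mean; SE is a hypothetical variable name
generate SE = SD / sqrt(obs)

sort t cluster_variable fund
list, sepby(t) noobs
```

If one row per style and day is enough, collapse with the semean statistic, e.g. collapse (semean) SE = return, by(cluster_variable t), may give the same quantity directly, at the cost of replacing the fund-level data in memory.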
Best,
Daniel