I have a question about my dataset. It looks like this (a simplified version; I am not allowed to use the real data):
Fund | Return | Investment Year | Dummy |
1 | 1.5 | 2002 | 1 |
2 | 1.3 | 2002 | 0 |
3 | 1.4 | 2002 | 0 |
4 | 1.2 | 2003 | 0 |
5 | 1.8 | 2003 | 0 |
6 | 1.9 | 2003 | 0 |
7 | 1.2 | 2003 | 1 |
8 | 0.9 | 2004 | 0 |
9 | 0.7 | 2004 | 1 |
10 | 0.4 | 2004 | 0 |
I am interested in the effect of the Dummy on the return. However, the Investment Year also affects the return (funds from older investment years have higher returns). I used a t-test to check whether the investment years differ between the two groups, and found that the dummy 0 funds have, on average, older investment years, while the dummy 1 funds are relatively younger. This means the higher average return of the dummy 0 group could simply be the result of it having more funds in older years. I therefore want to weight the dummy 0 funds in each investment year (for example 2002) so that the percentage of dummy 0 funds invested in that year, out of all dummy 0 funds, equals the percentage of dummy 1 funds invested in that year, out of all dummy 1 funds.
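The imbalance described above can be made visible by tabulating, for each dummy group, the share of its funds that falls in each investment year. A minimal sketch with pandas, assuming the toy table is loaded with the hypothetical column names `ret`, `year` and `dummy`:

```python
import pandas as pd

# Toy data from the question; the column names ret/year/dummy are illustrative
df = pd.DataFrame({
    "ret": [1.5, 1.3, 1.4, 1.2, 1.8, 1.9, 1.2, 0.9, 0.7, 0.4],
    "year": [2002, 2002, 2002, 2003, 2003, 2003, 2003, 2004, 2004, 2004],
    "dummy": [1, 0, 0, 0, 0, 0, 1, 0, 1, 0],
})

# Rows: investment year, columns: dummy group.
# normalize="columns" turns counts into shares, so each column sums to 1.
shares = pd.crosstab(df["year"], df["dummy"], normalize="columns")
print(shares.round(3))

# Mean investment year per group (the quantity the t-test compares)
print(df.groupby("dummy")["year"].mean())
```

On the toy data, dummy 1 has a third of its funds in each year, while dummy 0 has 2/7, 3/7 and 2/7. In the real data, this is the table where gaps such as 3.4% vs 2.8% for 1997 show up.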
An example, in case it is unclear what I mean:
For the year 1997: 3.4% of all dummy 0 funds were invested in this year, versus 2.8% of all dummy 1 funds.
For the year 1998: 4.1% for dummy 0 vs. 0.9% for dummy 1.
So I want to end up with the following:
For the year 1997: both dummy groups have 2.8% of their total funds invested in 1997.
For the year 1998: both dummy groups have 0.9% of their total funds invested in 1998.
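One way to formalize this target (a sketch of a common reweighting scheme, not necessarily the only suitable one): let n_d(y) be the number of dummy d funds in year y, N_d the total number of dummy d funds, and p_d(y) the resulting share. Each dummy 0 fund in year y gets the weight w(y), each dummy 1 fund keeps weight 1:

$$
p_d(y) = \frac{n_d(y)}{N_d}, \qquad
w(y) = \frac{p_1(y)}{p_0(y)}, \qquad
\tilde p_0(y) = \frac{n_0(y)\,w(y)}{\sum_{y'} n_0(y')\,w(y')} = \frac{N_0\,p_0(y)\,w(y)}{N_0 \sum_{y'} p_1(y')} = p_1(y)
$$

The last step uses the fact that the p_1 shares sum to 1, which holds as long as every year that contains dummy 1 funds also contains dummy 0 funds. With the example numbers, w(1997) = 2.8/3.4 ≈ 0.82 and w(1998) = 0.9/4.1 ≈ 0.22, so each dummy 0 fund from 1998 counts as roughly a fifth of an observation, and the weights of all dummy 0 funds still add up to N_0.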
I do not want to delete observations, since that could change the descriptive statistics of the returns in the affected years. Therefore I need to put weights on the dummy 0 observations instead. I hope someone can help me!
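A minimal sketch of that weighting with pandas, assuming the same hypothetical column names `ret`, `year` and `dummy` for the toy table. It gives each dummy 0 fund the weight p_1(y) / p_0(y) for its investment year, checks that the weighted year distributions match, and compares weighted mean returns. No observations are dropped; they only count more or less:

```python
import pandas as pd

# Toy data from the question; the column names ret/year/dummy are illustrative
df = pd.DataFrame({
    "ret": [1.5, 1.3, 1.4, 1.2, 1.8, 1.9, 1.2, 0.9, 0.7, 0.4],
    "year": [2002, 2002, 2002, 2003, 2003, 2003, 2003, 2004, 2004, 2004],
    "dummy": [1, 0, 0, 0, 0, 0, 1, 0, 1, 0],
})

# p_d(y): share of group d's funds that were invested in year y
shares = pd.crosstab(df["year"], df["dummy"], normalize="columns")

# Dummy 0 funds get weight p_1(y) / p_0(y); dummy 1 funds keep weight 1.
# A year without any dummy 1 funds gives its dummy 0 funds weight 0.
ratio = shares[1] / shares[0]
df["weight"] = 1.0
is_dummy0 = df["dummy"] == 0
df.loc[is_dummy0, "weight"] = df.loc[is_dummy0, "year"].map(ratio)

# Check: the weighted year distribution is now the same in both groups
wshares = df.pivot_table(index="year", columns="dummy", values="weight", aggfunc="sum")
wshares = wshares / wshares.sum()
print(wshares)

# Weighted mean return per group (dummy 1 is unchanged, its weights are all 1)
df["wret"] = df["ret"] * df["weight"]
sums = df.groupby("dummy")[["wret", "weight"]].sum()
weighted_mean = sums["wret"] / sums["weight"]
print(weighted_mean)
```

On the toy data, the weighted mean return of the dummy 0 group is about 1.21 (unweighted: about 1.27), while dummy 1 stays at about 1.13. The same `weight` column could then be passed to any routine that accepts observation weights.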