Hi,

I have a question regarding my dataset. It looks like this (a simplified version, I am not allowed to use the real data):
Fund  Return  Investment Year  Dummy
1     1.5     2002             1
2     1.3     2002             0
3     1.4     2002             0
4     1.2     2003             0
5     1.8     2003             0
6     1.9     2003             0
7     1.2     2003             1
8     0.9     2004             0
9     0.7     2004             1
10    0.4     2004             0
Note that in the real data there are almost 40 times as many observations with dummy 0 as with dummy 1.

I am interested in the effect of the Dummy on the return. However, the Investment Year affects the return as well (older years have a higher return). I used a t-test to check whether the average investment year differs between the two groups, and found that the dummy 0 funds have, on average, older investment years, while the dummy 1 funds are relatively younger. So the higher average return for dummy 0 could simply be the result of having more funds in older years.

Therefore I want to put a weight on the dummy 0 funds in each investment year (for example 2002), such that the weighted percentage of dummy 0 funds in 2002, out of all dummy 0 observations, equals the percentage of dummy 1 funds in 2002, out of all dummy 1 observations.

An example, in case it is unclear what I mean:
For the year 1997: dummy 0 has 3.4% of its total funds invested in this year; dummy 1 has 2.8% of its total funds invested in this year.
For the year 1998: 4.1% for dummy 0 vs 0.9% for dummy 1.
So, somehow I want to get the following:
For the year 1997: both dummy groups have 2.8% of their total funds invested in 1997.
For the year 1998: both dummy groups have 0.9% of their total funds invested in 1998.
I do not want to delete observations, since that could change the descriptive statistics of the return for the selected year. Therefore I need to apply weights to the dummy 0 observations. I hope someone can help me!
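
To make the idea concrete, here is a rough sketch in Python of the weighting I have in mind, run on the toy table above. It is only my own attempt, not a method I know to be correct: each dummy 0 fund gets the weight (share of dummy 1 funds in its year) / (share of dummy 0 funds in its year), so for the 1997 example that would be 2.8 / 3.4, roughly 0.82. The function names (year_shares, reweight_to_match, weighted_mean_return) are made up for this illustration.

```python
from collections import Counter

# Toy data from the table above: (fund, return, investment_year, dummy)
funds = [
    (1, 1.5, 2002, 1), (2, 1.3, 2002, 0), (3, 1.4, 2002, 0),
    (4, 1.2, 2003, 0), (5, 1.8, 2003, 0), (6, 1.9, 2003, 0),
    (7, 1.2, 2003, 1), (8, 0.9, 2004, 0), (9, 0.7, 2004, 1),
    (10, 0.4, 2004, 0),
]

def year_shares(rows, dummy):
    """Fraction of one group's funds that falls in each investment year."""
    years = [year for _, _, year, d in rows if d == dummy]
    counts = Counter(years)
    return {year: n / len(years) for year, n in counts.items()}

def reweight_to_match(rows, reference=1, target=0):
    """Weight per fund: target-group funds are rescaled so that their
    weighted year distribution equals that of the reference group.
    Reference-group funds keep weight 1 (assumption of this sketch)."""
    shares_ref = year_shares(rows, reference)
    shares_tgt = year_shares(rows, target)
    weights = {}
    for fund, _, year, d in rows:
        if d == target:
            # A year without reference funds gets weight 0 here.
            weights[fund] = shares_ref.get(year, 0.0) / shares_tgt[year]
        else:
            weights[fund] = 1.0
    return weights

def weighted_mean_return(rows, weights, dummy):
    """Weighted average return of one group."""
    num = sum(weights[f] * r for f, r, _, d in rows if d == dummy)
    den = sum(weights[f] for f, _, _, d in rows if d == dummy)
    return num / den

weights = reweight_to_match(funds)
print("weights for dummy 0:",
      {f: round(w, 3) for f, w in weights.items() if f in (2, 4, 8)})
print("weighted mean return, dummy 0:",
      round(weighted_mean_return(funds, weights, 0), 4))
print("mean return, dummy 1:",
      round(weighted_mean_return(funds, weights, 1), 4))
```

No observation is deleted this way, but two things worry me: a year in which dummy 1 has no funds gives the dummy 0 funds of that year a weight of 0 (which is deletion in disguise), and a year in which only dummy 1 has funds cannot be matched at all. I also do not know whether an ordinary t-test on the weighted data is still valid.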