Hi stata users,
I face a problem I try to find an "easy" way to merge two datasets, one having a variable that varies over time and another one with time period.
I have an unbalanced panel
I will illustrate :
dataset1
Id activity begin_date end_date
A a1 1975m1 2001m5
A a2 2001m6 2099m12
B a1 1975m1 2013m5
C a2 1999m7 2001m6
C a3 2001m7 2012m12
C a4 2013m1 2015m11
dataset2

Id period expenses
A 2000m12 100
A 2001m1 200
A 2001m2 300
A 2001m3 350
A 2001m4 400
A 2001m5 1000
A 2001m6 1500
A 2001m7 1800
B 2001m1 200
B 2001m2 900
... ... ...
C 2001m5 200
C 2001m6 250
C 2001m7 300
...
C 2012m11 800
C 2012m12 900
C 2013m1 700
C 2013m2 500
C 2013m3 400
C 2013m4 120
and I want in the end to have ... Result wanted
Id period expenses activity
A 2000m12 100 a1
A 2001m1 200 a1
A 2001m2 300 a1
A 2001m3 350 a1
A 2001m4 400 a1
A 2001m5 1000 a1
A 2001m6 1500 a2
A 2001m7 1800 a2
B 2001m1 200 a1
B 2001m2 900 a1
... ... ... ...
C 2001m5 200 a2
C 2001m6 250 a2
C 2001m7 300 a3
... ... ... ...
C 2012m11 800 a3
C 2012m12 900 a3
C 2013m1 700 a4
C 2013m2 500 a4
C 2013m3 400 a4
C 2013m4 120 a4
I know that with sql we can say that period must be between begin_date and end_date .. but I could not find a trick to do this in stata.
The only way I found was to create 3fake lines" on dataset1 with
Code:
tsfill
and
Code:
carryforward
to have one observation each month-year but it duplicates to 60 Mio observations on my datasets and goes to 2Go ...
Thanks in advance for your time and proposal.