Dear Statalist,

I am interested in implementing a difference-in-differences (DID) analysis for the following longitudinal dataset (see the first dataset below). I have read that a DID requires equally spaced data (as, for instance, in a panel data structure). However, I am wondering whether it is possible with this longitudinal structure, so as not to lose observations and to estimate a more efficient model.
My aim is to estimate how much workers benefit from having taken a specific learning program, compared with workers who have not taken it. Since I could not find a way to do propensity score matching with anything other than cross-sectional data, I think DID is the way to go.

Several things to note:

1) Individuals may enter the sample in different years. This is why I am not sure whether the variable occasion is well constructed: occasion 1 for id_person 1 is not the same period as occasion 1 for id_person 4. In fact, each occasion is simply one record for a worker. So I am not sure whether “xtset id_person occasion” is correct.

2) Is it possible to use individual fixed effects in the estimation to control for unobserved heterogeneity, as with panel data?

3) Since the treatment takes place in different years, I do not know how to implement the DID methodology. I recently read the paper “Designing Difference in Difference Studies: Best Practices for Public Health Policy Research”, which stresses that with more than two periods the usual DID setup does not apply. Instead, the authors advise using a regression like: Y_it = a_i + b_t + B*Treatment_it + e_it (where “a_i” is a person FE and “b_t” is a time FE).

4) In several examples of multilevel models, I have seen an “occasion” variable built in the same way as here. However, those examples involve students nested within schools. So for them, occasion 1 for student 1 may well be the same period as occasion 1 for student 4 (I am not sure about this), which is not my case.

5) Can a DID estimation be implemented with random effects instead of fixed effects?
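
On point 1, one alternative I have considered is to replace occasion with a calendar date built from the start_* variables in Example 1. This is only a minimal sketch that I have not validated on the full data; it assumes that no worker has two records starting on the same day, and the name start_date is my own.

```stata
* Sketch only: build a calendar date from the start_* variables in Example 1
gen long start_date = mdy(start_month, start_day, start_year)
format start_date %td

* isid stops with an error if a worker has two records with the same start date
isid id_person start_date

* Declare the panel on calendar time instead of occasion
xtset id_person start_date
```

With a daily time variable, xtset accepts the gaps between records, but time-series operators such as L. would then refer to the previous day rather than to the worker's previous record.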

As you can see, my main problem is how to handle the “time” variable here. Since I have mainly worked with panel data, it is hard for me to know how time should be treated, specifically for the DID estimation. Should I leave it as it is now ("occasion" in Example 1 below), or should I build a panel dataset (Example 2 below)? In either case, how should I estimate the DID?
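
To make point 3 concrete, the regression from the paper could be sketched in Stata roughly as follows. Here y is a hypothetical outcome variable (it does not appear in the extracts below), and using start_year for the time effects b_t is my own assumption, not something the paper prescribes.

```stata
* Sketch only: y is a hypothetical outcome variable, not in the extracts below
* Person fixed effects (a_i) come from the fe option on the declared panel
xtset id_person

* Year dummies play the role of b_t; the coefficient on Treat is B
* Standard errors are clustered at the worker level
xtreg y Treat i.start_year, fe vce(cluster id_person)
```

As far as I understand, xtreg with the fe option only needs the panel identifier, so this specification does not depend on how occasion is defined.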

Any hint or advice will be much appreciated.

Example 1 (longitudinal structure):
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id_person float occasion byte(firm_id start_day start_month) int start_year byte(end_day end_month) int end_year byte(x1 x3 x5 Treat)
1  1  1 18 11 2005 18 11 2005 0 0 32 0
1  2  1 16 12 2005 16 12 2005 0 0 32 0
1  3  1 12  1 2006 12  1 2006 0 0 32 0
1  4  1  1  2 2006  1  2 2006 0 0 32 0
1  5  1 21  3 2006 21  3 2006 0 0 32 0
1  6  1 17  1 2007 17  1 2007 0 0 32 0
1  7  1  8  2 2007  8  2 2007 0 0 32 0
1  8  1 14  2 2007 15  2 2007 0 0 32 0
1  9  2 25  6 2008 24  9 2008 0 0 54 0
1 10  3  2  7 2009  1 10 2009 0 1 33 1
1 11  4 30  7 2009  .  .    . 0 0 55 1
1 12  5 15  3 2010  .  .    . 0 0 32 1
1 13  6 11  5 2010 31  8 2010 0 0 33 1
1 14  7  2 11 2010 24 12 2010 0 0 23 1
1 15  7 21  1 2011 23  4 2011 0 0 23 1
1 16  7 26  4 2011  8  5 2011 0 0 33 1
1 17  8  2  5 2011 30  9 2011 0 0 23 1
2  1  9 31 12 2006  1  1 2007 0 1 11 0
2  2 10 20  4 2007 22  4 2007 0 0 32 0
2  3 10  5  5 2007  6  5 2007 0 0 80 0
2  4 10 11  5 2007 27  5 2007 0 1 32 0
2  5 11 30  4 2008 12 10 2008 0 0 33 0
2  6 12 19 12 2008  .  .    . 0 1 32 0
2  7 13  5  5 2009 13  9 2009 0 0 54 0
2  8 14 10  5 2010 16  9 2010 0 0 54 0
2  9 15  9  3 2011  8  9 2011 0 0 23 1
3  1 16 28  7 2008  .  .    . 0 0 55 0
3  2 17  8  3 2010  .  .    . 0 0 55 0
3  3 17  1  4 2011  .  .    . 0 0 55 0
3  4 18  1  1 2014  .  .    . 0 0 54 1
3  5 19 14  1 2019 30  6 2019 0 1 55 1
3  6 20  1  6 2019  .  .    . 1 0 54 1
4  1 21  2  4 2000  5  5 2001 0 0 55 0
4  2 21  4 10 2001  .  .    . 0 0 55 0
4  3 21 20 11 2001  .  .    . 0 0 55 0
4  4 22 12 12 2001 13 12 2001 0 1 33 0
4  5 23 31  5 2002  .  .    . 1 0 48 0
end

Example 2 (panel data):
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id_person firm_id start_day start_month) int start_year byte(end_day end_month) int end_year float(newx1 newx3 newx4 newtreat occasion)
1  1 16 12 2005 16 12 2005 0 0        30 0  2
1  1  1  2 2006  1  2 2006 0 0  38.33333 0  4
1  1  8  2 2007  8  2 2007 0 0        32 0  7
1  2 25  6 2008 24  9 2008 0 0        25 0  9
1  3  2  7 2009  1 10 2009 0 1        28 1 10
1  6 11  5 2010 31  8 2010 0 0 36.666668 1 13
1  7 26  4 2011  8  5 2011 0 0  38.33333 1 16
2  9 31 12 2006  1  1 2007 0 1         5 0  1
2 10 11  5 2007 27  5 2007 0 1 34.666668 0  4
2 12 19 12 2008  .  .    . 0 1        30 0  6
2 13  5  5 2009 13  9 2009 0 0        40 0  7
2 14 10  5 2010 16  9 2010 0 0        20 0  8
2 15  9  3 2011  8  9 2011 0 0        20 1  9
3 16 28  7 2008  .  .    . 0 0        40 0  1
3 17  8  3 2010  .  .    . 0 0        40 0  2
3 17  1  4 2011  .  .    . 0 0        35 0  3
3 18  1  1 2014  .  .    . 0 0        40 1  4
3 19 14  1 2019 30  6 2019 1 1         7 1  5
4 21  2  4 2000  5  5 2001 0 0        25 0  1
4 21  4 10 2001  .  .    . 0 1 28.333334 0  2
4 23 31  5 2002  .  .    . 1 0        40 0  5
end