Dear all,
I have a question about model specification and would be interested in people's thoughts (apologies if this is not the right place to ask; if so, I would appreciate a pointer to a more appropriate forum). I am estimating a difference-in-differences model using panel data on schools. The treatment is the construction of water and sanitation projects in rural districts. I have data from 2012 to 2016. The complication in my setting is that schools can receive treatment in different years, and they can receive it more than once, since several water projects can be built in the same district. My baseline regression is:

Y_ist = alpha_t + alpha_s + alpha_i + beta * D_st + epsilon_ist

where "s" denotes district, "i" denotes individual, and "t" denotes year. The variable D_st equals one if at least one project was finished in district "s" in year "t".
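
To make the definition of the treatment variable concrete, here is a minimal Python sketch of how the indicator could be built from a list of finished projects. The project records, the district names, and the function name `treatment_indicator` are all made up for illustration. It shows two possible readings of the definition: a "pulse" version that equals one only in years when a project was finished, and an "absorbing" version that stays at one from the first finished project onward.

```python
# Hypothetical project records: (district, year the project was finished).
projects = [("A", 2013), ("A", 2015), ("B", 2014)]
districts = ["A", "B", "C"]
years = range(2012, 2017)  # 2012..2016, the sample period

def treatment_indicator(projects, districts, years, absorbing=False):
    """Return {(district, year): 0 or 1}.

    absorbing=False: 1 only in years when a project was finished.
    absorbing=True:  1 from the first finished project onward.
    """
    finished = set(projects)
    d = {}
    for s in districts:
        completion_years = [y for (dist, y) in finished if dist == s]
        for t in years:
            if absorbing:
                d[(s, t)] = int(any(y <= t for y in completion_years))
            else:
                d[(s, t)] = int(t in completion_years)
    return d

D_pulse = treatment_indicator(projects, districts, years)
D_absorbing = treatment_indicator(projects, districts, years, absorbing=True)
```

Which of the two versions is appropriate depends on whether the effect is expected only in the completion year or in every year after it.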

I would like to run a falsification test to show that my results are robust. However, I am not sure whether I can use lead values of the treatment, given that a project remains in place once it is built, so schools can keep benefiting from it in later years. I also have data from 2008 to 2011. Should I estimate a regression using only that earlier period together with the lead of the treatment? Or should I use the whole sample and include the leads of the treatment?
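
For reference, this is the kind of lead variable I have in mind. It is only a minimal Python sketch, with a made-up indicator and a hypothetical helper `lead`: the treatment is shifted forward so that the value at (s, t) is the treatment status at (s, t + k). In a falsification regression the lead would enter alongside (or instead of) the contemporaneous treatment, and a coefficient on the lead close to zero would be consistent with no effect before the project exists.

```python
# Hypothetical treatment indicator {(district, year): 0 or 1} for 2012..2016.
# District "A" is treated from 2014 onward; district "B" is never treated.
D = {("A", t): int(t >= 2014) for t in range(2012, 2017)}
D.update({("B", t): 0 for t in range(2012, 2017)})

def lead(indicator, k=1):
    """Return the k-period lead: the value at (s, t) is indicator at (s, t + k).

    Observations whose lead falls outside the sample get None, so they
    would be dropped from the falsification regression.
    """
    return {(s, t): indicator.get((s, t + k)) for (s, t) in indicator}

D_lead1 = lead(D, k=1)
```

Note that with a one-year lead the last sample year has no lead value, so one year of observations is lost for each lead included.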
Any comments on this would be appreciated!
Thanks,
Diego