Hello everyone, this is my first post here, and I hope I am doing everything correctly so that you can help me with my issue.
I am writing my MSc dissertation using a cross-section dataset (the ISTAT labour force survey from 2014 to 2020), and I am interested in assessing the impact of a government regulation on the income of Italian pharmacists.
I am using a difference-in-differences approach because I know when each region starts to be treated, so I created a variable that equals 1 if the region is treated in a specific year and 0 otherwise. This is my variable of interest, because it tells me whether a pharmacist in a specific region and year is treated or not.
Moreover, the regulation was not implemented in the same year in every region: in 2016 (the first year of implementation) 5 regions started opening new pharmacies, in 2017 some of the other regions followed, and so on. By 2020 every region is treated.
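To make the construction of the treatment dummy concrete, here is a minimal sketch in Python with pandas. The region labels, adoption years, and column names are made up for illustration and are not taken from my actual data.

```python
import pandas as pd

# Hypothetical adoption years by region (labels and years are illustrative only).
first_treated_year = {"A": 2016, "B": 2017, "C": 2020}

# Toy pharmacist-level data: one row per respondent and survey year.
df = pd.DataFrame({
    "region": ["A", "A", "B", "B", "C", "C"],
    "year":   [2015, 2016, 2016, 2017, 2019, 2020],
})

# T = 1 if the respondent's region has already adopted the regulation in that year,
# 0 otherwise.
df["T"] = (df["year"] >= df["region"].map(first_treated_year)).astype(int)

print(df)
```

Mapping each region to its first treated year keeps the staggered timing in one place, so the dummy follows directly from comparing the survey year with that adoption year.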
I am using a model in which the dependent variable is the logarithm of the income of pharmacist i and T is my treatment variable (explained above); I then add year fixed effects, region fixed effects, and some covariates (X).
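In equation form, the model would look roughly like this. Only i, T and X come from the description above; the subscripts r (region) and t (year) and the symbols for the intercept, fixed effects, coefficients, and error term are notation I am assuming for this post.

```latex
\ln(y_{irt}) = \alpha + \beta\, T_{rt} + \gamma_r + \delta_t + X_{irt}'\theta + \varepsilon_{irt}
```

Here \gamma_r are the region fixed effects, \delta_t are the year fixed effects, and \beta is the coefficient of interest.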
When I first estimated this model, I noticed that the coefficient on the treatment variable was not significant. I therefore tried some alternative regressions (adding one fixed effect at a time) and found that if I omit the first two years and the last one (2014, 2015 and 2020), the coefficient of interest becomes significant.
I have read many times that I have to include k-1 dummies for a categorical variable in my regression, so I cannot find an explanation for these results. What do you suggest I do?
Could it be due to the fact that in the first two years the treatment dummy is 0 for every pharmacist and in 2020 it is 1 for every pharmacist?
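One way to check which years have no variation in the treatment dummy is sketched below in Python with pandas. The data are a toy region-year panel with made-up adoption years, chosen only so that nobody is treated before 2016 and everybody is treated in 2020, as in my case. It is a diagnostic for the pattern I describe, not an answer to the question.

```python
import pandas as pd

# Hypothetical adoption years (illustrative only).
first_treated_year = {"A": 2016, "B": 2018, "C": 2020}

# Build a toy region-year panel for 2014-2020 with the treatment dummy T.
rows = [
    (region, year, int(year >= start))
    for region, start in first_treated_year.items()
    for year in range(2014, 2021)
]
df = pd.DataFrame(rows, columns=["region", "year", "T"])

# Share of treated observations in each year.
share_treated = df.groupby("year")["T"].mean()

# Years in which T takes a single value across all regions.
distinct_values = df.groupby("year")["T"].nunique()
no_variation_years = distinct_values[distinct_values == 1].index.tolist()

print(share_treated)
print(no_variation_years)
```

In this toy panel the years flagged are exactly 2014, 2015 and 2020, which are the years I dropped in my alternative regressions.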
Thanks in advance for your help.