Hi all, I am researching the causal effect of plain packaging on the proportion of smokers, using a difference-in-differences (DiD) estimation with UK and Spanish data (the UK is the treatment group and Spain is the control group). Because data are limited, I only have observations for 2009, 2011, 2012, 2014 and 2017 for both countries.

To estimate the DiD model, I ran the following regression:

Code:
reg p_csmoker time treatment pp Sex age_gr socialc hqual dnnow incomelv cigtax i.year
However, in this regression, treatment, pp, 2014.year (the 2014 dummy) and 2017.year are omitted because of collinearity.
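Since I cannot share the real data, here is a minimal, self-contained sketch with simulated data (everything is made up: two countries, the five survey years, 100 respondents per country-year, and a hypothetical outcome `smoker`). It reproduces part of the problem: `time` is by construction identical to the 2017 year dummy, so Stata must omit one of them. The last line lists the country-year cells that actually enter the estimation sample. My guess, not confirmed, is that running the same `tabulate treatment year if e(sample)` after my real regression could show whether both countries survive the listwise deletion of missing controls. If only one country remained, `treatment` would be constant and `pp` would either duplicate `time` or be always zero, which would explain why both are omitted.

```stata
* Hypothetical simulated data (NOT the real survey): 2 countries x 5 years
clear
set seed 2017
set obs 1000
generate byte treatment = (_n > 500)            // 0 = Spain, 1 = UK
generate int year = 2009
replace year = 2011 if mod(_n, 5) == 1
replace year = 2012 if mod(_n, 5) == 2
replace year = 2014 if mod(_n, 5) == 3
replace year = 2017 if mod(_n, 5) == 4
generate byte time = (year == 2017)             // post-period dummy
generate byte pp = time * treatment             // DiD interaction
generate byte smoker = runiform() < 0.25 - 0.03 * pp

* time equals the 2017 year dummy for every observation
assert time == (year == 2017)

* Stata omits either time or 2017.year because of this collinearity
regress smoker time treatment pp i.year

* Which country-year cells are in the estimation sample?
tabulate treatment year if e(sample)
```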

I have read previous posts about collinearity and understand that the usual remedy is to drop variables, but I cannot drop treatment and pp because they are my main independent variables (pp is the DiD interaction term).
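For reference, here is a sketch of the model I am trying to estimate, writing $i$ for the respondent, $c$ for the country and $t$ for the survey year, $D_{yt}$ for the year dummies created by `i.year` (2009 as the base year), and $X_{ict}$ for the remaining controls:

$$
\text{p\_csmoker}_{ict} = \beta_0 + \beta_1\,\text{time}_t + \beta_2\,\text{treatment}_c + \beta_3\,\text{pp}_{ct} + \sum_{y \in \{2011,\,2012,\,2014,\,2017\}} \delta_y\,D_{yt} + X_{ict}'\gamma + \varepsilon_{ict},
\qquad \text{pp}_{ct} = \text{time}_t \times \text{treatment}_c .
$$

Because $\text{time}_t = D_{2017,t}$ for every observation, $\beta_1$ and $\delta_{2017}$ cannot be estimated separately, so one of them has to go. $\beta_3$, the DiD estimate, should in principle still be identified, provided both countries appear in the estimation sample in 2017 and in at least one earlier year, and no other regressor (such as cigtax) is itself a linear combination of these terms.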

Here are the definitions of the variables:

p_csmoker (dependent variable): a yearly measure of the proportion of smokers.

time (dummy variable): 1 if 2017 (after plain packaging was introduced) and 0 otherwise.

treatment (dummy variable): 1 if UK resident and 0 if Spanish resident.

pp (interaction variable): time*treatment; its coefficient is the DiD estimate I am looking for.

Sex, dnnow and i.year (control dummies): sex, whether the respondent drinks nowadays, and a dummy for each year.

age_gr, socialc, hqual and incomelv (categorical control variables): age group, social class, highest educational qualification and income level.

cigtax (continuous control variable): cigarette tax (yearly measure).
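Since age_gr, socialc, hqual and incomelv are categorical, they probably need Stata's `i.` factor-variable prefix; without it, `regress` treats their numeric codes as a single continuous regressor. A small illustration on Stata's bundled auto dataset (not my data), where `rep78` stands in for a categorical control:

```stata
* Illustration on Stata's example dataset, not the smoking data
sysuse auto, clear

* Without i., rep78's codes (1-5) enter as one continuous slope
regress price mpg rep78

* With i., each repair category gets its own dummy (lowest level is the base)
regress price mpg i.rep78
```

Applied to my model, this would mean writing `i.age_gr i.socialc i.hqual i.incomelv` in the command above; this is a suggested change, not the specification I originally ran.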

I am sorry for being a bit wordy, but as I cannot post my dataset, this is the best I can do.

If any additional information is needed to solve this, please let me know!

Thank you in advance!