Hello,
I am writing an empirical paper on the effect of alcohol consumption on labor market outcomes. The data come from the National Health Interview Survey (NHIS) and are stored in nhis_alcohol.dta.
I know that the data are pooled cross-sectional data. However, I am not sure how to include the survey years in my regression. The dataset covers 2012 to 2016, so I cannot simply create a single dummy variable to account for the different years. Do I need to create a separate dummy variable for each year and add all of them to my regression?
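A common way to handle several years in pooled cross-sectional data is one dummy per year, leaving one year out as the base category. With five years (2012 to 2016) that means four dummies; including all five alongside an intercept would be perfectly collinear (the "dummy variable trap"). Stata's factor-variable notation (e.g. `i.year`, assuming the year variable is named `year`) builds these indicators for you. The minimal Python sketch below only illustrates what those columns look like; the function `year_dummies` and its `base` parameter are hypothetical names, not part of any library.

```python
def year_dummies(years, base=None):
    """Build 0/1 indicator columns, one per distinct year, omitting the base year.

    years: list of survey years, one entry per observation.
    base:  the year to leave out (defaults to the earliest year).
    Returns a dict mapping "year_<y>" to a list of 0/1 values.
    """
    levels = sorted(set(years))
    if base is None:
        base = levels[0]  # earliest year becomes the reference category
    kept = [y for y in levels if y != base]
    # Each indicator is 1 for observations from that year and 0 otherwise;
    # observations from the base year are 0 in every column.
    return {f"year_{y}": [1 if v == y else 0 for v in years] for y in kept}


# Toy example: six observations spread over 2012-2016.
dummies = year_dummies([2012, 2013, 2014, 2015, 2016, 2013])
for name, column in dummies.items():
    print(name, column)
```

Each coefficient on a year dummy is then read as the difference relative to the omitted base year, holding the other regressors fixed.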

In addition, when I create a scatter plot of earnings against binge (annual earnings versus the number of days the person binge drank in the past year), the graph is not readable. Could it be that the markers themselves are too large?
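With many observations and a count variable like binge days, points stack on top of each other, so shrinking the markers (in Stata, the `msize()` option of `scatter`) helps only partly. One alternative, swapped in here rather than taken from the question, is a binned scatter plot: split binge days into bins and plot average earnings per bin. The minimal Python sketch below computes those bin averages; `binned_means` and `n_bins` are hypothetical names, and the input values are toy numbers, not NHIS data.

```python
def binned_means(x, y, n_bins=10):
    """Average y within equal-width bins of x.

    Returns a list of (bin_center, mean_y, count) tuples for non-empty bins,
    which can then be plotted instead of every raw observation.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all x are equal
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        # The maximum x would land one past the last bin, so clamp it.
        k = min(int((xi - lo) / width), n_bins - 1)
        sums[k] += yi
        counts[k] += 1
    return [
        (lo + (k + 0.5) * width, sums[k] / counts[k], counts[k])
        for k in range(n_bins)
        if counts[k]
    ]


# Toy example: binge days (x) and earnings (y) for four hypothetical people.
print(binned_means([0, 0, 10, 10], [1, 3, 5, 7], n_bins=2))
```

Plotting the bin centers against the bin means gives one point per bin, which makes the overall relationship easier to see than a cloud of overlapping markers.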

Thank you for the help!