This question investigates whether restricting youth access to alcohol affects motor vehicle death rates for young people. We restrict attention to death rates of those aged 18-20 (the age group affected by the minimum legal drinking age, MLDA). The key variable is `legal1820`, the fraction of 18-20 year olds in a state who can buy alcohol legally. It equals 1 if the MLDA is 18 and 0 if it is 21 for an entire year. For states that changed midway through the year, the variable is scaled. Many states had an MLDA between 18 and 21. We exploit the over-time, within-state variation in a difference-in-differences design.

## Difference in Difference

Since the data is a panel of states that changed their drinking age limits at different times, a difference-in-differences strategy is a natural way to estimate the effect of drinking age limits on death rates.

Long data is great for figures, but doesn't always work for tables or regressions. Thus, I convert it to wide form here. This creates a separate variable for each cause of death. The main dependent variable will be `MVA`, deaths from motor vehicle accidents.
```{r}
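# Reshape from long to wide: one column of mortality rates per cause of death
df <- df %>%
  ungroup() %>%
  pivot_wider(
    names_from = dtype,
    id_cols = c(state, year, pop, legal1820, legal, beertaxa,
                beerpercap, winepercap, spiritpercap, totpercap),
    values_from = mrate
  ) %>%
  rename(other_external = `other external`) %>%
  group_by(state) %>%
  # treat = 1 if the state's legal1820 in 1979 differs from its first recorded value
  mutate(treat = ifelse(first(legal1820) != legal1820[year == 1979], 1, 0))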
```

1. I have created a variable called `treat` that equals 1 for states that responded to the 1971 constitutional change and 0 otherwise. Add two more variables: `post`, equal to 1 if the year is `>= 1975` and 0 otherwise, and an interaction between `post` and `treat`.

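```{r}
# `treat` already exists from the chunk above, so only `post` and the interaction are new
df <- df %>%
  ungroup() %>%
  mutate(post = ifelse(year >= 1975, 1, 0),   # 1 from 1975 onward, 0 before
         interaction = post * treat)          # 1 only for treated states from 1975 onward
```
(This is my attempt; I am not sure about this step.)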

2. Run a simple difference-in-differences regression where the dependent variable is `MVA` and the right-hand side has `post`, `treat`, and the interaction term you created above. Interpret your result: do states that lower their drinking age have more motor vehicle deaths?

```{r}

```
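
A minimal sketch of what the regression in step 2 could look like. It runs on a small simulated panel called `toy` (a made-up stand-in, not the real data), with a true interaction effect of 5 built in so you can see where the difference-in-differences estimate shows up. The state names, the effect sizes, and the object name `did` are all assumptions for illustration.

```{r}
# Toy stand-in for `df` (simulated, NOT the real data): 4 states, 1970-1983
set.seed(1)
toy <- expand.grid(state = c("A", "B", "C", "D"), year = 1970:1983)
toy$treat <- ifelse(toy$state %in% c("A", "B"), 1, 0)   # pretend A and B lowered their MLDA
toy$post  <- ifelse(toy$year >= 1975, 1, 0)             # 1 from 1975 onward

# Simulated outcome with a built-in effect of +5 for treated states after 1975
toy$MVA <- 40 + 3 * toy$treat - 2 * toy$post + 5 * toy$treat * toy$post +
  rnorm(nrow(toy), sd = 0.5)

# In a formula, `post * treat` expands to post + treat + post:treat
did <- lm(MVA ~ post * treat, data = toy)
summary(did)

# The difference-in-differences estimate is the interaction coefficient
coef(did)["post:treat"]
```

On the real data the same call would use `data = df`, and you could equally write the interaction term from step 1 out by hand on the right-hand side. A positive interaction coefficient would mean that motor vehicle death rates rose more (or fell less) after 1975 in the states that lowered their drinking age than in the other states.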

3. The simple difference in difference above doesn't use all the information available. Instead of a dummy variable `treat`, we could include state fixed effects. Likewise, instead of a dummy variable `post`, we could include year fixed effects. The variable `legal1820` varies across states and over time, so it uses the data more efficiently than a post-treatment dummy. Using the data frame `df`, add two new variables inside `mutate`: a `factor` version of year, `year = as.factor(year)`, and similarly for state, `state = as.factor(state)`. These can now be added easily to a regression as categorical variables. Run a difference-in-difference regression of `MVA` on `legal1820` and state and year fixed effects. Save this as `mod1`.

3. Repeat the above regression, but weight it by the variable `pop` using the `weights` argument: `lm(y ~ x, data = df, weights = pop)`. Save this as `mod2`.

4. Repeat your above two regressions (with and without weights) using the control variables `beertaxa`, `beerpercap`, `winepercap`, `spiritpercap`, `totpercap`. Save these as `mod3` and `mod4`.
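
A rough sketch of how the four fixed-effects regressions could be set up. Again this uses a simulated panel `toy` (hypothetical data, with a true `legal1820` effect of 6 built in) rather than the real `df`, and it simulates only one control, `beertaxa`, to keep the example short.

```{r}
# Toy stand-in for `df` (simulated, NOT the real data): 6 states, 1970-1983
set.seed(2)
toy <- expand.grid(state = c("A", "B", "C", "D", "E", "F"), year = 1970:1983)
toy$pop <- rep(c(100, 250, 400, 150, 300, 200), times = 14)   # made-up populations
# Fraction of 18-20 year olds who can drink legally: varies by state and year
toy$legal1820 <- with(toy, ifelse(state %in% c("A", "B") & year >= 1973, 1,
                           ifelse(state %in% c("C", "D") & year >= 1976, 0.5, 0)))
toy$beertaxa <- runif(nrow(toy), 0.5, 2)                       # one made-up control
toy$MVA <- 40 + 6 * toy$legal1820 - 1 * toy$beertaxa +
  as.integer(toy$state) + 0.3 * (toy$year - 1970) + rnorm(nrow(toy), sd = 0.5)

# Turn state and year into categorical variables (the fixed effects)
toy$state <- as.factor(toy$state)
toy$year  <- as.factor(toy$year)

mod1 <- lm(MVA ~ legal1820 + state + year, data = toy)                 # unweighted
mod2 <- lm(MVA ~ legal1820 + state + year, data = toy, weights = pop)  # weighted by pop
mod3 <- lm(MVA ~ legal1820 + beertaxa + state + year, data = toy)      # adds the control
mod4 <- lm(MVA ~ legal1820 + beertaxa + state + year, data = toy, weights = pop)

# Compare the legal1820 coefficient across the four specifications
est <- sapply(list(mod1 = mod1, mod2 = mod2, mod3 = mod3, mod4 = mod4),
              function(m) unname(coef(m)["legal1820"]))
est
```

With the real data you would use `df` in place of `toy` and add the remaining controls (`beerpercap`, `winepercap`, `spiritpercap`, `totpercap`) to the formulas for `mod3` and `mod4` with `+`.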

5. Output your regression results using `stargazer`, but only keep the variable `legal1820` using the `stargazer` option `keep`. Interpret the output from your table.
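
A small sketch of the `stargazer` call, assuming the `stargazer` package is installed. The two models are fitted on simulated data (`toy`, hypothetical) only so that the example runs on its own; with the real data you would pass `mod1` through `mod4` instead.

```{r}
library(stargazer)

# Toy stand-in for `df` (simulated, NOT the real data)
set.seed(3)
toy <- expand.grid(state = factor(c("A", "B", "C", "D")), year = 1970:1983)
toy$pop <- rep(c(100, 250, 400, 150), times = 14)              # made-up populations
toy$legal1820 <- ifelse(toy$state %in% c("A", "B") & toy$year >= 1973, 1, 0)
toy$MVA <- 40 + 6 * toy$legal1820 + as.integer(toy$state) + rnorm(nrow(toy))
toy$year <- as.factor(toy$year)

mod1 <- lm(MVA ~ legal1820 + state + year, data = toy)
mod2 <- lm(MVA ~ legal1820 + state + year, data = toy, weights = pop)

# type = "text" prints a plain-text table;
# keep = "legal1820" hides the state and year dummies
tab <- stargazer(mod1, mod2, type = "text", keep = "legal1820")
```

Each column of the table is one model. The row for `legal1820` shows the estimate with its standard error in parentheses underneath, which is what the interpretation in this step should focus on.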

6. Repeat the above steps, but use the dependent variable `internal`. This is death from internal causes, which is thought to be unrelated to alcohol consumption. Thus, it serves as a **falsification** test. We should not find that drinking laws are correlated with death rates from internal causes.
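
A short sketch of the falsification regression, again on simulated data (`toy`, hypothetical). Here `internal` is generated with no dependence on `legal1820`, so the estimate should come out close to zero; the object name `placebo` is an assumption for illustration.

```{r}
# Toy stand-in for `df` (simulated, NOT the real data)
set.seed(4)
toy <- expand.grid(state = factor(c("A", "B", "C", "D")), year = 1970:1983)
toy$legal1820 <- ifelse(toy$state %in% c("A", "B") & toy$year >= 1973, 1, 0)
# `internal` is simulated with NO effect of legal1820
toy$internal <- 20 + as.integer(toy$state) + rnorm(nrow(toy), sd = 0.5)
toy$year <- as.factor(toy$year)

placebo <- lm(internal ~ legal1820 + state + year, data = toy)

# Estimate, standard error, t value and p-value for legal1820 only
summary(placebo)$coefficients["legal1820", ]
```

On the real data, a small and statistically insignificant `legal1820` coefficient for `internal` would support the design, while a large significant one would suggest that something other than alcohol access is driving the `MVA` results.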

I am a beginner at regression and these problems are very difficult for me. I do not know where to find the basic code, so I would appreciate any advice.