In some ways, this is more of a model-building question than a Stata question, but it is related to Stata procedures, so I hope it is acceptable. (If not, let me know and I can take it to StackExchange).

My research question: What is the effect of (United States) state election policies on voter registration, both overall and for specific demographic groups?

I am using the Census Bureau’s Current Population Survey (CPS) November Supplement, which asks about voter registration in even-numbered years. The survey is commonly used in election studies due to its size (~80,000 adult citizens each November, distributed across all states and DC) and a large number of demographic variables. Also, the CPS apparently has less of a problem with inflated responses about registration and voting (social desirability bias) compared to other surveys.

I am using the individual-level observations (some researchers collapse the data to the state-year) in a logistic or OLS regression model (registration rates are about 70 to 80%). The dependent variable is “registered to vote” (binary, 1=yes) regressed on individual-level predictors, state-level policies and context, and state and year indicators. I’m clustering standard errors on the state, not the state-year (I think that is right).

The total number of observations is >400,000, covering 45 states and seven election cycles (“years”) from 2008 to 2020. (Excluding 2020 on COVID grounds makes no material difference, and is perhaps unnecessary given the record registration rates in 2020.)

I usually run the model like this (Stata/MP 15.1):

Code:
xtreg registered i.year ($xvar)##($zvar), cluster(statefip) i(statefip) re

margins i.race#i.policy1
margins i.policy1@i.race, contrast (eff)

Here, “statefip” is a state id, $xvar is a macro containing a set of categorical individual-level predictors (gender, race/ethnicity, etc.), and $zvar contains a few state-level predictors, including the policy of interest.
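
For readers unfamiliar with the macro setup, a minimal sketch of how such globals might be defined follows. Apart from race and policy1 (which appear in the -margins- calls above), every variable name here is hypothetical and stands in for the actual, longer lists.

Code:
* Hypothetical variable names, for illustration only
* (race and policy1 are the only names taken from the commands above)
global xvar i.female i.race i.agegrp i.educ
global zvar i.policy1 i.policy2

* With these definitions, ($xvar)##($zvar) expands to all main effects
* plus every individual-level-by-state-level two-way interaction,
* e.g. i.race#i.policy1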

Unexpected result: The interaction of the policy with the indicator for black citizens is negative, highly statistically significant, and large (two to three times the size of the effects for most other individual-level predictors). This is contrary to the results for most other race/ethnicity groups and is surprising on theoretical grounds.

Additional background:
  1. Registration rates for black citizens declined after 2012 (Obama’s last election) but rebounded in 2020 to the 2012 rate. (Rates for white citizens increased by more and are now higher than for black citizens.)
  2. Of possible importance: the policy of interest was first implemented in 2016 (except in one state that adopted it earlier). By 2018, 12 states had implemented the policy; by 2020, 18 had.

Questions:
  1. To what degree should I expect the year indicators (and state effects?) to control for year-specific trends in registration by race?
  2. Should I interact race with year?
  3. I assume I don't need to interact race X year X policy. Yes? Doing so produces many empty cells (see background item 2 above) which cause -margins- to not produce results ("not testable"/"not estimable").
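
To make question 2 concrete, the specification it refers to might look something like the sketch below. It assumes race is one of the variables in $xvar and otherwise reuses the original command unchanged; it is an illustration of the variant being asked about, not a recommended model.

Code:
* Sketch only: the original model with race-by-year terms added
* (i.year##i.race = year and race main effects plus their interaction;
*  the race main effect also enters through $xvar)
xtreg registered i.year##i.race ($xvar)##($zvar), cluster(statefip) i(statefip) re

* Same margins call as before; no three-way race X year X policy term is included
margins i.race#i.policy1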
Thank you!