Hi all,

I am conducting a study to estimate the effect of Medicaid expansion on the uninsured rate, using a classic difference-in-differences (DID) design with a two-way fixed effects (TWFE) model. My model is as follows:

UNINS_{ist} = α_s + δ_t + β EXPANSION_{ist} + ε_{ist}

In this model:

UNINS_{ist} is a binary variable indicating whether individual i in state s and year t is uninsured (1) or insured (0).
α_s represents state fixed effects, capturing time-invariant differences across states.
δ_t represents year fixed effects, capturing time trends common to all states.
β is the parameter of interest, representing the causal effect of Medicaid expansion on the uninsured rate.
EXPANSION_{ist} is a binary treatment variable that equals 1 for states that adopted Medicaid expansion and 0 for states that did not.
ε_{ist} is the error term, accounting for unobserved factors and random variation.

I have data from the American Community Survey (ACS) for the years 2011 to 2019, which consists of repeated cross-sectional data. Here are the top 15 observations of my dataset:

[Listing of the first 15 observations not preserved]

To estimate this model, I am using the reghdfe command:

Code:
reghdfe UNINS expansion , absorb(ST YEAR) cluster(ST)
Although the regression ran and produced results, I also got the following note:
Code:
note: expansion is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
(MWFE estimator converged in 4 iterations)
note: expansion omitted because of collinearity
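
To see whether the note reflects a real lack of variation, one check I could run is whether expansion ever changes within a state. This is only a sketch: it assumes the same data used in the regression above are in memory, and the names exp_min, exp_max, varies, and tag_st are ones I made up for illustration.

```stata
* Smallest and largest value of expansion observed in each state
bysort ST: egen exp_min = min(expansion)
bysort ST: egen exp_max = max(expansion)

* varies == 1 if expansion takes more than one value within the state
generate byte varies = (exp_min != exp_max)

* Flag one row per state so that each state is counted once
egen byte tag_st = tag(ST)
tabulate varies if tag_st

* Treatment status by year, to see when (if ever) it switches on
tabulate YEAR expansion
```

If varies is 0 for every state, expansion is constant within each state, which makes it a linear combination of the state dummies and would be consistent with reghdfe dropping it.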
I tried using the xtreg command in Stata instead, but ran into a problem. Because my data are repeated cross-sections, with many individuals in each state-year, xtreg requires me to first declare a panel structure using xtset ST YEAR.
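
The mismatch between the data and the panel declaration can be seen with a quick check like the one below. It is a rough sketch that assumes ST is stored as a numeric variable (xtset does not accept string identifiers); nothing here changes the data.

```stata
* How many rows share the same state-year combination?
* Any surplus copies mean (ST, YEAR) does not uniquely identify rows.
duplicates report ST YEAR

* Try to declare the panel; capture keeps the do-file running on error,
* and noisily still shows whatever message xtset produces
capture noisily xtset ST YEAR
display "xtset return code: " _rc
```

A nonzero return code here would simply confirm that the individual-level file cannot be declared as a state-by-year panel as it stands.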

To proceed with the xtreg command, I would need to aggregate the individual observations by taking the mean of the uninsured indicator for each state and year. This would turn my repeated cross-sectional data into a state-year panel. However, I have a few concerns about this approach.

Firstly, my dataset includes several demographic variables such as sex, race, and education level, which are categorical variables. Aggregating the data by taking the mean may not be appropriate for categorical variables, as it could lead to the loss of valuable information. I am unsure how to handle these categorical variables effectively while converting the data to a panel structure.

Secondly, my dataset also includes survey weights. Considering that the survey weights are specific to each individual, taking the mean uninsured rate for each state and year may not accurately account for the survey design and could potentially introduce biases into the analysis.
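
For concreteness, the aggregation I have in mind would look roughly like the sketch below. Several things in it are my assumptions rather than anything shown above: that the person weight is called PERWT, that SEX is coded 2 for female, and that ST is numeric. The variable female is a made-up name for illustration.

```stata
* Keep the individual-level data so it can be restored afterwards
preserve

* Turn a categorical variable into a 0/1 indicator, so that its
* state-year mean is a share rather than an average of category codes
generate byte female = (SEX == 2) if !missing(SEX)

* Weighted state-year means; PERWT is an assumed weight variable name
collapse (mean) UNINS female expansion [pweight = PERWT], by(ST YEAR)

* One row per state-year, so the panel can now be declared
xtset ST YEAR

* State fixed effects via fe, year fixed effects via i.YEAR
xtreg UNINS expansion female i.YEAR, fe vce(cluster ST)

* Bring back the individual-level data
restore
```

The same indicator trick would apply to race and education, with one 0/1 variable per category (leaving one category out as the reference).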

Given these concerns, I am uncertain whether averaging individuals to obtain one observation per state per year is a suitable approach for my analysis. I also don't know whether this approach would resolve the collinearity between my treatment variable and the fixed effects.


I am seeking guidance on how to address this issue and estimate the classic DID TWFE model.


Thank you for your assistance!

I'm using Stata 17.