Hi there!

I am currently working on my dissertation, which studies how a school-based initiative affects students' mental health. However, I only have students' mental health data for three school districts over six years (4 pre-treatment and 2 post-treatment years). The policy is adopted at the district level, and two of the school districts adopt the initiative in the same year. I have about 1,500 observations per district-year (about 30,000 observations in total), and the data are repeated cross-sections.

I would like to use a difference-in-differences strategy, but I am not sure how to obtain correct standard errors. Given only three clusters, the usual cluster-robust standard errors would be unreliable. One option is to cluster the standard errors at the grade-district level, which gives 3 × 4 = 12 clusters (I do not have information on which schools the students attended). However, that is still a very small number of clusters. Does anyone know how to handle the standard errors in my situation? Would a wild cluster bootstrap at the grade-district level be appropriate? Thanks.
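To make the wild bootstrap idea concrete, here is a minimal sketch of what I have in mind, on simulated data rather than my real sample. Everything in it is an assumption for illustration: the function names `cluster_t` and `wild_cluster_bootstrap_p` are hypothetical, the third district is treated as a never-adopting control, the effect size and cell size are made up, and I use the restricted (null-imposed) variant with Rademacher weights drawn once per grade-district cluster.

```python
import numpy as np

def cluster_t(y, X, cluster_ids, col):
    """OLS t-statistic for coefficient `col` using a cluster-robust (CR1) SE."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    clusters = np.unique(cluster_ids)
    meat = np.zeros((k, k))
    for g in clusters:
        m = cluster_ids == g
        score = X[m].T @ resid[m]          # cluster-level score vector
        meat += np.outer(score, score)
    G = len(clusters)
    adj = G / (G - 1) * (n - 1) / (n - k)  # small-sample correction
    V = adj * XtX_inv @ meat @ XtX_inv
    return beta[col] / np.sqrt(V[col, col])

def wild_cluster_bootstrap_p(y, X, cluster_ids, col, reps=999, seed=0):
    """Two-sided bootstrap p-value for H0: beta[col] = 0 (null imposed)."""
    rng = np.random.default_rng(seed)
    t_obs = cluster_t(y, X, cluster_ids, col)
    # Restricted fit: drop the tested regressor so H0 holds in the bootstrap DGP.
    X_r = np.delete(X, col, axis=1)
    b_r = np.linalg.lstsq(X_r, y, rcond=None)[0]
    fitted_r = X_r @ b_r
    resid_r = y - fitted_r
    clusters, idx = np.unique(cluster_ids, return_inverse=True)
    t_boot = np.empty(reps)
    for b in range(reps):
        w = rng.choice([-1.0, 1.0], size=len(clusters))  # one sign per cluster
        y_star = fitted_r + w[idx] * resid_r
        t_boot[b] = cluster_t(y_star, X, cluster_ids, col)
    return t_obs, np.mean(np.abs(t_boot) >= np.abs(t_obs))

# --- Simulated stand-in for my data (all numbers here are made up) ---
rng = np.random.default_rng(42)
grid = np.meshgrid(np.arange(3), np.arange(6), np.arange(4), indexing="ij")
district, year, grade = [np.repeat(a.ravel(), 20) for a in grid]  # 20 obs per cell
cluster = district * 4 + grade                    # 3 x 4 = 12 grade-district clusters
treat_post = ((district < 2) & (year >= 4)).astype(float)  # districts 0, 1 adopt in year 4
y = 0.3 * treat_post + rng.normal(size=12)[cluster] + rng.normal(size=len(cluster))

# Two-way fixed effects design: intercept, district dummies, year dummies, DiD term.
X = np.column_stack(
    [np.ones(len(y))]
    + [district == d for d in (1, 2)]
    + [year == t for t in range(1, 6)]
    + [treat_post]
)
t_obs, p_val = wild_cluster_bootstrap_p(y, X, cluster, col=X.shape[1] - 1)
print(f"t = {t_obs:.3f}, wild cluster bootstrap p = {p_val:.3f}")
```

With 12 clusters there are only 2^12 = 4,096 distinct Rademacher sign patterns, so I am not sure whether drawing 999 of them at random is the right approach here or whether it would be better to enumerate them all.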