Hello. I am working with a dataset of 2,949 observations on individuals surveyed across 96 communities over 5 years. The same 96 communities were followed throughout, but the same individuals were not necessarily interviewed each year: 60% of individuals are observed only once, 30% twice, and none is observed in all 5 years.

I could not find a specific model or recommendation for this sort of “hybrid” situation (i.e. a repeated cross-section of communities with very few individuals repeated over time, or an extremely unbalanced panel of individuals). The two options I am considering are:
  1. Treat the data as a repeated cross-section: pooled OLS with robust standard errors (clustered by individual or by community-year);
  2. Treat the data as a panel: random effects model with robust standard errors (clustered by individual or by community).
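
To make option 1 concrete, here is a minimal sketch of pooled OLS with cluster-robust (sandwich) standard errors, written in plain NumPy on simulated data. Everything in it is an assumption for illustration: the function name `pooled_ols_cluster`, the variable names, the simulated community shock, and the finite-sample correction are not taken from my actual data or from any particular package.

```python
import numpy as np


def pooled_ols_cluster(y, X, groups):
    """Pooled OLS with cluster-robust standard errors (illustrative helper).

    y: (n,) outcome; X: (n, k) regressors including a constant column;
    groups: (n,) cluster labels, e.g. individual id or community id.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # "Meat" of the sandwich: sum over clusters of (X_g' u_g)(X_g' u_g)'
    labels = np.unique(groups)
    meat = np.zeros((k, k))
    for g in labels:
        mask = groups == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)

    # A common finite-sample correction; other choices exist.
    G = len(labels)
    correction = (G / (G - 1)) * ((n - 1) / (n - k))
    cov = correction * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated data (hypothetical): 96 communities, 30 people each,
# every person observed once, plus a shock shared within community.
rng = np.random.default_rng(0)
n_comm, n_per = 96, 30
community = np.repeat(np.arange(n_comm), n_per)
person = np.arange(community.size)
x = rng.normal(size=community.size)
comm_shock = rng.normal(size=n_comm)[community]
y = 1.0 + 0.5 * x + comm_shock + rng.normal(size=community.size)
X = np.column_stack([np.ones_like(x), x])

beta, se_person = pooled_ols_cluster(y, X, person)
_, se_comm = pooled_ols_cluster(y, X, community)
print("coefficients:", beta)
print("SE clustered by individual:", se_person)
print("SE clustered by community: ", se_comm)
```

The point estimates are identical under both clustering choices; only the standard errors change. When every individual appears once, clustering by individual reduces to ordinary heteroskedasticity-robust standard errors, which is close to the situation for the 60% of individuals observed a single time. The sketch does not cover option 2 (random effects).
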
I would appreciate any advice on how to proceed in this situation. I am also considering constructing a pseudo-panel (by age cohort and gender) as a robustness check, although that would leave me with only about 100 observations.
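
For the pseudo-panel idea, a rough sketch of the construction step could look like the following. It assumes cohorts are defined by 10-year birth-year bands, which is only one possible definition; the function name `build_pseudo_panel`, the record keys (`birth_year`, `gender`, `year`, `y`), and the toy records are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_pseudo_panel(rows, cohort_width=10):
    """Collapse individual records into (cohort, gender, year) cell means."""
    cells = defaultdict(list)
    for r in rows:
        # Assign each person to a birth-year band, e.g. 1960-1969 -> 1960.
        cohort = (r["birth_year"] // cohort_width) * cohort_width
        cells[(cohort, r["gender"], r["year"])].append(r["y"])
    # One output row per cell, keeping the cell size for later weighting.
    return [
        {"cohort": c, "gender": g, "year": t, "y_mean": mean(v), "n": len(v)}
        for (c, g, t), v in sorted(cells.items())
    ]


# Toy records (made up) to show the shape of the result.
rows = [
    {"birth_year": 1961, "gender": "F", "year": 2001, "y": 2.0},
    {"birth_year": 1968, "gender": "F", "year": 2001, "y": 4.0},
    {"birth_year": 1975, "gender": "M", "year": 2001, "y": 1.0},
    {"birth_year": 1963, "gender": "F", "year": 2002, "y": 5.0},
]
panel = build_pseudo_panel(rows)
for cell in panel:
    print(cell)
```

Each cell then becomes one observation that can be followed across years. Keeping the cell size `n` matters because cells built from few individuals are noisy, which is part of my concern about ending up with so few observations.
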

Thank you.