I'm running a cross-sectional time-series on individual responses in the November Supplement to the Current Population Survey (US Census/Dept of Labor). The November Supplement, in even-numbered years, asks a few questions about voter registration and voting.
My dependent variable is registered ("Are you registered to vote?" 0 = no; 1 = yes). The individual-level (or household-level) variables are categorical and fairly standard in the extensive literature: gender, age group, last level of schooling, income quartile, race/ethnicity, and time at current address. Other than gender, each has four to six categories. The data set covers all 50 states plus DC (treated as a state) and several years, and the sample size is >470,000 adult citizens. I'm using -xtreg- even with a binary dependent variable because registration rates are about 60-80%, depending on sub-population, so I'm taking the Angrist and Pischke attitude that the linear probability model is no worse than logistic regression. Also, running -xtlogit- and then -margins- takes hours (i.e., overnight) at this sample size, even on a strong, multicore computer.
However, I wish to interact most of these level-one variables with one or more of three state-level policy variables (also binary). Year and state are included in -xtreg- in the usual way:
Code:
* declare the panel dimension first; the old i(stateid) option is superseded by xtset
xtset stateid
xtreg registered i.year x1-x6 policy1 policy2 policy3, fe vce(cluster stateid)
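For the cross-level interactions themselves, Stata's factor-variable notation builds the products (and retains the main effects) automatically; a minimal sketch with one interaction, using x1 as a stand-in for one of my categorical predictors and assuming the data have already been -xtset- on stateid:

Code:
* ## expands to main effects plus all interaction cells
xtreg registered i.year i.x1##i.policy1 x2-x6 policy2 policy3, fe vce(cluster stateid)
* effect of policy1 within each category of x1
margins x1, dydx(policy1)

After a linear -xtreg-, the -margins- call is fast relative to the -xtlogit- case, since no nonlinear prediction transformation is needed.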
What are my options? Here are those I thought of:
1. Reduce the number of categories in the variables? I would rather not drop or condense any, as the goal is to get at the impact of policy on demographic groups. I could convert education and income into a scale using -alpha-, but that is hard to explain to a lay audience. The effect of the policies on age seems to hit those in the middle range, so I'd prefer not to use a continuous variable.
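If I did go the scale route, the command would look something like this (variable names illustrative, not my actual ones):

Code:
* Cronbach's alpha scale from the component items; generate() saves the score
alpha educ inc, item generate(ses_scale)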
2. Run separate models with only a few batches of interactions at a time? It seems the policies do interact with nearly all the variables (but not all categories).
3. Determine whether the standard errors are trustworthy even without the F test "working"? If I run the model as -reg, robust- with state indicators included, the p-values for most interactions shift by between .01 and .1, and F(95, 472633) = 780.86. Is that reason to run the model as -xtreg, cluster- and rely on the reported clustered standard errors?
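The comparison I describe can be run side by side; a sketch with the same variable names as above, using -estimates store- just to line up the two sets of standard errors:

Code:
* pooled OLS with state indicators and heteroskedasticity-robust SEs
reg registered i.year i.stateid x1-x6 policy1 policy2 policy3, robust
estimates store ols_robust
* same specification with SEs clustered on state
reg registered i.year i.stateid x1-x6 policy1 policy2 policy3, vce(cluster stateid)
estimates store ols_cluster
estimates table ols_robust ols_cluster, se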
Are there other solutions or things to be aware of?
Thank you.