Independent variables:

| Variable | Description |
|---|---|
| CurrentExpensePerADA | Expenditure per average day of attendance (ADA) |
| CurrentExpensePerADA_Sqr | CurrentExpensePerADA^2 |
| CharterSchool | =1 if charter, =0 if public |
| Rural | =1 if rural, =0 otherwise |
| Town | =1 if town, =0 otherwise |
| Suburb | =1 if suburb, =0 otherwise |
| Urban | =1 if urban, =0 otherwise |
| CharterRural | Interaction CharterSchool*Rural |
| CharterTown | Interaction CharterSchool*Town |
| CharterSuburb | Interaction CharterSchool*Suburb |
| lnTotalStudentsAllGrades | Log of total student enrollment (all grades) |
| FreeandReduced | % of students at the school receiving free or reduced-price lunch |
| AsianMajority | =1 if >30% Asian, =0 if <30% Asian |
| HispanicMajority | =1 if >50% Hispanic, =0 if <50% Hispanic |
| BlackMajority | =1 if >30% Black, =0 if <30% Black |
| WhiteMajority | =1 if >50% White, =0 if <50% White |
| CharterAsian | Interaction CharterSchool*AsianMajority |
| CharterHispanic | Interaction CharterSchool*HispanicMajority |
| CharterBlack | Interaction CharterSchool*BlackMajority |
| CharterWhite | Interaction CharterSchool*WhiteMajority |
| lnPupilTeacherRatio | Log of the pupil-teacher ratio |
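Dependent variable:

| Variable | Description |
|---|---|
| APIBaseScore | California Academic Performance Index (API) base score, range 200-1000 (based mainly on test scores) |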
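A minimal sketch of how the locale dummies, charter interactions and log transforms in the table could be built, using six made-up schools. The string variable `LocaleType` and the raw variables `TotalStudentsAllGrades` and `PupilTeacherRatio` are hypothetical names for the underlying data. One point it makes concrete: every school falls in exactly one locale, so Rural, Town, Suburb and Urban always sum to 1. With a constant in the model, entering all four is perfectly collinear and Stata will drop one of them; the interaction list above already treats Urban as the base category.

```stata
* Toy data for six hypothetical schools; every value is made up for illustration
clear
input str6 LocaleType byte CharterSchool double TotalStudentsAllGrades double PupilTeacherRatio
"rural"  0  320 18.5
"town"   1  410 21.0
"suburb" 0  950 23.4
"urban"  1  600 19.8
"urban"  0 1200 25.1
"rural"  1  150 16.2
end

* One 0/1 dummy per locale category
generate byte Rural  = (LocaleType == "rural")
generate byte Town   = (LocaleType == "town")
generate byte Suburb = (LocaleType == "suburb")
generate byte Urban  = (LocaleType == "urban")

* Charter-by-locale interactions; Urban is the omitted base category
generate byte CharterRural  = CharterSchool * Rural
generate byte CharterTown   = CharterSchool * Town
generate byte CharterSuburb = CharterSchool * Suburb

* Log transforms of the two size variables
generate double lnTotalStudentsAllGrades = ln(TotalStudentsAllGrades)
generate double lnPupilTeacherRatio      = ln(PupilTeacherRatio)

* The four locale dummies always sum to 1, so with a constant
* one of them has to be left out of the regression
assert Rural + Town + Suburb + Urban == 1
```

As an alternative, Stata's factor-variable notation can build the dummies and interactions itself: after `encode LocaleType, generate(Locale)`, writing `i.CharterSchool##ib(last).Locale` in the regression enters the main effects and all interactions and makes the last level (urban, since `encode` codes the labels alphabetically) the base.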
Previously, I had race demographics entered as percentages, but I made them binary because of non-linearity problems. I log-transformed the non-binary variables TotalStudents and PupilTeacherRatio, and according to the scatterplots I ran, that fixed the linearity issue. Luckily, FreeandReduced was already linear in API, which matters because a log transform wouldn't work with so many zero values in this variable. Does this data set structure make sense, and is it okay that most of my right-hand-side variables are binary? Here is an OLS regression I ran for the year 2010 only:
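A short sketch of the two recoding steps described above, on four made-up schools. `PctHispanic` is a hypothetical name for the original percentage variable. It illustrates two Stata behaviours worth checking in the real data: a majority dummy built from a percentage needs an explicit `if !missing()` guard, and `ln()` of zero does not raise an error but quietly returns missing.

```stata
* Toy data; PctHispanic and FreeandReduced values are made up for illustration
clear
input double PctHispanic double FreeandReduced
62  0
45 35
.  80
51 12
end

* Majority dummy from a percentage. The -if- qualifier matters: Stata treats
* a missing value as larger than any number, so without it a school with
* missing PctHispanic would be coded 1.
generate byte HispanicMajority = (PctHispanic > 50) if !missing(PctHispanic)

* ln(0) is missing in Stata rather than an error, so a logged FreeandReduced
* would silently drop every zero-valued school from any regression using it
generate double lnFreeandReduced = ln(FreeandReduced)
count if missing(lnFreeandReduced)
```

The same `if !missing()` guard applies to the 30% thresholds used for AsianMajority and BlackMajority.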
```stata
* Y_Dep = APIBaseScore; X_Ind = the independent variables listed above
reg Y_Dep X_Ind, robust
```
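A sketch of the cross-sectional OLS written with Stata factor-variable notation, run on simulated data so it is self-contained. The 200 schools and the coefficients used to generate `APIBaseScore` are arbitrary assumptions, and only a subset of the regressors is included. The idea it shows: instead of a hand-made `CurrentExpensePerADA_Sqr`, `c.CurrentExpensePerADA##c.CurrentExpensePerADA` enters both the level and the square, so `margins` can report the spending effect with the quadratic taken into account. `vce(robust)` is the same as the `, robust` option in the original command.

```stata
* Simulated cross-section of 200 hypothetical schools; the coefficients
* below are arbitrary and only serve to make the example run
clear
set seed 2010
set obs 200
generate byte CharterSchool = runiform() < 0.2
generate double CurrentExpensePerADA = 6000 + 4000*runiform()
generate double APIBaseScore = 600 + 0.05*CurrentExpensePerADA ///
    - 0.000003*CurrentExpensePerADA^2 + 20*CharterSchool + rnormal(0, 40)

* c.X##c.X enters X and X^2 together, so no separate _Sqr variable is needed
* (with the real panel file, add -if Year == 2010- to restrict to one year)
regress APIBaseScore c.CurrentExpensePerADA##c.CurrentExpensePerADA ///
    i.CharterSchool, vce(robust)

* Average marginal effect of spending, accounting for the squared term
margins, dydx(CurrentExpensePerADA)
```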
Another question: because my variable of interest, CharterSchool, is time-invariant, I ran a random-effects model so that it doesn't drop out (as it would under fixed effects). I am hoping to find something more interesting using panel data from 2006-2009, because the cross-section has too many limitations (e.g., self-selection bias) to support causal conclusions. I am quite inexperienced with panel data, but here is the model I ran. Is it sufficient given my set of variables?
Note: I declared the panel structure with xtset School Year.
```stata
* Random-effects panel regression with year dummies and school-clustered SEs
xtreg Y_Dep X_Ind i.Year, re vce(cluster School)
```
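A small sketch of why CharterSchool drops out under fixed effects, using two made-up schools observed 2006-2009. The fixed-effects (within) estimator subtracts each school's own mean from every variable; a variable that never changes within a school equals its mean in every year, so after demeaning it is identically zero and has no variation left to estimate from. Random effects keeps it because it also uses between-school differences, but that relies on the school-level error being uncorrelated with the regressors, which is exactly what self-selection into charter schools would call into question.

```stata
* Toy panel: two hypothetical schools observed 2006-2009; values are made up
clear
input int School int Year byte CharterSchool double APIBaseScore
1 2006 1 700
1 2007 1 712
1 2008 1 725
1 2009 1 731
2 2006 0 650
2 2007 0 655
2 2008 0 661
2 2009 0 668
end
xtset School Year

* Within transformation: subtract each school's mean from every year
bysort School: egen double meanCharter = mean(CharterSchool)
generate double withinCharter = CharterSchool - meanCharter

* The outcome, by contrast, still varies within schools
bysort School: egen double meanAPI = mean(APIBaseScore)
generate double withinAPI = APIBaseScore - meanAPI
summarize withinCharter withinAPI
```

On the real data, `xtsum CharterSchool APIBaseScore` reports the same thing directly: the "within" standard deviation of a time-invariant variable is zero.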
I know that my model has several limitations, but I was hoping to capture some interesting patterns with my interaction effects. I also used panel data to try to make my results more convincing. Is my approach sensible? I worry there may be a fatal flaw in my methods, or that I'm not on the right path, but maybe I've just been staring at this for too long. Thank you so much in advance for your help. I hope this isn't too long and that my questions aren't too broad.