Trouble working with panel data and including too many dummy variables

I am analyzing public and charter high schools in California. I am interested in how charter schools perform compared with non-charter schools, controlling for factors such as geographic area, class size, and demographics. These are my variables:

Independent variables:
CurrentExpensePerADA: current expense per unit of average daily attendance (ADA)
CurrentExpensePerADA_Sqr: CurrentExpensePerADA^2
CharterSchool: =1 if charter, =0 if non-charter public school
Rural: =1 if rural, =0 otherwise
Town: =1 if town, =0 otherwise
Suburb: =1 if suburb, =0 otherwise
Urban: =1 if urban, =0 otherwise
CharterRural: interaction CharterSchool*Rural
CharterTown: interaction CharterSchool*Town
CharterSuburb: interaction CharterSchool*Suburb
lnTotalStudentsAllGrades: log of total student enrollment
FreeandReduced: % of students at the school eligible for free or reduced-price lunch
AsianMajority: =1 if >30% Asian, =0 otherwise
HispanicMajority: =1 if >50% Hispanic, =0 otherwise
BlackMajority: =1 if >30% Black, =0 otherwise
WhiteMajority: =1 if >50% White, =0 otherwise
CharterAsian: interaction CharterSchool*AsianMajority
CharterHispanic: interaction CharterSchool*HispanicMajority
CharterBlack: interaction CharterSchool*BlackMajority
CharterWhite: interaction CharterSchool*WhiteMajority
lnPupilTeacherRatio: log of the pupil-teacher ratio

Dependent variable:
APIBaseScore: California Academic Performance Index base score, 200-1000 (based mainly on test scores)
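A minimal Stata sketch of how the locale dummies and their interactions could be built, assuming the raw data carry a single categorical locale code. The variable `Locale` and its coding (1 = Rural, 2 = Town, 3 = Suburb, 4 = Urban) are hypothetical, and the data are simulated only so the block runs. Because the four locale dummies are exhaustive and mutually exclusive, they sum to 1 for every school, so including all four alongside a constant is perfectly collinear and Stata will omit one of them; the three hand-made interactions in the list already treat Urban as the base.

```stata
* Simulated data for illustration only. Locale is a hypothetical
* categorical source variable: 1=Rural 2=Town 3=Suburb 4=Urban (assumed coding)
clear
set seed 12345
set obs 500
generate byte CharterSchool = runiform() < 0.2
generate byte Locale = runiformint(1, 4)

* One indicator per category; exactly one of them equals 1 for each school
generate byte Rural  = Locale == 1
generate byte Town   = Locale == 2
generate byte Suburb = Locale == 3
generate byte Urban  = Locale == 4

* Hand-made interactions, with Urban as the omitted (base) category
generate byte CharterRural  = CharterSchool * Rural
generate byte CharterTown   = CharterSchool * Town
generate byte CharterSuburb = CharterSchool * Suburb

* The four dummies add up to the constant term: the dummy-variable trap
generate byte LocaleSum = Rural + Town + Suburb + Urban
summarize LocaleSum
```

With factor-variable notation, `i.CharterSchool##ib4.Locale` produces the same main effects and charter-by-locale interactions and sets Urban (level 4) as the base explicitly, so the manual dummies are not needed.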

Previously I had the race demographics entered as percentages, but I made them binary because of non-linearity problems. I log-transformed my non-binary variables TotalStudentsAllGrades and PupilTeacherRatio, and according to the scatterplots I ran, that fixed the linearity issue. Luckily, FreeandReduced was already linear in API, because a log transform would not work for it: the variable has many zero values, and the log of 0 is undefined. Does this data structure make sense, and is it okay that most of my right-hand-side variables are binary? Here is an OLS regression I ran for just the year 2010:
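A short Stata sketch of the point about zeros: `ln()` returns a missing value for 0 rather than an error, so logging FreeandReduced would silently drop every school with no free or reduced-price lunch students from any regression that uses it. The five observations and the name `lnFreeandReduced` are made up for illustration.

```stata
* Tiny made-up dataset for illustration only
clear
input double FreeandReduced double TotalStudentsAllGrades
 0    850
12.5 1200
40   2300
 0    430
75   1900
end

* ln(0) is undefined, so Stata stores missing (.) instead of failing
generate double lnFreeandReduced = ln(FreeandReduced)
generate double lnTotalStudentsAllGrades = ln(TotalStudentsAllGrades)

* How many observations the log transform would silently drop
count if missing(lnFreeandReduced)
```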

reg Y_Dep X_Ind, robust
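If Y_Dep stands for APIBaseScore and X_Ind for the independent variables above, the same kind of regression could be written with Stata factor-variable notation, which builds the squared expenditure term and the charter interactions on the fly so that `margins` can account for them. This is a sketch on simulated data with a reduced set of covariates; `Locale` is the same hypothetical categorical variable as before, and the coefficients mean nothing.

```stata
* Simulated cross-section for illustration only
clear
set seed 2010
set obs 400
generate byte CharterSchool = runiform() < 0.2
generate byte Locale = runiformint(1, 4)   // hypothetical: 1=Rural ... 4=Urban
generate double CurrentExpensePerADA = rnormal(9000, 1500)
generate double lnPupilTeacherRatio = ln(runiform(15, 30))
generate double FreeandReduced = 100 * runiform()
generate double APIBaseScore = 750 - 1.5*FreeandReduced + 20*CharterSchool ///
    + rnormal(0, 40)

* c.x##c.x adds the linear and squared expenditure terms;
* i.CharterSchool##ib4.Locale adds main effects plus charter-by-locale
* interactions, with Urban (level 4) as the base category
regress APIBaseScore c.CurrentExpensePerADA##c.CurrentExpensePerADA ///
    i.CharterSchool##ib4.Locale lnPupilTeacherRatio FreeandReduced, vce(robust)

* Estimated charter effect within each locale
margins Locale, dydx(CharterSchool)
```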


Another question: because my variable of interest, CharterSchool, is time-invariant, I used random effects so that it doesn't drop out of the model (as it would under fixed effects). I am hoping to find something more interesting using panel data from 2006-2009, because with a single cross-section the model has too many limitations (e.g., self-selection bias) to support causal conclusions. I am quite inexperienced with panel data, but here is the model I ran. Is this sufficient given my set of variables?

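Before estimating, I declared the panel structure, with School as the panel identifier and Year as the time variable:

xtset School Year
xtreg Y_Dep X_Ind i.Year, re vce(cluster School)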
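A small simulated panel can show why CharterSchool survives only in the random-effects model: with `fe`, a regressor that never changes within a school is absorbed by the school effects, and Stata reports it as omitted because of collinearity. The 200 schools, the 2006-2009 years, and all coefficients below are invented for illustration; `School` is assumed to be a numeric identifier, since `xtset` does not accept a string panel variable.

```stata
* Simulated balanced panel for illustration only
clear
set seed 2006
set obs 200
generate long School = _n
generate byte CharterSchool = runiform() < 0.2   // constant within each school
generate double u = rnormal(0, 30)               // school-level effect
expand 4
bysort School: generate int Year = 2005 + _n     // 2006-2009
generate double FreeandReduced = 100 * runiform()
generate double APIBaseScore = 700 + 25*CharterSchool - 1.2*FreeandReduced ///
    + 5*(Year - 2006) + u + rnormal(0, 20)

xtset School Year

* Fixed effects: CharterSchool does not vary within School, so it is omitted
xtreg APIBaseScore CharterSchool FreeandReduced i.Year, fe vce(cluster School)

* Random effects keeps CharterSchool; year dummies and school-clustered SEs
xtreg APIBaseScore CharterSchool FreeandReduced i.Year, re vce(cluster School)
```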


I know my model has several limitations, but I was hoping to capture something interesting with the interaction effects, and I added the panel data to make the results more convincing. Is my direction sensible? I feel there may be a fatal flaw in my methods, or that I'm not on the right path, but maybe I've just been staring at this for too long. Thank you so much in advance for your help. I hope this isn't too long and that my questions aren't too broad.
