I have a dataset of scores on a test taken in grade 8 by 10 cohorts (2010-2019) of around 500,000 students each, from around 5,000 schools. I therefore have only one observation per student, but I observe the whole grade 8 cohort of each school every year. So the dataset is a repeated cross-section of the population of grade 8 students in all Italian schools. The dataset looks something like this:
student_id | school_id | test_score | year | proportion_females |
1 | 1 | 100 | 2010 | 0.49 |
2 | 1 | 103 | 2010 | 0.49 |
1001 | 2 | 98 | 2010 | 0.52 |
1002 | 2 | 100 | 2010 | 0.52 |
... | ... | ... | ... | ... |
500,001 | 1 | 102 | 2011 | 0.50 |
500,002 | 1 | 101 | 2011 | 0.50 |
501,001 | 2 | 97 | 2011 | 0.51 |
501,002 | 2 | 99 | 2011 | 0.51 |
... | ... | ... | ... | ... |
1,000,001 | 1 | 98 | 2012 | 0.48 |
1,000,002 | 1 | 100 | 2012 | 0.48 |
1,001,001 | 2 | 101 | 2012 | 0.49 |
1,001,002 | 2 | 97 | 2012 | 0.49 |
I thought of running one of the following regressions:
Code:
reghdfe test_score proportion_females, absorb(school_id year)
Code:
xi: reg test_score proportion_females i.school_id i.year
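To make the comparison concrete, here is a minimal sketch on simulated data. Everything in it is made up for illustration: the 50 schools, the 3 years, the 20 students per school-year, the coefficient of 5, and the noise terms. It assumes reghdfe is installed from SSC. As far as I understand, the three commands should return the same point estimate for proportion_females, while the standard errors may differ slightly because of different degrees-of-freedom conventions. I cluster by school here because proportion_females only varies at the school-year level; that choice is my assumption, not something I am sure is required.

```stata
* Toy data: 50 hypothetical schools x 3 years x 20 students (all values simulated)
clear
set seed 12345
set obs 50
gen school_id = _n
gen school_fe = rnormal()                       // time-invariant school effect
expand 3
bysort school_id: gen year = 2009 + _n          // 2010, 2011, 2012
gen proportion_females = 0.5 + 0.05*rnormal()   // varies by school-year only
expand 20                                       // 20 students per school-year
gen test_score = 100 + 5*proportion_females + school_fe + rnormal()

* Same two-way fixed-effects model, three ways
* (reghdfe is user-written: ssc install reghdfe)
reg     test_score proportion_females i.school_id i.year, vce(cluster school_id)
areg    test_score proportion_females i.year, absorb(school_id) vce(cluster school_id)
reghdfe test_score proportion_females, absorb(school_id year) vce(cluster school_id)
```

With 5,000 real schools, the plain reg version would have to estimate thousands of dummy coefficients explicitly, which is why I am leaning towards the absorbing commands.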
My questions are:
(0) Any general comments on this approach?
(1) Which regression command is suitable for this strategy? I know xtreg and reghdfe are used for panel regressions, and I am wondering whether my dataset can be considered a "school-panel" and therefore those commands would be ok.
(2) How do I include a school-specific time trend? By adding c.year#school_id?
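For (2), this is a sketch of what I have in mind, again on simulated data (the 50 schools, 5 years, the trend slopes, and the coefficient of 5 are all invented). It assumes reghdfe is installed from SSC and relies on its support for heterogeneous slopes inside absorb(). Whether this is the right specification is exactly what I am asking.

```stata
* Toy data: 50 hypothetical schools x 5 years x 20 students (all values simulated)
clear
set seed 2024
set obs 50
gen school_id = _n
gen school_fe    = rnormal()                    // school intercept
gen school_trend = 0.2*rnormal()                // school-specific linear slope
expand 5
bysort school_id: gen year = 2009 + _n          // 2010-2014
gen proportion_females = 0.5 + 0.05*rnormal()
expand 20
gen test_score = 100 + 5*proportion_females + school_fe ///
    + school_trend*(year - 2010) + rnormal()

* School FE, year FE, and one linear trend per school, all absorbed
* (reghdfe is user-written: ssc install reghdfe)
reghdfe test_score proportion_females, ///
    absorb(school_id year i.school_id#c.year) vce(cluster school_id)
```

One of the school trends is collinear with the year fixed effects, so I would expect reghdfe to report a redundant parameter; I assume that is harmless for the coefficient on proportion_females.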
Thanks in advance,
Pietro