I have a dataset with four columns: the first column identifies the individual taking the test, the second is the position of the question in the test, the third is my measure of performance (a dummy that takes the value 1 if the question was answered correctly and 0 otherwise), and the fourth identifies the question in each position of the test. The fourth column is needed because each test-taker receives 1 of 4 possible booklets, and the order of the questions differs across booklets.
An example dataset is:
Code:
clear all
input id pos corr item
1 1 1 1
1 2 1 2
1 3 0 3
2 1 0 2
2 2 1 1
2 3 1 3
3 1 1 3
3 2 0 2
3 3 1 1
end
For each student, I want to estimate the following regression:
C_q = α + β Pos_q + ε_q
where Pos_q ∈ {1, 2, ...} is the position of question q in the test. Each regression (one per student) would yield a β. Using the dataset above, that would be:
Code:
reg corr pos if id == 1
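Repeating this for every student can be sketched with a loop. This is a minimal sketch on the example data above; `beta_pos` is a hypothetical variable name introduced only for illustration.

```stata
* Sketch: run the per-student regression for every id and store each slope.
* beta_pos is an illustrative name, not part of the original dataset.
gen double beta_pos = .
levelsof id, local(ids)
foreach i of local ids {
    quietly reg corr pos if id == `i'
    quietly replace beta_pos = _b[pos] if id == `i'
}
* Show the stored slope for each student
tabstat beta_pos, by(id) statistics(mean)
```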
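Equivalently, a single pooled regression with student-specific intercepts and slopes gives every student's β at once:
Code: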
reg corr c.pos#id i.id
Now, suppose that I want to do the same exercise, but controlling for the question each individual is answering (column 4 of the table). Including question fixed effects is straightforward in the pooled regression. I can simply run:
Code:
areg corr c.pos#id i.id, absorb(item)
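To compare against a single student's regression, that student's slope can be read off the pooled fit. A minimal sketch, assuming Stata's usual factor-variable coefficient naming (`1.id#c.pos` for student 1):

```stata
* Sketch: fit the pooled model with item fixed effects,
* then display the position slope for student 1 only.
quietly areg corr c.pos#id i.id, absorb(item)
display "pooled slope, id 1: " _b[1.id#c.pos]
```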
For computational reasons, I can't run this pooled regression. Instead, I need to run one regression per person. So I tried to "partial-out" the question fixed effects, and then to run the regression at the student level. More precisely, what I tried is:
Code:
areg pos, absorb(item)
predict pos_hat1
gen pos_residual = pos - pos_hat1
reg corr pos_residual if id == 1
I thought this was going to work. However, for a given student, the coefficient from the "partial-out" approach (-0.25) differs from the coefficient from the pooled regression (-0.5). So I must be doing something wrong! Any thoughts?
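One thing that may be worth ruling out is the residualizing step itself. The sketch below assumes that `predict` after `areg` defaults to the `xb` statistic, which leaves out the absorbed item effects, so it asks for residuals explicitly and cross-checks them against position minus its item mean. `pos_res` and `pos_itemmean` are illustrative names.

```stata
* Sketch only: pos_res and pos_itemmean are illustrative names.
* Request residuals explicitly so the absorbed item effects are removed.
quietly areg pos, absorb(item)
predict double pos_res, residuals

* Cross-check: with no other regressors, this equals pos minus its item mean.
bysort item: egen double pos_itemmean = mean(pos)
assert abs(pos_res - (pos - pos_itemmean)) < 1e-8

reg corr pos_res if id == 1
```

Even with position demeaned this way, the per-student coefficient need not match the pooled one. The Frisch-Waugh-Lovell result calls for residualizing corr and every regressor in the pooled model on item (each student's dummy and each student's position interaction), and then estimating them jointly.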
Thanks!