How to correctly residualize before running a regression

I am interested in how the performance of a student i on a test changes over the course of the test (more precisely, as a function of the position of the question in the test). I have multiple observations for every individual i (one observation per question in the test). My data looks like this:

[Table with columns: id, pos, corr, item]
where the first column identifies the individual taking the test, the second column is the position of the question in the test, the third column is my measure of performance (a dummy that takes the value 1 if the question was answered correctly and 0 otherwise), and the fourth column identifies the question in each position of the test. The fourth column is needed because each test-taker receives one of four possible booklets, and the order of the questions differs across booklets.

An example dataset is:

Code:

clear all
input id pos corr item
1 1 1 1
1 2 1 2
1 3 0 3

2 1 0 2
2 2 1 1
2 3 1 3

3 1 1 3
3 2 0 2
3 3 1 1
end

I am interested in whether as time goes by, individuals get tired and make more mistakes in the test. I want an estimate of the change in performance for each individual over the course of the test. So, one option is to run a regression per individual, like this:

C_q = α + β Pos_q + ε_q

where Pos_q ∈ {1, 2, ...} is the position of question q in the test. Each regression (one per student) would yield a β. Using the dataset above, that would be:

Code:

reg corr pos if id == 1

An equivalent way of estimating β for each individual is to run a pooled regression using all individuals, including individual fixed effects and an individual-specific slope on position, like this:

Code:

reg corr c.pos#id i.id

The coefficient on id#c.pos for id 1 is -0.5, just as in the first regression.
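
The equivalence can be checked on the example data with a sketch like the one below. It loops over individuals, stores each per-person slope in a new variable (`beta_i` is a hypothetical name, not part of the original data), and then compares the slope for individual 1 with the corresponding interaction coefficient from the pooled regression.

```stata
clear all
input id pos corr item
1 1 1 1
1 2 1 2
1 3 0 3
2 1 0 2
2 2 1 1
2 3 1 3
3 1 1 3
3 2 0 2
3 3 1 1
end

* Per-person regressions: store each slope in beta_i (hypothetical name)
gen double beta_i = .
levelsof id, local(ids)
foreach i of local ids {
    quietly regress corr pos if id == `i'
    quietly replace beta_i = _b[pos] if id == `i'
}

* Pooled regression with person-specific intercepts and slopes
quietly regress corr c.pos#id i.id

* Both lines should display the same slope for individual 1
display "per-person slope, id 1: " beta_i[1]
display "pooled slope, id 1:     " _b[1.id#c.pos]
```

Storing the slopes in a variable rather than only displaying them keeps one β per student in the dataset for later use.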

Now, suppose that I want to do the same exercise, but controlling for the question each individual is answering (column 4 of the table). Including question fixed effects is straightforward in the pooled regression. I can simply run:

Code:

areg corr c.pos#id i.id, absorb(item)

In this case, the coefficient for the first individual is now -0.25.

For computational reasons, I can't run this pooled regression. Instead, I need to run one regression per person. So I tried to "partial-out" the question fixed effects, and then to run the regression at the student level. More precisely, what I tried is:

Code:

areg pos, absorb(item)
predict pos_hat1
gen pos_residual = pos - pos_hat1

reg corr pos_residual if id == 1

I thought this was going to work. However, for a given student, the coefficient from the "partial-out" approach (-0.5) differs from the coefficient from the pooled regression with question fixed effects (-0.25). So I must be doing something wrong! Any thoughts?
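
One detail that may be worth checking in the partial-out step, offered as a diagnostic sketch rather than a fix: after `areg`, `predict` defaults to the `xb` option, which leaves out the absorbed item effects, so the fitted value may be just the constant. The sketch below compares that default prediction with the residual requested directly through the `residuals` option (`pos_xb` and `pos_res2` are hypothetical variable names) and reruns the regression for the first individual.

```stata
clear all
input id pos corr item
1 1 1 1
1 2 1 2
1 3 0 3
2 1 0 2
2 2 1 1
2 3 1 3
3 1 1 3
3 2 0 2
3 3 1 1
end

* Regress position on the absorbed question (item) effects
areg pos, absorb(item)

* Default prediction (xb): linear prediction without the item effects
predict pos_xb

* Residual: pos minus the constant and the absorbed item effect
predict pos_res2, residuals

* Compare the two quantities side by side
list id pos item pos_xb pos_res2, sepby(id)

* Per-person regression on the residualized position
regress corr pos_res2 if id == 1
```

Even with the item effects removed from `pos`, the per-person coefficient need not equal the pooled one. By the Frisch-Waugh-Lovell theorem, exact equivalence requires residualizing the regressor of interest against all the other regressors in the pooled model, and there the item effects are estimated jointly with every person's intercept and slope.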

Thanks!

BJ Data Tech Solution
