I have panel data in which the education variable has missing values. I must use the surrounding non-missing values to fill in the missing ones.
The education variable is categorical with values 1 to 3, with a higher value indicating better education (1 = low education, 2 = moderate education, 3 = high education).
Hence, when looking at education over time for each individual, the value at time t+1 must be equal to or higher than the value at time t.
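Because the observed values are supposed to be non-decreasing within each individual, it can help to check that before imputing anything. The sketch below is a minimal check on a small made-up panel (not the data from this post). It drops the missing values temporarily, so each observed value is compared with the previous *observed* value however long the gap is. The names `decrease` and `n_decrease` are illustrative.

```stata
clear
input id time educ
1 1 1
1 2 .
1 3 2
1 4 3
2 1 2
2 2 .
2 3 1
end

* Work on a copy without missing values, so that educ[_n-1] is always
* the previous observed value within the same id.
preserve
drop if missing(educ)
bysort id (time): generate byte decrease = _n > 1 & educ < educ[_n-1]

* Show and count observed values that are lower than the one before them.
list id time educ if decrease
count if decrease
local n_decrease = r(N)
restore
display "observed decreases: `n_decrease'"
```

In this made-up example, individual 2 goes from 2 to 1, so one violation is reported. Any such rows would need to be resolved before the filling rules can be applied consistently.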
Here is some example data. The variable educ is my current data, and neweduc is the variable I need to obtain.
In this example only 1 or 2 consecutive observations are missing, but in my actual dataset this varies widely, with up to about 20 missing observations in a row. Hence, I don't think using subscripts [_n+-t] is useful. Could anyone help me solve this? Thank you in advance (:
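Subscripts may still work despite gaps of varying length. Stata's `replace` processes observations in sort order, so a value copied from `[_n-1]` into one observation is already in place when the next observation is processed. A single `replace` therefore carries a value across a gap of any length. The sketch below uses a made-up panel with a five-period gap. The names `prev`, `next`, `negtime` and `educ_fill` are illustrative, and filling only where the values on both sides agree is one possible way to respect the ordering constraint, not the only one.

```stata
clear
input id time educ
1 1 2
1 2 .
1 3 .
1 4 .
1 5 .
1 6 .
1 7 2
1 8 3
end

* Carry the last observed value forward within id. Each filled value
* feeds the next observation, so the gap length does not matter.
sort id time
generate prev = educ
by id (time): replace prev = prev[_n-1] if missing(prev)

* Carry the next observed value backward by applying the same idea
* with time reversed.
generate negtime = -time
bysort id (negtime): generate next = educ
by id (negtime): replace next = next[_n-1] if missing(next)
sort id time

* Fill a gap only where the observed values on both sides are equal,
* which cannot break the non-decreasing order.
generate educ_fill = cond(prev == next, prev, educ)
list, sepby(id)
```

On its own, this leaves missing values before the first observed value and after the last observed value unfilled. For example, leading missing values before an observed 1 stay missing, so those ends need separate handling.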
Code:
clear all
input id time educ neweduc
1 1 1 1
1 2 2 2
1 3 . 2
1 4 2 2
1 5 . .
2 1 2 2
2 2 . 2
2 3 . 2
2 4 2 2
2 5 3 3
3 1 . 1
3 2 . 1
3 3 1 1
3 4 . 1
3 5 1 1
4 1 2 2
4 2 3 3
4 3 . 3
4 4 . 3
4 5 3 3
end
list, sepby(id)
+----------------------------+
| id time educ neweduc |
|----------------------------|
1. | 1 1 1 1 |
2. | 1 2 2 2 |
3. | 1 3 . 2 |
4. | 1 4 2 2 |
5. | 1 5 . . |
|----------------------------|
6. | 2 1 2 2 |
7. | 2 2 . 2 |
8. | 2 3 . 2 |
9. | 2 4 2 2 |
10. | 2 5 3 3 |
|----------------------------|
11. | 3 1 . 1 |
12. | 3 2 . 1 |
13. | 3 3 1 1 |
14. | 3 4 . 1 |
15. | 3 5 1 1 |
|----------------------------|
16. | 4 1 2 2 |
17. | 4 2 3 3 |
18. | 4 3 . 3 |
19. | 4 4 . 3 |
20. | 4 5 3 3 |
+----------------------------+
Thus, for each individual, the following must hold:
- All observations before the last time educ = 1 must also have educ = 1 (see individual 3: the last observation with educ = 1 is at time = 5, so the observations at time = 1, ..., 4 must have educ = 1).
- All observations between the first and the last observation with educ = 2 must also have educ = 2 (see individual 2: educ = 2 at time = 1 and at time = 4, so time = 2 and time = 3 must have educ = 2; individual 1 is the same case at time = 3).
- All observations after the first time educ = 3 must also have educ = 3 (see individual 4: the first observation with educ = 3 is at time = 2, so education must be 3 for time = 2, ..., 5).

Missing values that none of these rules covers stay missing (see individual 1 at time = 5).
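The three rules translate fairly directly into per-individual time bounds, and none of them depends on how long a gap is. The sketch below reuses the example data from the post. For each id it computes the time of the last educ = 1, the first and last educ = 2, and the first educ = 3, and then fills only the missing values that fall inside those windows. The variable names `last1`, `first2`, `last2`, `first3` and `filled` are illustrative.

```stata
clear
input id time educ neweduc
1 1 1 1
1 2 2 2
1 3 . 2
1 4 2 2
1 5 . .
2 1 2 2
2 2 . 2
2 3 . 2
2 4 2 2
2 5 3 3
3 1 . 1
3 2 . 1
3 3 1 1
3 4 . 1
3 5 1 1
4 1 2 2
4 2 3 3
4 3 . 3
4 4 . 3
4 5 3 3
end

* cond() returns missing on rows that do not match, and egen's max()
* and min() ignore missing values, so these give per-id time bounds.
bysort id: egen last1  = max(cond(educ == 1, time, .))
bysort id: egen first2 = min(cond(educ == 2, time, .))
bysort id: egen last2  = max(cond(educ == 2, time, .))
bysort id: egen first3 = min(cond(educ == 3, time, .))

* Fill only missing values. The !missing() checks matter because Stata
* treats missing as larger than any number, so time < . would be true.
generate filled = educ
replace filled = 1 if missing(filled) & !missing(last1) & time < last1
replace filled = 2 if missing(filled) & !missing(first2) & time > first2 & time < last2
replace filled = 3 if missing(filled) & !missing(first3) & time > first3

sort id time
list id time educ filled neweduc, sepby(id)
```

On the example data, `filled` matches `neweduc` in every row, including the missing value that remains for individual 1 at time = 5. This approach assumes the observed values within each id are already non-decreasing. With inconsistent data the windows could overlap, and the order of the `replace` statements would then decide the result.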