I have panel data in which the education variable has missing values. I must use the surrounding non-missing values to fill in the missing ones.
The education variable is categorical with values 1 to 3, with a higher value indicating better education (1 = low education, 2 = moderate education, 3 = high education).
Hence, when looking at education over time for each individual, the value at time t+1 must be equal to or higher than the value at time t.
Here is example data. The variable educ is my current data, and I need to obtain the variable neweduc.
In this example only 1 or 2 consecutive observations are missing, but in my actual dataset the gap length varies widely, with up to about 20 missing observations in a row. Hence, I don't think explicit subscripts of the form [_n-t] and [_n+t] are practical. Could anyone help me solve this? Thank you in advance (:
Code:
clear all
input id time educ neweduc
1 1 1 1
1 2 2 2
1 3 . 2
1 4 2 2
1 5 . .
2 1 2 2
2 2 . 2
2 3 . 2
2 4 2 2
2 5 3 3
3 1 . 1
3 2 . 1
3 3 1 1
3 4 . 1
3 5 1 1
4 1 2 2
4 2 3 3
4 3 . 3
4 4 . 3
4 5 3 3
end
list, sepby(id)

     +----------------------------+
     | id   time   educ   neweduc |
     |----------------------------|
  1. |  1      1      1         1 |
  2. |  1      2      2         2 |
  3. |  1      3      .         2 |
  4. |  1      4      2         2 |
  5. |  1      5      .         . |
     |----------------------------|
  6. |  2      1      2         2 |
  7. |  2      2      .         2 |
  8. |  2      3      .         2 |
  9. |  2      4      2         2 |
 10. |  2      5      3         3 |
     |----------------------------|
 11. |  3      1      .         1 |
 12. |  3      2      .         1 |
 13. |  3      3      1         1 |
 14. |  3      4      .         1 |
 15. |  3      5      1         1 |
     |----------------------------|
 16. |  4      1      2         2 |
 17. |  4      2      3         3 |
 18. |  4      3      .         3 |
 19. |  4      4      .         3 |
 20. |  4      5      3         3 |
     +----------------------------+
Thus, for each individual, the following needs to happen:
- All observations before the last time educ = 1 need to be educ = 1 as well (see individual 3: the last observation with educ = 1 is at time = 5, so the observations at time = 1,...,4 need to be educ = 1).
- All observations between the first and last observation with educ = 2 need to be educ = 2 as well (see individuals 1 and 2: every observation between the first time educ = 2 and the last time educ = 2 should be educ = 2).
- All observations after the first time educ = 3 need to be educ = 3 as well (see individual 4: the first observation with educ = 3 is at time = 2, so education must be educ = 3 for time = 2,...,5).
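One possible way to express the three rules above without hard-coding gap lengths is to bound each missing value from both sides and fill it only when the two bounds agree. The following is a minimal sketch, not a definitive solution. It assumes that the observed values of educ are already non-decreasing within each id, that 1 and 3 are the lowest and highest categories, and that time uniquely orders observations within id. The names lower, upper, negtime, and filled are hypothetical helper variables introduced only for this illustration.

```stata
clear all
input id time educ neweduc
1 1 1 1
1 2 2 2
1 3 . 2
1 4 2 2
1 5 . .
2 1 2 2
2 2 . 2
2 3 . 2
2 4 2 2
2 5 3 3
3 1 . 1
3 2 . 1
3 3 1 1
3 4 . 1
3 5 1 1
4 1 2 2
4 2 3 3
4 3 . 3
4 4 . 3
4 5 3 3
end

* Lower bound: the most recent observed value, carried forward.
* Before any value has been observed, education can be no lower than 1.
bysort id (time): gen lower = educ
by id: replace lower = cond(_n == 1, 1, lower[_n-1]) if missing(lower)

* Upper bound: the next observed value, carried backward.
* Sorting on reversed time lets the same carry-forward idiom work backward.
* After the last observed value, education can be no higher than 3.
gen negtime = -time
bysort id (negtime): gen upper = educ
by id: replace upper = cond(_n == 1, 3, upper[_n-1]) if missing(upper)

* Fill a missing value only when both bounds pin it to a single category.
gen filled = educ
replace filled = lower if missing(educ) & lower == upper

sort id time
drop negtime lower upper
list, sepby(id)
```

Because replace works through the observations in order, a value carried into one missing observation is available to the next one, so the length of a gap does not matter. Under this sketch, a gap between a 2 and a 3 (or a trailing gap after a 2, as for individual 1 at time = 5) stays missing, since the data do not determine which category applies.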