Hi there,

I have panel data in which the education variable has missing values. I must use the surrounding non-missing values to fill in the missing ones.

The education variable is categorical with values 1 to 3, with a higher value indicating better education (1 = low education, 2 = moderate education, 3 = high education).
Hence, when looking at education over time for each individual, the value at time t+1 must be equal to or higher than the value at time t.
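Before filling anything in, it may be worth checking that the observed values actually respect this ordering. Below is a minimal sketch. It assumes the variables are named id, time and educ, as in the example data. The variable lastobs is a hypothetical helper, and the assert stops with an error if any observed value is lower than an earlier observed value for the same individual.

```stata
* Sketch: check that observed educ never decreases within an individual.
* lastobs is a hypothetical helper variable.
clear
input id time educ
1 1 1
1 2 2
1 3 .
1 4 2
1 5 .
2 1 2
2 2 .
2 3 .
2 4 2
2 5 3
3 1 .
3 2 .
3 3 1
3 4 .
3 5 1
4 1 2
4 2 3
4 3 .
4 4 .
4 5 3
end
* carry the most recent observed educ forward within id
bysort id (time): gen lastobs = educ
by id: replace lastobs = lastobs[_n-1] if missing(lastobs)
* every observed value must be >= the last value observed before it
by id: assert educ >= lastobs[_n-1] if !missing(educ) & !missing(lastobs[_n-1])
```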




Below is some example data: educ is the variable as I currently have it, and neweduc is the variable I want to obtain.
In this example only 1 or 2 consecutive observations are missing, but in my actual dataset the length of these gaps varies widely, up to about 20 missing observations in a row. Hence, I don't think fixed subscripts such as [_n-t] or [_n+t] are useful. Could anyone help me solve this? Thank you in advance (:
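Subscripts may still be usable here, because the gap length does not have to be known in advance. The sketch below shows why. `replace` works through the observations in order, so a single `replace x = x[_n-1] if missing(x)` picks up the value it has just filled in on the row above and bridges a gap of any length. The toy data are hypothetical: one individual observed at times 1 and 22 only. The names prev, next and negtime are also hypothetical.

```stata
* Sketch: one carry-forward line fills a gap of any length, because
* -replace- processes observations in order and each missing value
* sees the value just filled in on the row above it.
clear
set obs 22
gen id = 1
gen time = _n
* hypothetical data: educ observed at times 1 and 22 only
gen educ = 2 if inlist(time, 1, 22)
* carry the last observed value forward within id
bysort id (time): gen prev = educ
by id: replace prev = prev[_n-1] if missing(prev)
* carry the next observed value backward within id (sort on -time)
gen negtime = -time
bysort id (negtime): gen next = educ
by id: replace next = next[_n-1] if missing(next)
sort id time
list time educ prev next in 1/4
```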

Code:
clear all
input id time educ neweduc
1 1 1 1
1 2 2 2
1 3 . 2
1 4 2 2
1 5 . .
2 1 2 2
2 2 . 2
2 3 . 2
2 4 2 2
2 5 3 3
3 1 . 1
3 2 . 1
3 3 1 1
3 4 . 1
3 5 1 1
4 1 2 2
4 2 3 3
4 3 . 3
4 4 . 3
4 5 3 3
end


list, sepby(id)

     +----------------------------+
     | id   time   educ   neweduc |
     |----------------------------|
  1. |  1      1      1         1 |
  2. |  1      2      2         2 |
  3. |  1      3      .         2 |
  4. |  1      4      2         2 |
  5. |  1      5      .         . |
     |----------------------------|
  6. |  2      1      2         2 |
  7. |  2      2      .         2 |
  8. |  2      3      .         2 |
  9. |  2      4      2         2 |
 10. |  2      5      3         3 |
     |----------------------------|
 11. |  3      1      .         1 |
 12. |  3      2      .         1 |
 13. |  3      3      1         1 |
 14. |  3      4      .         1 |
 15. |  3      5      1         1 |
     |----------------------------|
 16. |  4      1      2         2 |
 17. |  4      2      3         3 |
 18. |  4      3      .         3 |
 19. |  4      4      .         3 |
 20. |  4      5      3         3 |
     +----------------------------+

So, within each individual, the following needs to happen:

- All missing observations before the last time educ = 1 must be 1 as well (see individual 3: the last observation with educ = 1 is at time 5, so the missing observations at times 1 to 4 become 1).

- All missing observations between the first and the last time educ = 2 must be 2 as well (see individuals 1 and 2).

- All missing observations after the first time educ = 3 must be 3 as well (see individual 4: the first observation with educ = 3 is at time 2, so education must be 3 at times 2 to 5).

Any missing value not covered by these rules stays missing, as for individual 1 at time 5, where education could be either 2 or 3.
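The three rules can be translated fairly literally into Stata. The sketch below does this by recording, per individual, the time of the last observed 1, the first and last observed 2, and the first observed 3, and then filling only the missing values that fall in the corresponding ranges. It assumes the variable names from the example. The names last1, first2, last2, first3 and neweduc2 are hypothetical, and the final assert compares the result with the desired neweduc.

```stata
* Sketch: literal translation of the three filling rules.
* last1, first2, last2, first3 and neweduc2 are hypothetical names.
clear
input id time educ neweduc
1 1 1 1
1 2 2 2
1 3 . 2
1 4 2 2
1 5 . .
2 1 2 2
2 2 . 2
2 3 . 2
2 4 2 2
2 5 3 3
3 1 . 1
3 2 . 1
3 3 1 1
3 4 . 1
3 5 1 1
4 1 2 2
4 2 3 3
4 3 . 3
4 4 . 3
4 5 3 3
end
* time of the last observed 1, first/last observed 2, first observed 3
* (missing if that value is never observed for the individual)
bysort id: egen last1 = max(cond(educ == 1, time, .))
by id: egen first2 = min(cond(educ == 2, time, .))
by id: egen last2 = max(cond(educ == 2, time, .))
by id: egen first3 = min(cond(educ == 3, time, .))
gen neweduc2 = educ
* rule 1: before the last 1 (guard needed: time < . is true in Stata)
replace neweduc2 = 1 if missing(educ) & time < last1 & !missing(last1)
* rule 2: strictly between the first and last 2
replace neweduc2 = 2 if missing(educ) & time > first2 & time < last2
* rule 3: after the first 3 (time > . is false, so no guard needed)
replace neweduc2 = 3 if missing(educ) & time > first3
sort id time
assert neweduc2 == neweduc
```

An equivalent way to read the rules is that a missing value is determined whenever the nearest observed values before and after it are equal. A value is also determined when it precedes an observed 1 with nothing observed earlier, or follows an observed 3 with nothing observed later. Carrying the previous and next observed values forward and backward is therefore an alternative route to the same result.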