Hi, I need help with imputing missing values for my panel data for 100+ countries for 15 years (from 1990 to 2004). There are six variables. For some variables and countries, the data for the first few years are missing and start randomly within the first 10 years. For example, for country A, variable X might miss the data for 1990 and 1991 but start at 1992 while for country B the same variable X might miss data until 1997 and start in 1998. I need to fill in the missing values for the early years but only if the first non-missing value is within the first five years ahead (so, for 1990, up to 1995 and not beyond). For conceptual reasons, it would be incorrect to use a value that's more than 5 years ahead.

I have posted an example below with just two countries, Albania and Algeria. For my purpose, the general command:

mipolate x year, by(id) gen(x_new) backward

would work for Albania because its first non-missing value starts within the first 5 years (in 1993). However, it wouldn't work for Algeria because its first non-missing value for this variable starts outside of this 5-year restriction (in 1998).


year country x id

1990 Albania . 1
1991 Albania . 1
1992 Albania . 1
1993 Albania 5 1
1994 Albania 5 1

1995 Albania 5 1
1996 Albania 6 1
1997 Albania 5 1
1998 Albania 5 1
1999 Albania 5 1

2000 Albania 5 1
2001 Albania 5 1
2002 Albania 7 1
2003 Albania 7 1
2004 Albania 7 1

1990 Algeria . 2
1991 Algeria . 2
1992 Algeria . 2
1993 Algeria . 2
1994 Algeria . 2

1995 Algeria . 2
1996 Algeria . 2
1997 Algeria . 2
1998 Algeria -3 2
1999 Algeria -3 2

2000 Algeria -3 2
2001 Algeria -3 2
2002 Algeria -3 2
2003 Algeria -3 2
2004 Algeria 2 2

Is there a command to restrict the first non-missing value picked to only until the first 5 years ahead? If mipolate does not have a ready command, maybe a set of commands can be constructed from first principles (I have seen Nick Cox talk about it often on this forum)? But I admit I do not know how to create that set of commands. Thanks for any help anyone can offer.