Dear Statalist Masters,

My first post here on Statalist, after having been a passive reader for several years! Glad to now be a member of the forum flock.

To the issue: I have a problem creating lagged variables. I am working with a panel dataset of around 800,000 observations covering satellite data on weather and greenness in Ethiopia for each month of the years 2000-2017. In long format, an individual observation is a pixel (identified by the variable id) in a specific month of a specific year (identified by the variable yearandmonth). A subset looks like this:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id yearandmonth) float(ndvi_an chirps_an lst_day_an lst_night_an)
10869164 200001          .  .14173031          .          .
10869164 200002  168.16663  -.3859761          .          .
10869164 200003  211.55554  -4.910643    110.667  14.166992
10869164 200004  238.55554   56.76266 -113.44434 -25.722656
10869164 200005   445.1666 -10.590395  -28.88867  -38.83301
10869164 200006  213.16663 -19.063148  155.88867  11.444336
10869164 200007 -289.05566  13.267944   3.833008  -46.94434
10869164 200008  303.38867  -42.79756   2.666992  -62.66699
10869164 200009  148.88867  -13.21985   6.764648  -56.41211
10869164 200010  530.66675   21.98996 -145.29395  -102.5293
10869164 200011   489.7058  .48916245  26.706055  -38.29395
10869164 200012  261.23535   .3373172  10.941406  -49.88281
10869164 200101  179.35303 -.50386167   8.235352  -71.41211
10869164 200102  110.16663  -.4940401  32.529297  -52.41211
10869164 200103   52.55554 -1.1464376   20.66699  -92.83301
10869164 200104   91.55554 -12.991203  110.55566   54.27734
10869164 200105 -36.833374 -17.386755  134.11133 -16.833008
10869164 200106  -49.83337 -10.651672  -44.11133  31.444336
10869164 200107   385.9443  20.109344  -64.16699  -62.94434
10869164 200108  1091.3887   29.89163  -61.33301  -59.66699
10869164 200109   870.8887 -36.556335  -7.235352   30.58789
10869164 200110  434.66675 -11.603274 -33.293945 -23.529297
10869164 200111   268.7058  -5.093596  -6.293945 -14.293945
10869164 200112  277.23535  -.5618868  14.941406   25.11719
10869164 200201  257.35303    .670423  -29.76465  -34.41211
10869164 200202  203.16663 -.42307615   -44.4707  -20.41211
10869164 200203  170.55554   .6062927  -76.33301  -29.83301
10869164 200204  142.55554 -18.338364  70.555664  21.277344
10869164 200205  176.16663 -17.451864   67.11133  14.166992
10869164 200206  112.16663 -23.617104   84.88867  74.444336
end
format %tm yearandmonth

I want to run a regression testing the effect of temperature (lst_day_an and lst_night_an) and precipitation (chirps_an) on greenness (ndvi_an), using the 6 months leading up to and including the month of observation. To do this, I want to create lagged variables for temperature and precipitation for each of the 6 preceding months. Here is an example of the code I use to do this:

Code:
/* Generate lagged anomaly variables: one variable per lag, 1 to 6 months */

foreach y in 1 2 3 4 5 6 {
    gen chirps_an_lag`y' = l`y'.chirps_an
}

However (and here comes the issue), when I create the lagged variables, they do not cross year boundaries. For instance, December 2005 is not recognized as the 2-month lag for February 2006; a missing value is generated instead. This means that I only have a full set of 1-6 month lags for the last 6 months of every year for each pixel, which isn’t ideal. Any ideas on how to fix this?
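
One possible cause, offered as an assumption rather than a confirmed diagnosis: yearandmonth holds plain yyyymm integers (200012, 200101, ...), so consecutive months across a year boundary differ by 89 rather than 1, and the lag operator sees a gap. The sketch below builds a true Stata monthly date with ym() and re-declares the panel on it. The variable name mdate is hypothetical, and it is assumed that the panel was previously declared with xtset id yearandmonth.

Code:
* Assumption: yearandmonth stores yyyymm integers, not Stata monthly dates
* Split into year and month, then convert to months since 1960m1
gen int mdate = ym(floor(yearandmonth/100), mod(yearandmonth, 100))
format mdate %tm

* Re-declare the panel on the true monthly date (mdate is a hypothetical name)
xtset id mdate

* Drop any lags created earlier, then rebuild them on the new time variable
capture drop chirps_an_lag*
forvalues y = 1/6 {
    gen chirps_an_lag`y' = L`y'.chirps_an
}

If the assumption holds, the lags should then run across December to January, with missing values only in the first months of each pixel's series.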

As always – many thanks for assistance,
Lars