Dear statalist,

this is a part of my dataset (the original has about 70000 rows):


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float obs double lfdnr float score double Sekunde float(first last last2 last3 last4 last5 last6)
10 26214 4.1398454 10 10 20 21 22 23 24 25
11 26214  3.401971 11 10 20 21 22 23 24 25
12 26214 2.2298167 12 10 20 21 22 23 24 25
13 26214 1.2570118 13 10 20 21 22 23 24 25
14 26214  8.820399 14 10 20 21 22 23 24 25
15 26214  4.157694 15 10 20 21 22 23 24 25
16 26214  1.639946 16 10 20 21 22 23 24 25
17 26214 3.9103115 17 10 20 21 22 23 24 25
18 26214  5.995928 18 10 20 21 22 23 24 25
19 26214  8.883919 19 10 20 21 22 23 24 25
20 26214  2.842385 20 10 20 21 22 23 24 25
21 26215  9.034828 21 21 27 28 29 31 33 34
22 26215  6.260192 22 21 27 28 29 31 33 34
23 26215 4.3280125 23 21 27 28 29 31 33 34
24 26215  8.655678 24 21 27 28 29 31 33 34
25 26215 4.5224366 25 21 27 28 29 31 33 34
26 26215 2.0769517 26 21 27 28 29 31 33 34
27 26215  7.788191 27 21 27 28 29 31 33 34
28 26216   7.25521 28 28 29 31 33 34 35 36
29 26216  7.179537 29 28 29 31 33 34 35 36
30 26218  9.387411 29 30 32 33 34 35 36 37
31 26217  5.093994 30 31 31 33 34 35 36 37
32 26218   1.60661 30 30 32 33 34 35 36 37
33 26219    4.0419 31 33 33 34 35 36 37 38
34 26221  9.773964 32 34 40 41 42 43 44 45
end

Lfdnr marks units which run over seconds (Sekunde).
Unfortunately, units overlap, so Sekunde has a lot of duplicates.

First, I had to calculate the difference in score between the first and last [last +1, last+2, ...] Sekunde of lfdnr.
Since Sekunde has duplicates and I can't just jump in the next row, I had to use this code:

Code:
duplicates tag Sekunde, gen (isdupold)
replace isdupold = isdupold+1
order isdupold, after(Sekunde)

bysort Sekunde(isdupold): gen t = _n
replace t = t-1
order t, after(isdupold)
generate isdup = isdupold -t
order isdup, after(isdupold)

generate obs = _n
order obs
sort lfdnr Sekunde
by lfdnr : generate first = obs[1] if lfdnr!=1
by lfdnr : generate last  = obs[_N] if lfdnr!=1
order first last, after(isdup)
sort obs

generate diffton0  = rtrmean_w[last]   - rtrmean_w[first]
generate last2  = last + isdup[last]
order last2, after(last)
generate diffton1 = rtrmean_w[last2] - rtrmean_w[first]
generate last3  = last + isdup[last2] + isdup[last]
order last3, after(last2)
generate diffton2 = rtrmean_w[last3] - rtrmean_w[first]
generate last4  = last + isdup[last3] + isdup[last2] + isdup[last]
order last4, after(last3)
generate diffton3 = rtrmean_w[last4] - rtrmean_w[first]
generate last5  = last + isdup[last4] + isdup[last3] + isdup[last2] + isdup[last]
order last5, after(last4)
generate diffton4 = rtrmean_w[last5] - rtrmean_w[first]
generate last6  = last + isdup[last5] + isdup[last4] + isdup[last3] + isdup[last2] + isdup[last]
order last6, after(last5)
generate diffton5 = rtrmean_w[last6] - rtrmean_w[first]
I probably made it way more complicated than it has to be, but it took me a long time to eve come up with this.

Now I'm facing the next problem:
I need to keep only the last (that is easy) and then the last+1, last+2... Sekunde of lfdnr and store it seperately for further calculations.
Once again, I can't just jump in the next row because of the duplicates.
Basically, last2 gives me the number of the observation of the last+1 Sekunde of lfdnr.
Is there a way to tell stata to keep the row number given in last2?

Thank you in advance!

Nadine