this is a part of my dataset (the original has about 70000 rows):
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float obs double lfdnr float score double Sekunde float(first last last2 last3 last4 last5 last6) 10 26214 4.1398454 10 10 20 21 22 23 24 25 11 26214 3.401971 11 10 20 21 22 23 24 25 12 26214 2.2298167 12 10 20 21 22 23 24 25 13 26214 1.2570118 13 10 20 21 22 23 24 25 14 26214 8.820399 14 10 20 21 22 23 24 25 15 26214 4.157694 15 10 20 21 22 23 24 25 16 26214 1.639946 16 10 20 21 22 23 24 25 17 26214 3.9103115 17 10 20 21 22 23 24 25 18 26214 5.995928 18 10 20 21 22 23 24 25 19 26214 8.883919 19 10 20 21 22 23 24 25 20 26214 2.842385 20 10 20 21 22 23 24 25 21 26215 9.034828 21 21 27 28 29 31 33 34 22 26215 6.260192 22 21 27 28 29 31 33 34 23 26215 4.3280125 23 21 27 28 29 31 33 34 24 26215 8.655678 24 21 27 28 29 31 33 34 25 26215 4.5224366 25 21 27 28 29 31 33 34 26 26215 2.0769517 26 21 27 28 29 31 33 34 27 26215 7.788191 27 21 27 28 29 31 33 34 28 26216 7.25521 28 28 29 31 33 34 35 36 29 26216 7.179537 29 28 29 31 33 34 35 36 30 26218 9.387411 29 30 32 33 34 35 36 37 31 26217 5.093994 30 31 31 33 34 35 36 37 32 26218 1.60661 30 30 32 33 34 35 36 37 33 26219 4.0419 31 33 33 34 35 36 37 38 34 26221 9.773964 32 34 40 41 42 43 44 45 end
Lfdnr marks units which run over seconds (Sekunde).
Unfortunately, units overlap, so Sekunde has a lot of duplicates.
First, I had to calculate the difference in score between the first and last [last +1, last+2, ...] Sekunde of lfdnr.
Since Sekunde has duplicates and I can't just jump in the next row, I had to use this code:
Code:
duplicates tag Sekunde, gen (isdupold) replace isdupold = isdupold+1 order isdupold, after(Sekunde) bysort Sekunde(isdupold): gen t = _n replace t = t-1 order t, after(isdupold) generate isdup = isdupold -t order isdup, after(isdupold) generate obs = _n order obs sort lfdnr Sekunde by lfdnr : generate first = obs[1] if lfdnr!=1 by lfdnr : generate last = obs[_N] if lfdnr!=1 order first last, after(isdup) sort obs generate diffton0 = rtrmean_w[last] - rtrmean_w[first] generate last2 = last + isdup[last] order last2, after(last) generate diffton1 = rtrmean_w[last2] - rtrmean_w[first] generate last3 = last + isdup[last2] + isdup[last] order last3, after(last2) generate diffton2 = rtrmean_w[last3] - rtrmean_w[first] generate last4 = last + isdup[last3] + isdup[last2] + isdup[last] order last4, after(last3) generate diffton3 = rtrmean_w[last4] - rtrmean_w[first] generate last5 = last + isdup[last4] + isdup[last3] + isdup[last2] + isdup[last] order last5, after(last4) generate diffton4 = rtrmean_w[last5] - rtrmean_w[first] generate last6 = last + isdup[last5] + isdup[last4] + isdup[last3] + isdup[last2] + isdup[last] order last6, after(last5) generate diffton5 = rtrmean_w[last6] - rtrmean_w[first]
Now I'm facing the next problem:
I need to keep only the last (that is easy) and then the last+1, last+2... Sekunde of lfdnr and store it seperately for further calculations.
Once again, I can't just jump in the next row because of the duplicates.
Basically, last2 gives me the number of the observation of the last+1 Sekunde of lfdnr.
Is there a way to tell stata to keep the row number given in last2?
Thank you in advance!
Nadine
0 Response to Refering to a specific row with help of another variable
Post a Comment