Hello,

This is my first post here and I am generally new to Stata, but I'll try to formulate my concern as precisely as possible:

I am working with several datasets, each of which has different unique identifiers. In order to merge them, I need a linking dataset, which is causing me quite a headache.
As a first step, I would like to eliminate adjacent observations that share the same two identifiers (for instance, id1 and id2). I therefore created a new variable that should take the value 1 if that is the case and 0 otherwise. My code looks as follows:

gen sameid=0
replace sameid=1 if id1[_n]==id1[_n-1] & id2[_n]==id2[_n-1]
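For comparison, here is a minimal sketch of the same flag written with Stata's by prefix, assuming the rows should be compared only within each (id1, id2) pair and ordered by linkenddt inside each pair. The toy data and the helper variable enddt_str are hypothetical, not taken from my actual dataset.

```stata
* Hypothetical toy data (not the original dataset)
clear
input id1 id2 str9 enddt_str
1076 6765 "31dec2019"
1000 8987 "31dec2019"
1076 6765 "31dec1992"
1000 8987 "30jan1962"
375  55   "09mar1998"
end
generate linkenddt = date(enddt_str, "DMY")
format linkenddt %td

* Sorting on linkenddt alone would interleave different (id1, id2) pairs,
* so "adjacent" rows need not share identifiers. Sort by the pair first,
* then by linkenddt within each pair.
sort id1 id2 linkenddt

* Same rule as the _n-1 comparison: 0 for the first row of each pair,
* 1 for every later row of that pair.
by id1 id2: generate byte sameid = _n > 1
list, sepby(id1 id2)
```

With this ordering, the first (earliest) row of each pair gets 0 and every later row gets 1, which is exactly what the _n-1 comparison produces.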

The code does what it is supposed to do, except that it does not seem to respect my previous sorting (I had sorted the data in ascending order of the "linkenddt" dates, which is important for later operations). Specifically, the following is returned (excerpt):
      id1    id2   linkdt       linkenddt    dup   sameid
 1.   1076   6765  04. Nov 82   31dec1992    5     0
 2.   1076   6765  03. Nov 92   31dec1992    5     1
 3.   1076   6765  01. Jan 93   30. Nov 10   5     1
 4.   1076   6765  01. Jan 93   30. Nov 10   5     1
 5.   1076   6765  01dec2010    10dec2010    5     1
 6.   1076   6765  01dec2010    31dec2019    5     1
 7.   1000   8987  01. Jan 50   30. Jan 62   1     0
 8.   1000   8987  31. Jan 62   31dec2019    1     1
 9.   375    55    08. Jun 83   09mar1998    0     0
10.   375    55    05. Jan 71   15. Aug 03   0     0
This puzzles me: shouldn't sameid be 1 rather than 0 for the first observation, and vice versa for observation 6 (for example)? I am asking because dropping the observations with sameid == 1 now causes problems with the dates, as I need the latest date later on. Am I making a beginner's mistake that can easily be fixed?
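If the goal is to keep only the row with the latest linkenddt for each (id1, id2) pair, a minimal sketch would flag every row except the last one in each pair, assuming the rows are sorted by linkenddt within the pair. The toy data, the helper variable enddt_str and the flag name notlatest are hypothetical.

```stata
* Hypothetical toy data (not the original dataset)
clear
input id1 id2 str9 enddt_str
1076 6765 "31dec1992"
1076 6765 "10dec2010"
1076 6765 "31dec2019"
1000 8987 "30jan1962"
1000 8987 "31dec2019"
end
generate linkenddt = date(enddt_str, "DMY")
format linkenddt %td

* Within each (id1, id2) pair, order rows by linkenddt so the latest
* end date is the last row of the pair.
sort id1 id2 linkenddt

* 1 for every row except the last (latest linkenddt) of each pair.
by id1 id2: generate byte notlatest = _n < _N

* Dropping the flagged rows keeps one row per pair: the latest linkenddt.
drop if notlatest
list
```

The only difference from the _n-1 flag is the comparison: _n < _N marks everything but the last row of each group instead of everything but the first. If two rows of a pair share the same latest linkenddt, which of them survives depends on the sort order among ties.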

Many thanks in advance and kind regards,

Jasper