Handling duplicates

Hello,

This is my first post here and I am generally new to Stata, but I'll try to formulate my concern as precisely as possible:

I am working with several datasets, each of which has different unique identifiers. In order to merge them, I need a linked dataset, which is causing me quite a headache.
As a start, I would like to eliminate adjacent observations that share the same two identifiers (for instance id1 and id2). I therefore created a new variable that should take the value 1 if that is the case and 0 otherwise. My code looks as follows:

gen sameid=0
replace sameid=1 if id1[_n]==id1[_n-1] & id2[_n]==id2[_n-1]

The code does what it is supposed to do, except that it does not seem to respect my previous sorting (I sorted the data in ascending order of "linkenddt", which is important for later operations). Specifically, the following is returned (as an excerpt):

	id1	id2	linkdt	linkenddt	dup	sameid
1.	1076	6765	04nov1982	31dec1992	5	0
2.	1076	6765	03nov1992	31dec1992	5	1
3.	1076	6765	01jan1993	30nov2010	5	1
4.	1076	6765	01jan1993	30nov2010	5	1
5.	1076	6765	01dec2010	10dec2010	5	1
6.	1076	6765	01dec2010	31dec2019	5	1
7.	1000	8987	01jan1950	30jan1962	1	0
8.	1000	8987	31jan1962	31dec2019	1	1
9.	375	55	08jun1983	09mar1998	0	0
10.	375	55	05jan1971	15aug2003	0	0

This puzzles me: shouldn't sameid be 1 instead of 0 for the first observation, and vice versa for observation 6 (for example)? I am asking because dropping observations with sameid == 1 now creates problems with the dates, as I need the latest date later on. Am I making a beginner's mistake that can easily be fixed?
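
One possible direction, offered as a minimal sketch rather than a confirmed fix: because `[_n-1]` compares each observation with the one before it, the first observation of a run can never be flagged. Comparing with `[_n+1]` instead would leave the last observation of each run unflagged. The variable names sameid_next and enddt_str are my own, and the toy data are observations 5-8 of the excerpt, assuming linkenddt is a daily date:

```stata
* Toy data: observations 5-8 of the excerpt, with linkenddt typed in as strings
clear
input id1 id2 str9 enddt_str
1076 6765 "10dec2010"
1076 6765 "31dec2019"
1000 8987 "30jan1962"
1000 8987 "31dec2019"
end

* Convert to a numeric daily date (assumption: the real variable is one already)
gen linkenddt = daily(enddt_str, "DMY")
format linkenddt %td

* Flag an observation when the NEXT one has the same id1 and id2,
* so the last observation of each run is left unflagged.
* For the final observation, id1[_n+1] is missing, so the comparison is false.
gen byte sameid_next = id1 == id1[_n+1] & id2 == id2[_n+1]

list id1 id2 linkenddt sameid_next, sepby(id1 id2)
```

Dropping observations with sameid_next == 1 would then keep only the last observation of each run, here the two rows ending 31dec2019. This still depends on the data being in the intended order when the variable is generated.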

Many thanks in advance and kind regards,

Jasper
