Hello,

I'm trying to find a more concise/efficient way to identify duplicate data. It's not as straightforward as it's been in the past.

Each row holds a record_id, the start time of one event, and the end time of that event. There is one row per event, so a record_id appears on as many rows as it has events.

Example:

record_id StartTime EndTime
39 07jul2012 22:20:00 07jul2012 22:44:59
39 07jul2012 22:44:59 07jul2012 23:00:00
39 07jul2012 23:00:00 07jul2012 23:34:00
39 07jul2012 23:34:00 08jul2012 00:13:00
39 08jul2012 00:13:00 08jul2012 01:30:00
39 08jul2012 01:30:00 08jul2012 03:30:00
39 08jul2012 03:30:00 08jul2012 03:59:59
39 08jul2012 03:59:59 08jul2012 04:12:00
39 08jul2012 04:12:00 08jul2012 07:41:00
39 08jul2012 07:41:00 08jul2012 07:43:00
39 08jul2012 07:43:00 08jul2012 09:32:00
39 08jul2012 09:33:00 08jul2012 12:11:59
39 08jul2012 12:11:59 08jul2012 12:30:00
39 08jul2012 12:30:00 08jul2012 12:36:00
39 08jul2012 12:36:00 08jul2012 15:32:00
39 08jul2012 15:32:00 08jul2012 17:16:00
39 08jul2012 17:16:00 08jul2012 18:53:00
39 08jul2012 18:53:00 08jul2012 20:19:59
39 08jul2012 20:19:59 08jul2012 20:38:00
39 08jul2012 20:37:00 08jul2012 21:00:00
39 08jul2012 21:00:00 08jul2012 21:30:00
39 08jul2012 21:30:00 08jul2012 22:05:00

My data are sorted by record_id, StartTime, and EndTime. As you can see, in most rows the EndTime is the same as the StartTime on the following row for the same record_id. Is there a way to clean my dataset in Stata so that each run of consecutive rows keeps only its last true EndTime, treating an EndTime as a duplicate when it matches the StartTime on the next row?
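
To make the goal concrete, here is a rough sketch of the logic I have in mind. I'm not sure it is the best approach. It assumes StartTime and EndTime are already numeric datetime (%tc) variables stored as doubles, and the names newrun and run_id are just placeholders I made up.

```stata
* Assumption: StartTime and EndTime are numeric %tc datetimes (doubles).
* If they were strings, a conversion along these lines would be needed first:
*   gen double start = clock(StartTime, "DMYhms")

sort record_id StartTime EndTime

* Flag rows that begin a new run: the first row of each record_id, or any
* row whose StartTime differs from the previous row's EndTime.
by record_id: gen byte newrun = (_n == 1) | (StartTime != EndTime[_n-1])

* The running sum of the flags numbers the runs within each record_id.
by record_id: gen long run_id = sum(newrun)

* Keep one row per run: earliest StartTime and latest EndTime.
collapse (min) StartTime (max) EndTime, by(record_id run_id)
format StartTime EndTime %tc
```

With an exact-match rule like this, the example above would collapse to three rows for record_id 39, because the 09:32:00/09:33:00 gap and the 20:38:00/20:37:00 overlap each start a new run. I don't know whether those two rows should be merged as well.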

Thank you!