Hello,
I'm trying to find a more concise/efficient way to identify duplicate data. It's not as straightforward as it's been in the past.
I have a record_id, the start time of an event, and the end time of that event. Each record_id has one row per event, for as many events as it has.
Example:
record_id StartTime EndTime
39 07jul2012 22:20:00 07jul2012 22:44:59
39 07jul2012 22:44:59 07jul2012 23:00:00
39 07jul2012 23:00:00 07jul2012 23:34:00
39 07jul2012 23:34:00 08jul2012 00:13:00
39 08jul2012 00:13:00 08jul2012 01:30:00
39 08jul2012 01:30:00 08jul2012 03:30:00
39 08jul2012 03:30:00 08jul2012 03:59:59
39 08jul2012 03:59:59 08jul2012 04:12:00
39 08jul2012 04:12:00 08jul2012 07:41:00
39 08jul2012 07:41:00 08jul2012 07:43:00
39 08jul2012 07:43:00 08jul2012 09:32:00
39 08jul2012 09:33:00 08jul2012 12:11:59
39 08jul2012 12:11:59 08jul2012 12:30:00
39 08jul2012 12:30:00 08jul2012 12:36:00
39 08jul2012 12:36:00 08jul2012 15:32:00
39 08jul2012 15:32:00 08jul2012 17:16:00
39 08jul2012 17:16:00 08jul2012 18:53:00
39 08jul2012 18:53:00 08jul2012 20:19:59
39 08jul2012 20:19:59 08jul2012 20:38:00
39 08jul2012 20:37:00 08jul2012 21:00:00
39 08jul2012 21:00:00 08jul2012 21:30:00
39 08jul2012 21:30:00 08jul2012 22:05:00
My data are sorted by record_id, StartTime, and EndTime. As you can see, within a record_id the EndTime of one row is the same as the StartTime of the following row. Is there a way to clean my dataset in Stata so that each EndTime is the last true end time, by treating an EndTime that matches the subsequent StartTime as a duplicate?
Thank you!
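One possible approach, offered as a sketch rather than a definitive answer: number each run of back-to-back rows within a record_id as a "spell", then collapse each spell to its first StartTime and last EndTime. This assumes StartTime and EndTime are numeric Stata datetimes (%tc, stored as double); the variable name spell is made up for illustration.

```stata
* Assumption: StartTime and EndTime are numeric %tc datetimes stored as double.
* If they are strings such as "07jul2012 22:20:00", they could first be
* converted with something like:
*   generate double start = clock(StartTime, "DMYhms")

sort record_id StartTime EndTime

* A new spell starts on the first row of each record_id, or whenever this
* row's StartTime is not equal to the previous row's EndTime.
* sum() is a running sum, so spell counts up 1, 2, 3, ... within record_id.
by record_id: generate long spell = sum(_n == 1 | StartTime != EndTime[_n-1])

* Keep one row per spell: earliest StartTime and latest EndTime.
collapse (min) StartTime (max) EndTime, by(record_id spell)
format StartTime EndTime %tc
```

With the example above, exact matching would split record_id 39 into three spells, because 09:32:00 is followed by 09:33:00 (a one-minute gap) and 20:38:00 is followed by 20:37:00 (a one-minute overlap). If overlaps should count as continuous, the condition could be changed to StartTime > EndTime[_n-1], although that only compares against the immediately preceding row.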