Hello Statalist,

I've got a vexing data wrangling question that I need help figuring our. I've got a medium-sized set of survey data (about 40k rows) with two kinds of duplicate records that I'm trying to filter out. After my wrangling is done, my goal is to have only the most recent partial responses for which there is no complete survey record.

Thus I'm trying to:
  • Drop any partial response that is associated with a complete response (same respondent ID) and then drop the complete response
  • Drop any duplicate partial responses so that I only have the most recent partials left
I've tried various approaches to ordering and tagging the partials and duplicates, but I can't figure out the right way to do both --- especially because I have up to 8 partials with the same respondent ID but different time stamps. I'm attaching a DTA of 10 records that indicates the time, duplicate status, survey complete status, ID, and then shows as a binary variable what I'm trying to end up with.

Thank you in advance for any help!

-Nick