Hi


I'd like to drop duplicates at random, instead of always keeping just the first duplicate observation.
A snapshot of my data set: [image]


Each patent-invt_id combination has several co_invt_id values. I want to keep only one co_invt_id per combination, picked at random.

I found the following code on the predecessor of Statalist:
Code:
bys varnames : gen rnd = uniform()
bys varnames (rnd) : keep if _n == 1
Does it make sense? (I'm not very familiar with Stata syntax.) I can run it on my dataset, but because I have over 1 million observations it's quite difficult to see whether the duplicates were indeed dropped randomly. Any feedback would be welcome.
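
To make the question concrete, a minimal sketch of the same idea applied to my variables might look like the following. The grouping variable names patent and invt_id are assumptions based on the description above, the seed value is arbitrary, and runiform() is used here in place of the older uniform(). The final isid line only checks that one observation per group remains; it does not by itself show that the choice was random.

```stata
* Sketch only: assumes the grouping variables are named patent and invt_id
* Arbitrary seed so the random draw can be reproduced
set seed 12345
* One uniform draw per observation, stored in double precision to make ties unlikely
gen double rnd = runiform()
* Within each group, sort by the draw and keep the observation with the smallest value
bysort patent invt_id (rnd): keep if _n == 1
drop rnd
* Stops with an error if any patent-invt_id group still has more than one observation
isid patent invt_id
```

Rerunning this with a different seed and comparing which co_invt_id survives in a few groups would be one informal way to see that the selection changes from run to run.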