Hi all, I have a very large dataset of 970,000 observations, which was given to me by an organisation.

I tried to merge this dataset with another, and the merge came back with the error:

stata does not uniquely identify observations in the master data

I figured this has to do with my ID variable. I checked for missing values in both the master and the merge file, and there are none.
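A check of this kind can be sketched roughly as follows. The toy data and the variable name `id` are placeholders for illustration only, not my actual file or variable; the same `count` would be run on both the master and the merge file.

```stata
* Toy data standing in for the real file; id is a hypothetical name
clear
input long id
101
102
102
.
end

* Count observations where id is missing
* (missing() works for both numeric and string variables)
count if missing(id)
```

On the real data this count should come back as zero if there are no missing IDs.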

I then checked for duplicates, as I figured this would be the only other reason (although nowhere in my code have I introduced any duplicates myself).

I tried duplicates report

[Screenshot: duplicates report output]

I then tried to list the duplicates, but of course there were too many.
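When a full listing is too long, the duplicates can be summarised instead. This is only a sketch on toy data, again assuming a hypothetical ID variable called `id`; `dup` is a name made up for the tag variable.

```stata
* Toy data standing in for the real file; id is a hypothetical name
clear
input long id
101
102
102
103
103
103
end

* Test whether id uniquely identifies observations without stopping on error
capture isid id
display _rc

* For each observation, store how many OTHER observations share its id
duplicates tag id, generate(dup)

* Summarise the tag rather than listing every duplicated row
tabulate dup

* Show one example observation per group of duplicated ids
duplicates examples id
```

Here `_rc` is nonzero when `id` is not unique, and `dup` is 0 for IDs that appear only once, so `tabulate dup` gives a compact picture of how many observations are affected.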

I then tried codebook; as you can see, the number of unique values here differs.

[Screenshot: codebook output]

My question: why does codebook show a different number of unique values from duplicates report, which shows there are 959,798 unique values?