Hello Statalist,

I have a question about counting with _n in large datasets. I am trying to create a row identifier variable that simply takes the value of the row number. In the past I've used the command
Code:
gen row_id = _n
This works fine for the first ~16.5 million rows. After that point, however, the values in "row_id" do not match the row number! For example, here's what the output looks like:

True Row    row_id
16961210    16961210
16961211    16961212
16961212    16961212
16961213    16961212
16961214    16961214
16961215    16961216
16961216    16961216
16961217    16961216

As is apparent, "row_id" alternates between correct and incorrect values: in the rows shown, it matches the true row number on even rows and is off by one on odd rows. I've also tried looping through the integers 1 to 17 million to create "row_id" manually, and I run into the same problem. I'd include a full example here, but it seems too large to post.
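
In place of the full dataset, here is a minimal sketch that should reproduce the pattern on an empty dataset. It assumes the default storage type has not been changed with set type, and it needs enough memory for 17 million observations. The variable "mismatch" is a hypothetical helper added only for illustration; it is not part of my original code.

```stata
* Minimal sketch: build an empty dataset with 17 million rows
clear
set obs 17000000

* Same command as above; no storage type is given, so Stata uses its default
gen row_id = _n

* Show which storage type row_id actually received
describe row_id

* Hypothetical helper: flag rows where row_id differs from the true row number
gen byte mismatch = (row_id != _n)

* Show full digits, then list the rows from the table above
format row_id %12.0f
list row_id mismatch in 16961210/16961217

* Count how many rows are affected in total
count if mismatch
```

The describe output reports the storage type that row_id was given, which may be relevant to the diagnosis.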

Can anyone provide insight on what I'm doing wrong here?

I'm running Stata/MP 17.0 on macOS.

Thanks in advance!

Andy