Hello Statalist,

I have a question about counting with _n in large datasets. I am attempting to create a row identifier variable that simply takes the value of the row number. In the past I've used the command

gen row_id = _n

This works fine for the first ~16.5 million rows. After this point, however, the values in "row_id" do not match the row number! For example, here's what the output looks like:

True Row    row_id
16961210    16961210
16961211    16961212
16961212    16961212
16961213    16961212
16961214    16961214
16961215    16961216
16961216    16961216
16961217    16961216

As is apparent, "row_id" begins a cycle of correct then incorrect numbers, returning to the correct value every so often before deviating again. I've also tried looping over the integers from 1 to 17 million to fill in "row_id" manually, and I run into the same problem. I'd include a full example here, but it seems too large to post.
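
For anyone who wants to reproduce this without the full dataset, here is a minimal sketch. It assumes that "row_id" is created with Stata's default storage type (float), which can represent integers exactly only up to 2^24 = 16,777,216. The observation count, the variable name true_row, and the display format are illustrative choices, not taken from the actual dataset.

```stata
* Minimal reproduction sketch, assuming the default storage type is float
clear
set obs 17000000

* Same command as in the post: row_id takes the default storage type
generate row_id = _n

* Hypothetical comparison variable: long holds integers of this size exactly
generate long true_row = _n

* Show full digits rather than scientific notation
format row_id %12.0f

* Same row range as the table in the post
list true_row row_id in 16961210/16961217

* Count the rows where the two variables disagree
count if row_id != true_row
```

If the two variables disagree only above row 16,777,216, that would point to storage precision rather than to _n itself. In that case, declaring the type explicitly (for example, generate long row_id = _n) may be worth trying.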

Can anyone provide insight on what I'm doing wrong here?

I'm running Stata/MP 17.0 on macOS.

Thanks in advance!
