Dear Statalisters,

I am running the following code to split a very large file (>15 GB) into smaller pieces:

I have adapted the code from this thread: https://www.statalist.org/forums/for...ort-large-file

scalar recordstart = 1
scalar stepsize = 20000000

qui describe using large_file.dta
scalar nrecords = r(N)

scalar num_files = ceil(nrecords / stepsize)

forval part = 1/`=num_files' {

    scalar start = 1 + ((`part' - 1) * stepsize)
    di "This is the value of start for iteration `part' :" start

    scalar stop = min((start + stepsize -1), nrecords)
    di "This is the value of stop for iteration `part' :" stop

    use "large_file.dta" in `=start'/`=stop', clear
    save "large_file_`part'", replace
}

This gives me the following output:

This is the value of start for iteration 1 :1
This is the value of stop for iteration 1 :20000000
file large_file_1.dta saved
This is the value of start for iteration 2 :15706
This is the value of stop for iteration 2 :15859
file large_file_2.dta saved
This is the value of start for iteration 3 :17348
This is the value of stop for iteration 3 :21550
file large_file_3.dta saved

As you can see, the values of start/stop are wrong after the first iteration. When I comment out the `use` and `save` lines inside the loop, it returns the correct output:

forval part = 1/`=num_files' {

    scalar start = 1 + ((`part' - 1) * stepsize)
    di "This is the value of start for iteration `part' :" start

    scalar stop = min((start + stepsize -1), nrecords)
    di "This is the value of stop for iteration `part' :" stop

    // use "large_file.dta" in `=start'/`=stop', clear
    // save "large_file_`part'", replace
}

Output:

This is the value of start for iteration 1 :1
This is the value of stop for iteration 1 :20000000
This is the value of start for iteration 2 :20000001
This is the value of stop for iteration 2 :40000000
This is the value of start for iteration 3 :40000001
This is the value of stop for iteration 3 :43660920
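
For reference, this is the start/stop arithmetic I expect the loop to perform; here is a minimal sketch of it in Python (just for illustration, not Stata), assuming 43,660,920 records and a step size of 20,000,000 as in the output above:

```python
import math

def chunk_bounds(nrecords, stepsize):
    """Return 1-based inclusive (start, stop) ranges covering nrecords."""
    num_files = math.ceil(nrecords / stepsize)
    return [
        (1 + (part - 1) * stepsize,          # start of this chunk
         min(part * stepsize, nrecords))     # stop, capped at the last record
        for part in range(1, num_files + 1)
    ]

for start, stop in chunk_bounds(43_660_920, 20_000_000):
    print(start, stop)
```

This prints exactly the start/stop pairs shown in the correct output, so the arithmetic itself seems fine; the problem only appears once the `use`/`save` lines run.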

What is causing this behaviour?

Thanks.