Dear Statalisters,
I am running the following code to split a very large file (>15 GB) into smaller pieces. I adapted the code from this thread: https://www.statalist.org/forums/for...ort-large-file
scalar recordstart = 1
scalar stepsize = 20000000
qui describe using large_file.dta
scalar nrecords = r(N)
scalar num_files = ceil(nrecords / stepsize)
forval part = 1/`=num_files' {
    scalar start = 1 + ((`part' - 1) * stepsize)
    di "This is the value of start for iteration `part' :" start
    scalar stop = min((start + stepsize -1), nrecords)
    di "This is the value of stop for iteration `part' :" stop
    use "large_file.dta" in `=start'/`=stop', clear
    save "large_file_`part'", replace
}
This gives me the following output:
This is the value of start for iteration 1 :1
This is the value of stop for iteration 1 :20000000
file large_file_1.dta saved
This is the value of start for iteration 2 :15706
This is the value of stop for iteration 2 :15859
file large_file_2.dta saved
This is the value of start for iteration 3 :17348
This is the value of stop for iteration 3 :21550
file large_file_3.dta saved
As you can see, the values of start and stop are wrong after the first iteration. When I comment out the last two lines inside the loop (the use and save commands), the output is correct:
forval part = 1/`=num_files' {
    scalar start = 1 + ((`part' - 1) * stepsize)
    di "This is the value of start for iteration `part' :" start
    scalar stop = min((start + stepsize -1), nrecords)
    di "This is the value of stop for iteration `part' :" stop
    // use "large_file.dta" in `=start'/`=stop', clear
    // save "large_file_`part'", replace
}
Output:
This is the value of start for iteration 1 :1
This is the value of stop for iteration 1 :20000000
This is the value of start for iteration 2 :20000001
This is the value of stop for iteration 2 :40000000
This is the value of start for iteration 3 :40000001
This is the value of stop for iteration 3 :43660920
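The only step that differs between the two runs is loading the data with use. One plausible explanation, assumed here and not confirmed in the thread, is that large_file.dta contains a variable named start or stop (or a variable these names abbreviate, such as a hypothetical start_date). In a Stata expression a variable name takes precedence over a scalar with the same name, so after the first use, both di start and `=start' would pick up the variable's value in observation 1 (the values 15706 and 15859 look like Stata dates). A toy dataset with made-up values illustrates the precedence:

```stata
* Toy example with hypothetical values: a variable and a scalar share the name "start"
clear
set obs 3
generate start = 15706 + _n - 1    // hypothetical date-like variable in the data
scalar start = 20000001            // scalar with the same name

display start            // the variable wins: shows start[1], i.e. 15706
display scalar(start)    // the scalar() function forces the scalar: 20000001
```

With variable abbreviation on (Stata's default), the same thing can happen when the data hold a variable whose name merely begins with start or stop.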
What is causing this behaviour?
Thanks.
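If a name clash is indeed the cause, one way around it is to keep the loop bounds in local macros, which are expanded before each command runs and cannot be shadowed by variables in the loaded data. The sketch below is an assumption-based variant, not tested on the original file: it reuses the file name and step size from the code above, and it uses the documented use [in] using filename form. Wrapping every reference in scalar(), or naming the scalars with tempname inside a program, would be alternative fixes.

```stata
* Split large_file.dta into chunks of `stepsize' observations,
* keeping all bounds in locals so variables in the data cannot shadow them
local stepsize = 20000000
quietly describe using "large_file.dta"
local nrecords = r(N)                              // number of observations in the file
local num_files = ceil(`nrecords' / `stepsize')

forvalues part = 1/`num_files' {
    local start = 1 + (`part' - 1) * `stepsize'
    local stop  = min(`start' + `stepsize' - 1, `nrecords')
    display "Iteration `part': observations `start' to `stop'"
    use in `start'/`stop' using "large_file.dta", clear
    save "large_file_`part'", replace
}
```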