Setting version, seed, and sortseed not sufficient for reproducibility?

The following code appears to produce different results depending on the Stata 16.1 edition I use.

Code:

clear all
version 16.1
set seed 872387
set sortseed 636445
set obs 1000
gen id = _n
gen x = 0
replace x = 1 in 200/600
sort x
list id in 1/5

In Stata 16.1 MP (2-core), I get: 985, 742, 906, 188, 931.
In Stata 16.1 SE, I get: 998, 112, 992, 943, 690.

Here is another example using merge:

Code:

clear all
version 16.1
set seed 872387
set sortseed 636445
set obs 1
gen x = 1
tempfile s
save `s'
clear
set obs 1000
gen id = _n
gen x = 0
replace x = 1 in 200/600
merge m:1 x using `s'
list id in 1/10

Again, in Stata 16.1 MP (2-core), the first five ids listed are: 985, 742, 906, 188, 931.
In Stata 16.1 SE, the first five are: 998, 112, 992, 943, 690.

I understand that I can work around this by doing a stable sort (before the merge, in the second example). But I'm surprised that setting the version, seed, and sortseed does not appear to be sufficient to ensure reproducibility across editions of the same Stata version. Is that true, or am I missing something? Is this due to how sorting is parallelized?
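For reference, here is a minimal sketch of the stable-sort workaround I mean, using the same data as the first example. It shows two variants: the stable option of sort, and adding id as a tiebreaker (which assumes id is unique, as it is here). Neither depends on sortseed, because no ties are left to be broken at random.

```stata
clear all
version 16.1
set obs 1000
gen id = _n
gen x = 0
replace x = 1 in 200/600

* Variant 1: a stable sort keeps the current order within ties of x
sort x, stable
list id in 1/5

* Variant 2: sort on x plus a unique tiebreaker, so there are no ties
isid id          // confirm that id uniquely identifies observations
sort x id
list id in 1/5
```

Both variants list ids 1 through 5, since those are the first observations with x == 0 in the original order. For the second example, the same sort would go right before the merge, or a sort id could follow it.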

Note that I did not observe this problem with fewer observations. The following produced the same output in both Stata 16.1 SE and Stata 16.1 MP (2-core):

Code:

clear all
version 16.1
set sortseed 636445
set obs 100
gen id = _n
gen x = 0
replace x = 1 in 20/60
sort x
list id in 1/5
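One check that might help narrow down the parallelization question is to rerun the first example in Stata/MP restricted to a single core. This is only a diagnostic sketch: I am assuming that set processors (available in Stata/MP) affects how sort runs, and I do not know whether the output would then match SE.

```stata
clear all
version 16.1

* Record which build is running and how many cores it may use
display "MP: " c(MP) "   SE: " c(SE) "   processors: " c(processors)

* Stata/MP only: restrict Stata to one core before sorting
if c(MP) {
    set processors 1
}

set seed 872387
set sortseed 636445
set obs 1000
gen id = _n
gen x = 0
replace x = 1 in 200/600
sort x
list id in 1/5
```

If the single-core MP output matched the SE output, that would point toward parallel sorting as the source of the difference; if not, something else differs between the editions.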

Thank you!
