Hello,

I am trying to randomly select a subsample of participants from my data set. I found the command randomselect useful for this, but I don't know how to set the seed in my syntax so that the same observations are selected on every subsequent run of the do-file.

Basically, I want to select two groups based on the following characteristics:

Group 1: N=3000, smokers, 50% female, aged 50-80
Group 2: N=3000, non-smokers, 50% female, aged 20-80

Here is my syntax (with the seed command integrated but not working as expected):


Code:
* Smokers: 1500 per gender
randomselect if smoking == 1 & gender == 1, gen(sample_1) n(1500) seed(7492001)

randomselect if smoking == 1 & gender == 0, gen(sample_2) n(1500) seed(7492001)

* Non-smokers: 1500 per gender
randomselect if smoking == 0 & gender == 1, gen(sample_3) n(1500) seed(7492001)

randomselect if smoking == 0 & gender == 0, gen(sample_4) n(1500) seed(7492001)

* Group indicator: 0 = selected smokers, 1 = selected non-smokers, missing = not selected
g sample_smoking = 0 if inlist(1, sample_1, sample_2)
replace sample_smoking = 1 if inlist(1, sample_3, sample_4)

drop sample_1-sample_4
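
For comparison, here is a minimal sketch of the same stratified draw using only built-in commands (set seed and runiform()), which might help isolate where the reproducibility breaks. It is not tested on my data and rests on assumptions: id and age are hypothetical variable names (a unique participant identifier and age in years), and gender == 1 is assumed to code female.

```stata
* Sketch only. Assumed, not confirmed by the description above:
*   id     uniquely identifies participants (hypothetical name)
*   age    age in years (hypothetical name)
*   gender == 1 codes female
set seed 7492001
sort id                          // fix the row order so the draws are reproducible
generate double u = runiform()   // one random number per participant

* Eligible: smokers aged 50-80 or non-smokers aged 20-80, gender known
generate byte eligible = !missing(gender) & ///
    ((smoking == 1 & inrange(age, 50, 80)) | ///
     (smoking == 0 & inrange(age, 20, 80)))

* Within each smoking-by-gender cell, flag the 1500 smallest random numbers
bysort eligible smoking gender (u): generate byte selected = eligible & (_n <= 1500)
drop u eligible
```

The sort on a unique identifier matters here: the same seed only reproduces the same selection if the observations are in the same order when the random numbers are generated.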

Thank you in advance for any comment!

Giovanni