I am trying to randomly select a subsample of participants from my data set. The randomselect command looks useful for this, but I don't know how to set the seed in my syntax so that the same observations are selected on every run of the do-file.
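A likely reason the draws change between runs (an assumption, since the internals of randomselect are not shown here): the seed fixes the sequence of random numbers, but those numbers are handed out to observations in their current sort order. If the order differs between runs, for example after a sort on a non-unique variable, where Stata breaks ties in an arbitrary order, the same seed selects different people. The minimal sketch below uses ten simulated observations and a hypothetical unique identifier `id` to show that sorting on `id` before `set seed` reproduces the draws exactly, even after the data have been scrambled.

```stata
* Minimal sketch: same seed + same sort order -> same random draws.
* The data and the variable name id are hypothetical.
clear
set obs 10
generate long id = _n              // hypothetical unique identifier

* Draw 1: data in id order
sort id
set seed 7492001
generate double u1 = runiform()

* Scramble the order, as an earlier sort on a non-unique key might
generate double shuffle = runiform()
sort shuffle

* Draw 2: restore the id order first, then reset the seed
sort id
set seed 7492001
generate double u2 = runiform()

assert u1 == u2                    // identical draws for every observation
```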
Basically, I want to select two groups based on the following characteristics:
Group 1: N=3000, smokers, 50% female, aged 50-80
Group 2: N=3000, non-smokers, 50% female, aged 20-80
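For comparison, the selection described above can also be sketched in base Stata without randomselect. This is a sketch under assumptions, not the poster's method: it assumes variables `id` (unique), `smoking` (1 = smoker), `gender` and `age`, where `id` and `age` are hypothetical names, and the first block only simulates demo data so the code runs on its own. It sorts on `id`, sets the seed once, and keeps the 1,500 eligible observations with the smallest uniform random number in each smoking-by-gender cell, which gives 3,000 per group with an even gender split.

```stata
* --- hypothetical demo data; replace this part with: use yourdata, clear ---
clear
set seed 12345
set obs 30000
generate long id      = _n
generate byte smoking = runiform() < 0.4
generate byte gender  = runiform() < 0.5
generate byte age     = runiformint(18, 90)

* --- reproducible stratified draw ---
isid id, sort              // checks that id is unique and fixes the sort order
set seed 7492001           // set the seed once, after sorting

* eligibility: smokers aged 50-80, non-smokers aged 20-80
generate byte elig = smoking == 1 & inrange(age, 50, 80)
replace elig = 1 if smoking == 0 & inrange(age, 20, 80)
generate double u = runiform() if elig

* within each smoking x gender cell, keep the 1,500 smallest random numbers
* (ineligible observations have u missing, so they sort last in each cell)
bysort smoking gender (u): generate byte pick = elig & _n <= 1500

* 1 = smoker group, 0 = non-smoker group, missing = not selected
generate byte sample_smoking = smoking if pick
tabulate sample_smoking gender
```

Because both the sort order and the seed are fixed, rerunning the do-file should select the same observations as long as the input data do not change. If a cell has fewer than 1,500 eligible observations, this sketch simply keeps all of them.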
Here is my syntax (with the seed() option included, but it does not work as expected):
* smokers: 1,500 with gender == 1 and 1,500 with gender == 0
randomselect if smoking == 1 & gender == 1, gen(sample_1) n(1500) seed(7492001)
randomselect if smoking == 1 & gender == 0 & sample_1 != 1, gen(sample_2) n(1500) seed(7492001)
* non-smokers: same split by gender
randomselect if smoking == 0 & gender == 1, gen(sample_3) n(1500) seed(7492001)
randomselect if smoking == 0 & gender == 0 & sample_1 != 1, gen(sample_4) n(1500) seed(7492001)
* 1 = selected smoker, 0 = selected non-smoker
generate sample_smoking = 1 if inlist(1, sample_1, sample_2)
replace sample_smoking = 0 if inlist(1, sample_3, sample_4)
drop sample_1-sample_4
Thank you in advance for any comments!