I have annual stock returns for a number of firms over about 20 years, roughly 15k firm-year observations in total. I want to draw 100k random samples, each containing, say, 20% of the observations in each year. I am using a forvalues loop that draws a random sample by year, computes the portfolio mean by year, adds a column identifying the simulation index, and appends the result to a file. My concern is that this takes far too long, and that the sleep command is needed to avoid a read-only error while saving.

I was wondering whether there is a more efficient way to do this, for example using expand to create the 100k replicas first and then computing returns by simulation index and year. I am okay with a large file if it reduces the runtime.

The data set looks like this:
fid ayr ret
abc 2001 0.012
abc 2002 0.014
abc .....
abc 2020 0.032
xyz 2005 0.265
xyz 2006 0.023
.....


The code I am using right now:

save yr_ret.dta, replace

local flag = 1
set seed 1234
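forvalues i = 1/100000 {
    display "starting sample `i'"
    * reload the full data (use takes clear, not replace)
    use yr_ret.dta, clear
    * keep a random 20% of the observations within each year
    sample 20, by(ayr)
    * equal-weighted mean return and number of firms per year
    collapse (mean) eqret=ret (count) n=ret, by(ayr)
    gen indx = `i'
    * from the second draw onward, stack onto the earlier results
    if `flag' != 1 {
        append using eq_ret_ranpf.dta
    }
    save eq_ret_ranpf.dta, replace
    sleep 500
    local flag = 0
}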
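
For reference, the expand idea I have in mind would look roughly like the sketch below. It is untested at full scale and rests on some assumptions of mine: the batch size of 1000, the tempfile name results, and the helper variables obsid and u are all made up for illustration. Expanding everything at once would mean about 15k x 100k = 1.5 billion rows, so the sketch builds the replicas in batches instead. The 20% cut uses round(), which may not match the rounding of sample exactly, and the random draws will differ from those of the loop above even with the same seed.

```stata
* Sketch only: batch size and the names results, obsid, u are assumptions
set seed 1234
local nsim   = 100000            // total number of random portfolios
local batch  = 1000              // replicas built per pass (tune to memory)
local nbatch = `nsim' / `batch'

tempfile results
forvalues b = 1/`nbatch' {
    use ayr ret using yr_ret.dta, clear
    gen long obsid = _n
    * make `batch' copies of every firm-year and number the copies
    expand `batch'
    bysort obsid: gen long indx = (`b' - 1) * `batch' + _n
    * random order within each replica-year, then keep the first 20%
    gen double u = runiform()
    bysort indx ayr (u): keep if _n <= round(0.20 * _N)
    collapse (mean) eqret=ret (count) n=ret, by(indx ayr)
    * stack this batch onto the earlier ones (one save per batch)
    if `b' > 1 {
        append using `results'
    }
    save `results', replace
}
sort indx ayr
save eq_ret_ranpf.dta, replace
```

This way the data file is read and saved 100 times instead of 100,000 times, and no sleep call should be needed because the intermediate results go to a tempfile.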

I would appreciate it if someone could help.