I have annual stock returns for a number of firms over about 20 years, roughly 15,000 firm-year observations in total. I want to draw 100,000 random samples, each containing 20% of the observations in every year. Right now I use a forvalues loop: in each iteration I draw the random sample by year, compute the portfolio mean return by year, add a column identifying the simulation index, and append the result to a saved file. The problem is that this takes far too long, and I need the sleep command to avoid a read-only error when saving.
Is there a faster approach? For example, could I first use expand to create 100,000 replicas of the data and then compute the returns by simulation index and year? I am fine with a large file if it reduces the runtime.
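A minimal sketch of the expand idea, assuming the data are saved as yr_ret.dta with the variables ayr and ret. Expanding all 100,000 draws at once would mean about 15,000 × 100,000 = 1.5 billion rows, so the sketch works through the draws in chunks. The chunk size (1,000 draws, about 15 million rows per pass) and the toy data used when the file is missing are hypothetical. Within each draw-year cell it keeps round(0.2 * _N) randomly ordered firms. This is meant to mirror sample 20, by(ayr), but I have not verified that sample rounds the per-year count the same way, and the draws will not reproduce the loop's draws even with the same seed.

```stata
// Sketch: replace the per-draw loop with -expand-, processed in chunks.
// Assumes yr_ret.dta holds ayr and ret; `chunk' is a hypothetical tuning
// value and `reps' should be a multiple of it.
clear all
set seed 1234
local reps  100000
local chunk 1000

capture confirm file yr_ret.dta
if _rc {
    // hypothetical toy panel, used only when the real file is absent
    set obs 15000
    gen fid = "f" + string(ceil(_n / 20))
    gen int ayr = 2000 + mod(_n - 1, 20) + 1
    gen double ret = rnormal(0.05, 0.2)
    save yr_ret.dta
}

tempfile results
local first 1
forvalues start = 1(`chunk')`reps' {
    use ayr ret using yr_ret.dta, clear
    gen long obs = _n
    // make `chunk' copies of every firm-year
    expand `chunk'
    // number the copies: draw index runs from `start' to `start'+`chunk'-1
    bysort obs: gen long indx = `start' + _n - 1
    // keep a random 20% of firms within each draw-year cell
    gen double u = runiform()
    bysort indx ayr (u): keep if _n <= round(0.2 * _N)
    // equal-weighted mean return and firm count per draw and year
    collapse (mean) eqret=ret (count) n=ret, by(indx ayr)
    // accumulate the chunks in a temporary file
    if `first' {
        save `results'
        local first 0
    }
    else {
        append using `results'
        save `results', replace
    }
}
use `results', clear
sort indx ayr
```

Results are accumulated in a tempfile, which is written about 100 times instead of 100,000 times. This may also remove the need for sleep, though that depends on what was locking eq_ret_ranpf.dta. Lower chunk if memory is tight.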
The data set looks like this:
fid ayr ret
abc 2001 0.012
abc 2002 0.014
abc .....
abc 2020 0.032
xyz 2005 0.265
xyz 2006 0.023
.....
The code I am using right now:
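save yr_ret.dta, replace
local flag = 1
set seed 1234
forvalues i = 1/100000 {
    display "starting sample `i'"
    // reload the full data set
    use yr_ret.dta, clear
    // keep a random 20% of firms within each year
    sample 20, by(ayr)
    // equal-weighted portfolio return and firm count per year
    collapse (mean) eqret=ret (count) n=ret, by(ayr)
    gen indx = `i'
    // from the second draw on, add the results saved so far
    if `flag' != 1 {
        append using eq_ret_ranpf.dta
    }
    save eq_ret_ranpf.dta, replace
    sleep 500
    local flag = 0
}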
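If you would rather keep a loop over draws, here is a sketch under the same assumptions (yr_ret.dta with ayr and ret; the toy data is hypothetical). It loads the data once and writes each draw's yearly results with postfile and post instead of use, append, and save on every iteration. Sorting by ayr and then by a random number only shuffles firms within a year, so the row holding the last firm of each year can be located once before the loop. Each draw still re-sorts all 15,000 rows, so this is likely slower than the expand version.

```stata
// Sketch: keep the loop, but load the data once and post each draw's
// yearly results to a temporary file instead of re-saving a .dta file.
// Assumes yr_ret.dta holds ayr and ret.
clear all
set seed 1234
local reps 100000

capture confirm file yr_ret.dta
if _rc {
    // hypothetical toy panel, used only when the real file is absent
    set obs 15000
    gen fid = "f" + string(ceil(_n / 20))
    gen int ayr = 2000 + mod(_n - 1, 20) + 1
    gen double ret = rnormal(0.05, 0.2)
    save yr_ret.dta
}

use ayr ret using yr_ret.dta, clear
sort ayr
// row numbers of the last firm in each year; they stay fixed because
// the re-sorts below only reorder firms within a year
local ends
forvalues r = 1/`=_N' {
    if ayr[`r'] != ayr[`r' + 1] local ends `ends' `r'
}
gen double u = .
gen byte pick = .
gen double s = .
gen long k = .

tempname memhold
tempfile sims
postfile `memhold' long indx int ayr double eqret long n using `sims'
forvalues i = 1/`reps' {
    quietly {
        replace u = runiform()
        // flag a random 20% of firms within each year
        bysort ayr (u): replace pick = _n <= round(0.2 * _N)
        // running sum and count of picked, non-missing returns per year
        by ayr: replace s = sum(ret * pick)
        by ayr: replace k = sum(pick * !missing(ret))
    }
    // the last row of each year holds that year's totals; a year with
    // no picked returns gets eqret missing and n = 0
    foreach r of local ends {
        post `memhold' (`i') (ayr[`r']) (s[`r'] / k[`r']) (k[`r'])
    }
}
postclose `memhold'
use `sims', clear
sort indx ayr
```

One difference from collapse: a year in which no firms are picked produces a row with missing eqret and n = 0 rather than no row at all.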
I would appreciate any help.