My computation currently takes too long to run.
I have a large dataset with observations over several periods.
For each period I use three variables: x, y, and b. There are 150 periods, 500 levels of x, and more than 10,000 levels of y; each (x, y) pair has one value of b.
For every pair of x levels (v, x), I need to calculate my measure k from b, with the observations matched on y.
The formula for each period is:
K_vx = ∑_y(b_v * b_x) / ∑_y(b_v * b_v)
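Written with explicit indices, the formula can be read as below. This is a sketch of how the code further down behaves, not an additional definition: the sets Y_v and Y_x (the y values observed for v and for x in the period) are notation introduced here for illustration. The numerator runs over the y values matched by the join, and the denominator over all y observed for v.

```latex
K_{vx} \;=\; \frac{\sum_{y \in Y_v \cap Y_x} b_{y,v}\, b_{y,x}}{\sum_{y \in Y_v} b_{y,v}^{2}}
```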
Below I describe my approach and my code.
I loop over each level of period, keep only that period, and save it as a new dataset (single_period.dta).
Then I loop over each level of x, keeping only that level.
I rename x and b to v and b_v, calculate b_v^2 for each observation, and sum b_v^2 over all observations of v.
I save this as a new dataset (single_x_period.dta).
I reload single_period.dta.
Then I loop over each level of x again and keep only one level of x.
I join this dataset with single_x_period.dta on y. In case no y matches, the remaining steps sit inside an if block that checks that v is not missing.
Then I calculate b * b_v and its sum. Finally, I use both sums to calculate K_vx = ∑_y(b_v * b_x) / ∑_y(b_v * b_v).
To save this output, I collapse the dataset to k by (period v x) and append it to an output file.
Then I clear the data and reload single_period.dta, which contains all levels of x.
This program takes 1.5 to 3 hours per period on my system. I am looking for a way to speed up this calculation.
Does anyone have an idea how to calculate this faster, or can anyone identify a bottleneck in my approach?
Code:
clear
use data.dta
levelsof period, local(period_levels)
foreach q of local period_levels {
    * Keep one period and store it for reuse
    keep if period == `q'
    save single_period, replace
    levelsof x, local(x_levels)
    foreach p of local x_levels {
        * "From" side: one level of x, renamed to v, with the sum of b_v^2
        keep if x == `p'
        rename x v
        rename b b_v
        rename period period_v
        generate b_v_sq = b_v * b_v
        egen b_v_sq_sum = total(b_v_sq)
        save single_x_period, replace
        use single_period.dta, clear
        foreach r of local x_levels {
            * "To" side: one level of x, matched to v on y
            keep if x == `r'
            joinby y using single_x_period.dta, unmatched(none)
            * The if command checks the first observation only;
            * v is missing there only when the join returned no rows
            if v != . {
                generate b_sq = b * b
                generate b_b_v = b * b_v
                egen b_sq_sum = total(b_sq)
                egen b_b_v_sum = total(b_b_v)
                generate k = b_b_v_sum / b_v_sq_sum
                collapse k, by(period_v period v x)
                * The results file has to exist before the first append
                append using data/results/k_L1_L2.dta
                save data/results/k_L1_L2.dta, replace
            }
            use single_period.dta, clear
        }
        use single_period.dta, clear
    }
    use data.dta, clear
}
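One possible way to cut the run time is to drop the innermost loop: instead of joining one level of x at a time to v, the whole period can be joined to v on y in a single joinby, and the sums for every x then come out of one collapse. The following is an untested sketch under assumptions, not a definitive implementation: it assumes period and x are numeric (as the comparisons in the code above imply), that each (period, x, y) combination occurs once, and that only period, x, y, and b are needed. The tempfile names are placeholders introduced here.

```stata
* Sketch: one join per (period, v) instead of one per (period, v, x).
tempfile single_period from_side results
local first 1

use data.dta, clear
levelsof period, local(period_levels)

foreach q of local period_levels {
    * Keep one period and only the variables needed (assumption)
    use if period == `q' using data.dta, clear
    keep period x y b
    save "`single_period'", replace

    levelsof x, local(x_levels)
    foreach p of local x_levels {
        * "From" side: all rows of level v, with sum of b_v^2 attached
        use if x == `p' using "`single_period'", clear
        rename (x b) (v b_v)
        generate double b_v_sq = b_v^2
        egen double b_v_sq_sum = total(b_v_sq)
        keep y v b_v b_v_sq_sum
        save "`from_side'", replace

        * "To" side: every level of x at once, matched to v on y
        use "`single_period'", clear
        joinby y using "`from_side'"
        generate double b_b_v = b * b_v

        * One row per (period, v, x): numerator sum and the constant denominator
        collapse (sum) b_b_v_sum = b_b_v (first) b_v_sq_sum, by(period v x)
        generate double k = b_b_v_sum / b_v_sq_sum
        keep period v x k

        * Accumulate results; nothing to append on the first pass
        if !`first' {
            append using "`results'"
        }
        save "`results'", replace
        local first 0
    }
}

use "`results'", clear
save data/results/k_L1_L2.dta, replace
```

With 500 levels of x, this runs 500 joins per period rather than 250,000, and it avoids reloading single_period.dta inside the innermost loop. Pairs of x levels that share no y simply do not appear in the result, which matches the unmatched(none) behaviour of the code above. Appending to the results file on every pass still grows in cost as the file gets larger, so saving one results file per period may be worth trying as well.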