My computation currently takes far too long.
I have a large dataset with observations over several periods.
For each period I use three variables in the computation: x, y, and b (150 periods, 500 levels of x, 10,000+ levels of y, and one value of b for each x,y pair).
For every possible pair of x levels, I need to calculate my measure k from b, with the data points matched on y.
The formula for each period is:
K_vx = ∑_y(b_v * b_x) / ∑_y(b_v * b_v)
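Written with explicit indices, the formula can be read as follows. This is my reading of it, assuming b_{v,y} denotes the value of b for level v of x at a given y within one period, and that the numerator only runs over y values observed for both v and x:

```latex
K_{vx} = \frac{\sum_{y} b_{v,y}\, b_{x,y}}{\sum_{y} b_{v,y}^{2}},
\qquad
K_{vv} = 1 .
```

Under the same assumption, if B is the (y by x) matrix of b values for one period with zeros where a pair is absent, then K_vx = (B'B)_vx / (B'B)_vv, so the denominator for v is simply the numerator of the pair (v, v).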
Below I describe my approach, followed by my code.
I loop over each level of period, keep only that period, and save it as a new dataset (single_period.dta).
Then I loop over each level of x, keeping only that level of x.
I rename the variables x and b to v and b_v, then calculate each b_v^2 and the sum of all b_v^2 for v.
I save this as a new dataset (single_x_period.dta).
I reload single_period.dta.
Then I loop again over each level of x and keep only one level of x.
I join this dataset with single_x_period.dta on y. In case there is no match on y, I wrap the remaining steps in an if block that checks that v is not missing.
Then I calculate b_v * b_x and its sum. Finally, from both sums I calculate K_vx = ∑_y(b_v * b_x) / ∑_y(b_v * b_v).
To save this output, I collapse the dataset to k by (v x period) and append it to an output file.
Then I clear the data and reload the dataset with all levels of x from single_period.dta.
This program takes 1.5 to 3 hours per period on my system. I am looking for a way to speed up this calculation.
Does anyone have an idea how to calculate this faster, or can anyone identify a bottleneck in this calculation?
Code:
clear
use data.dta
levelsof period, l(period_levels)
foreach q of local period_levels {
    * keep one period and save it
    keep if period == `q'
    save single_period, replace
    levelsof x, l(x_levels)
    foreach p of local x_levels {
        * build the "v" side: one level of x with its sum of squares
        keep if x == `p'
        rename x v
        rename b b_v
        rename period period_v
        generate b_v_sq = b_v * b_v
        egen b_v_sq_sum = sum(b_v_sq)
        save single_x_period, replace
        clear
        use single_period.dta
        foreach r of local x_levels {
            * match one level of x against v on y
            keep if x == `r'
            joinby y using single_x_period.dta, unmatched(none)
            * v is missing here when no y matched
            if v != . {
                generate b_sq = b * b
                generate b_b_v = b * b_v
                egen b_sq_sum = sum(b_sq)
                egen b_b_v_sum = sum(b_b_v)
                generate k = b_b_v_sum / b_v_sq_sum
                collapse k, by(period_v period v x)
                append using data/results/k_L1_L2.dta
                save data/results/k_L1_L2.dta, replace
            }
            clear
            use single_period.dta
        }
        use single_period.dta
    }
    use data.dta
}
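One possible direction, offered as an untested sketch rather than a definitive solution: the innermost loop over x reloads single_period.dta and rewrites the growing results file once per (v, x) pair, which is likely where most of the time goes. The sketch below drops that inner loop by merging the rows of one v onto the whole period on y and summing within x. It assumes one row per (period, x, y) and numeric period and x. The tempfile names and the per-period output name k_period_`q'.dta are hypothetical and not part of the original code.

```stata
* Sketch: one pass per level of v, no inner loop over x.
* Assumes data.dta has period, x, y, b with one row per (period, x, y).
use data.dta, clear
levelsof period, local(period_levels)
tempfile one_period one_v period_results

foreach q of local period_levels {
    use data.dta, clear
    keep if period == `q'
    keep period x y b
    save `one_period', replace
    levelsof x, local(x_levels)
    local first = 1

    foreach p of local x_levels {
        * Using side: the rows of v = p, one row per y.
        use `one_period', clear
        keep if x == `p'
        rename (x b) (v b_v)
        keep v y b_v
        save `one_v', replace

        * Attach b_v to every row that shares a y with v, then sum within x.
        use `one_period', clear
        merge m:1 y using `one_v', keep(match) nogenerate
        generate double num = b * b_v
        collapse (sum) num, by(period v x)

        * The row with x == v holds sum_y(b_v * b_v), i.e. the denominator.
        egen double den = total(num * (x == v))
        generate double k = num / den
        keep period v x k

        * Accumulate the results of this period in a small tempfile.
        if `first' == 0 {
            append using `period_results'
        }
        save `period_results', replace
        local first = 0
    }

    * Hypothetical output name: one results file per period.
    save k_period_`q'.dta, replace
}
```

As in the original code, a pair (v, x) only appears in the output if the two levels share at least one y. Writing one file per period keeps each append small. The per-period files could be combined once at the end with append.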