Code:
ssc install gtools, replace
New commands:
- greshape long/wide, 4-20x faster than reshape long/wide (additionally accepts any number of i or j variables).
- greshape gather/spread, similar to long/wide but made to mimic the gather and spread commands in R's tidyr package.
- gstats tab, 5-40x faster than tabstat (additionally accepts any number of grouping variables).
- gstats sum, 5-10x faster than sum, detail (plain summarize is already fast, but the -detail- option is slow because it computes all the percentiles).
- gstats winsor, 10-20x faster than winsor2.
- gcollapse, gegen, and gstats tab now allow the following statistics:
  - select# and select-#, to select the #th smallest or largest value.
  - rawselect# and rawselect-#, the same as select but ignoring weights.
  - cv, to compute the coefficient of variation.
  - variance
  - range
- gtop and glevelsof can save their results in a mata object via mata(name).
- gtop (gtoplevelsof) can list all the levels via ntop(.), similar to tablist (ntop(-.) lists them from least to most common). The option -alpha- lists the top levels in variable order instead of frequency order.
- greshape allows varlist syntax for long to wide reshapes (though this cannot be combined with @ in the same stub). Wide to long matches do not allow varlist syntax, but complex matches can be achieved via the option match(regex), which treats the stubs as regular expressions (details here).
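As a rough illustration of the new statistics listed above, the sketch below collapses Stata's bundled auto dataset by group and checks two of them against their definitions. It assumes gtools is already installed and that the statistic names are written in parentheses as in collapse; the dataset and the output variable names are arbitrary choices made for this example only.

```stata
* Sketch only: assumes gtools is installed (ssc install gtools).
* Names to the left of "=" are arbitrary and exist only for this example.
sysuse auto, clear
gcollapse (mean) mean_price = price (sd) sd_price = price ///
    (cv) cv_price = price (variance) var_price = price ///
    (range) rng_price = price ///
    (select1) lo_price = price (select-1) hi_price = price, ///
    by(foreign)

* cv should equal sd / mean; range should equal largest minus smallest
gen double cv_check  = sd_price / mean_price
gen double rng_check = hi_price - lo_price
list foreign cv_price cv_check rng_price rng_check
```

Here select1 and select-1 pick the smallest and largest value in each group, so they should agree with min and max.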
Code:
clear all
ssc install winsor2
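* Usage: bench #: command
* Runs the command under timer # and stores the elapsed seconds
* in the caller's local macro r#.
program bench
    gettoken timer call: 0, p(:)
    gettoken colon call: call, p(:)
    cap timer clear `timer'
    timer on `timer'
    `call'
    timer off `timer'
    qui timer list
    c_local r`timer' `=r(t`timer')'
end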
set obs 10000000
gen groups = int(runiform() * 1000)
gen smallg = mod(groups, 10)
gen rsort = rnormal()
gen rvar = rnormal()
gen ix = _n
sort rsort
preserve
rename (rsort rvar) (r1 r2)
bench 11: greshape long r, i(ix) j(j)
restore, preserve
rename (rsort rvar) (r1 r2)
greshape long r, i(ix) j(j) nochecks
bench 16: greshape wide r, i(ix) j(j)
restore, preserve
rename (rsort rvar) (r1 r2)
bench 10: reshape long r, i(ix) j(j)
restore, preserve
rename (rsort rvar) (r1 r2)
greshape long r, i(ix) j(j) nochecks
bench 15: reshape wide r, i(ix) j(j)
restore
bench 21: qui gstats winsor rvar, s(_wg)
bench 20: qui winsor2 rvar
bench 26: qui gstats sum rvar
bench 25: qui sum rvar, detail
bench 31: qui gstats tab rvar, by(smallg) s(n mean min max)
bench 30: qui tabstat rvar, by(smallg) s(n mean min max)
local commands ///
    reshape_long ///
    reshape_wide ///
    winsor ///
    sum_detail ///
    tabstat
local bench_table `" Versus | Native | gtools | % faster "'
local bench_table `"`bench_table'"' _n(1) `" ------------ | ------ | ------ | -------- "'
forvalues i = 10(5)30 {
    gettoken cmd commands: commands
    local pct "`:disp %7.2f 100 * (`r`i'' - `r`=`i'+1'') / `r`i'''"
    local dnative "`:disp %6.2f `r`i'''"
    local dgtools "`:disp %6.2f `r`=`i'+1'''"
    local cmd `"`:disp %12s "`cmd'"'"'
    local bench_table `"`bench_table'"' _n(1) `" `cmd' | `dnative' | `dgtools' | `pct'% "'
}
disp _n(1) `"`bench_table'"'
Code:
Versus | Native | gtools | % faster
------------ | ------ | ------ | --------
reshape_long | 111.63 | 8.21 | 92.65%
reshape_wide | 127.61 | 16.52 | 87.05%
winsor | 28.87 | 1.17 | 95.96%
sum_detail | 30.50 | 1.63 | 94.65%
tabstat | 32.63 | 1.03 | 96.83%
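To relate the "% faster" column to the "x faster" figures quoted in the list of new commands, the timings can be converted by hand. The snippet below is only a sketch using the reshape_long row of the table. It applies the same formula as the benchmark loop, 100 * (native - gtools) / native, which reproduces the 92.65% shown in that row, and then prints the ratio native / gtools as the speedup factor. The local macro names are made up for this example.

```stata
* Timings (seconds) copied from the reshape_long row of the table
local native 111.63
local gtools 8.21

* Same formula as the benchmark loop
display "pct faster: " %6.2f 100 * (`native' - `gtools') / `native'

* Speedup factor: how many times faster gtools ran
display "speedup:    " %6.2f `native' / `gtools' "x"
```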