While I doubt this will be of much use to Stata users (boottest is damn fast and easier to use), I think the project is interesting in a number of ways:
- It offers a model of cross-platform development of statistical software: one need only invest in building a feature set and optimizing code once, then write tailored front-ends for each platform. In fact, Alexander Fischer is writing one for R.
- Julia promises the plasticity of Python and the speed of C, roughly speaking, by way of just-in-time compilation. In my experience, fully realizing this potential takes a lot of work, at least while climbing the learning curve. You have to dig into the innards of type inference and compilation and stare at pages of arcane output (from @code_warntype or SnoopCompile). Partly that is a comment on the immaturity of Julia: it has the well-recognized problem of long "time to first plot," meaning that there's a long lag the first time a package is used in a session. On my machine, the wildboottest() function often takes 12 seconds the first time it runs, and it was a struggle to get the delay that low.
- Nevertheless, the promise is real. A programmer can achieve much higher performance than with Mata, yet without having to bother with manually compiling code for multiple operating systems and CPUs, the way you do with C plug-ins for Stata. An example below shows WildBootTests 10x faster than boottest even when calling from Stata.
- Julia could be more directly integrated into Stata, making the link easier and more reliable. I've already suggested that StataCorp do this, the way it has for Python. Or maybe a user could lead the way, as James Fiedler did for Python.
Code:
infile coll merit male black asian year state chst using regm.raw, clear
qui xi: regress coll merit male black asian i.year i.state, cluster(state)
generate individual = _n  // unique ID for each observation
timer clear
timer on 1
boottest merit, nogr reps(9999) bootcluster(individual)  // subcluster bootstrap
timer off 1
timer on 2
mat b = e(b)[1,1..colsof(e(b))-1]  // drop constant term
global vars: colnames b  // get right-side variable names
python
from julia import WildBootTests as wbt
import numpy as np
from sfi import Data
R = np.concatenate(([1], np.zeros(`=colsof(b)'))).reshape(1,-1)  # put null in Rβ = r form
r = np.array([0])
resp = np.asarray(Data.get('coll'))  # get response variable
predexog = np.c_[np.asarray(Data.get('$vars')), np.ones(resp.size)]  # exogenous predictors + constant
clustid = np.asarray(Data.get('individual state')).astype(int)  # get clustering variables
test = wbt.wildboottest(R, r, resp=resp, predexog=predexog, clustid=clustid, nbootclustvar=1, nerrclustvar=1, reps=9999)  # do test
wbt.teststat(test)  # show results
wbt.p(test)
wbt.CI(test)
end
timer off 2
timer list
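The only fiddly step in the Python block above is packing the null hypothesis into Rβ = r form. Here is a standalone sketch of what that construction does, using a made-up coefficient count and coefficient vector (in the real run, the count comes from colsof(b) and the coefficients from the Stata regression):

```python
import numpy as np

# Hypothetical setup: 4 non-constant regressors plus a constant,
# so predexog would have 5 columns and R must be 1 x 5.
k_vars = 4                                                  # stand-in for colsof(b)
R = np.concatenate(([1], np.zeros(k_vars))).reshape(1, -1)  # selects the first (merit) coefficient
r = np.array([0])                                           # null: R @ beta = r, i.e. beta_merit = 0

# Check against a made-up coefficient vector: R @ beta picks out the first entry.
beta = np.array([0.034, -0.1, 0.2, 0.05, 0.4])
print(R.shape)            # (1, 5)
print((R @ beta).item())  # 0.034 -- the merit coefficient
```

So rejecting the null at R @ beta = 0 is exactly a test that the merit coefficient is zero, with the other coefficients left unrestricted.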
Code:
. timer list
   1:     22.64 /        1 =      22.6360
   2:      2.10 /        1 =       2.1040
One source of speed-up is that by default wildboottest() does all computations in single-precision (32-bit floats) rather than double, something that is not possible in Mata, but I think is typically fine for a bootstrap-based test.
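To illustrate the point with a generic NumPy sketch (not WildBootTests code): storing the data in single precision halves memory traffic, and a simple studentized statistic computed in float32 agrees with its float64 counterpart to far more digits than a 9,999-replication bootstrap p-value can resolve anyway:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# The same data in single vs double precision
x32 = x.astype(np.float32)
x64 = x.astype(np.float64)
print(x32.nbytes, x64.nbytes)  # 400000 800000 -- memory halves

# A studentized mean computed each way
t32 = x32.mean() / (x32.std(ddof=1) / np.sqrt(x32.size))
t64 = x64.mean() / (x64.std(ddof=1) / np.sqrt(x64.size))
print(abs(float(t32) - float(t64)))  # tiny relative to bootstrap resolution of ~1e-4
```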