Recently I have been trying to use the new nonparametric regression command in Stata 16, npregress series, on different subsamples of my data, and I found it to be slow. After digging in, I think I've found some strange behavior: npregress becomes much slower as the dataset in memory grows, even when the size of the estimation sample does not change.
Consider the example below.
Toy example
Code:
clear
set obs 100000
gen x1 = runiform()
gen x2 = runiform()
gen y = cos(x1)*sin(x2) + x1^2 + 1/3*runiform()
npregress series y x1 x2 if _n < 1001, polynomial
Code:
drop if _n >= 1001
npregress series y x1 x2 if _n < 1001, polynomial
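To put numbers on the difference, the two runs could be timed with Stata's built-in timer commands. This is only a minimal sketch: the seed value and the timer numbers 1 and 2 are arbitrary choices of mine, and the timings will depend on the machine.

```stata
* Sketch: time the same 1,000-observation fit with and without
* the extra 99,000 observations in memory.
clear
set seed 12345          // arbitrary seed (my assumption), for reproducibility
set obs 100000
gen x1 = runiform()
gen x2 = runiform()
gen y  = cos(x1)*sin(x2) + x1^2 + 1/3*runiform()

timer clear

* Run 1: full dataset in memory, estimation restricted with -if-
timer on 1
quietly npregress series y x1 x2 if _n < 1001, polynomial
timer off 1

* Run 2: same estimation sample, unused observations dropped first
drop if _n >= 1001
timer on 2
quietly npregress series y x1 x2 if _n < 1001, polynomial
timer off 2

* Report elapsed seconds for timers 1 and 2
timer list
```

Both calls use the same first 1,000 observations, so any gap between timer 1 and timer 2 comes from the observations that are in memory but outside the estimation sample.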
Can someone explain why this is happening? Is npregress somehow using the observations outside the estimation sample? I was hoping to run npregress repeatedly on subsamples of my data to construct nonparametric predictions without having to repeatedly shuffle the data in memory (which would also take a long time, since I am using a moderately large dataset).
Best,
Rustin