I am trying to understand the relationship between two variables using non-parametric regression, with the commands -npregress-, -lpoly-, and -lowess-. Are all three considered kernel regressions?
As far as I understand:
(1) All of them fit a local regression at each point (i.e., observation) using a neighbourhood of points within the chosen bandwidth. The further a data point is from the observation in question, the less weight it receives in that regression. Taken together, the local fits trace out a smooth function.
(2) The main difference between -lpoly- on the one hand and -lowess- and -npregress- on the other is that -lowess- and -npregress- fit local linear regressions or local means, while -lpoly- fits local polynomial regressions (i.e., you can choose the degree of the polynomial). Therefore, -lpoly- seems more general.
(3) In addition, the commands differ in their default bandwidth and in whether more than one explanatory variable can be included. Are there any other (important) differences I am missing?
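To check that I have the mechanics in items (1) and (2) right, here is a minimal sketch in Python of what I mean by a kernel-weighted local polynomial fit at a single point. It is only an illustration of my understanding, not what any of the three Stata commands does internally: the Epanechnikov kernel, the fixed bandwidth, and the function names (epanechnikov, solve, local_poly) are my own assumptions.

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel: positive weight for |u| < 1, zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def solve(a, b):
    """Solve the small system a * beta = b by Gaussian elimination
    with partial pivoting (a is p x p, b has length p)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def local_poly(x, y, x0, bandwidth, degree=1):
    """Fit a kernel-weighted polynomial in (x - x0) and return the
    fitted value at x0. degree=0 is a local mean, degree=1 a local
    linear fit. Assumes enough points fall inside the bandwidth."""
    p = degree + 1
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for xi, yi in zip(x, y):
        w = epanechnikov((xi - x0) / bandwidth)
        if w == 0.0:
            continue  # points outside the bandwidth contribute nothing
        d = xi - x0
        powers = [d ** k for k in range(p)]
        for j in range(p):
            xtwy[j] += w * powers[j] * yi
            for k in range(p):
                xtwx[j][k] += w * powers[j] * powers[k]
    # Because the polynomial is centred at x0, the intercept is the fit at x0.
    return solve(xtwx, xtwy)[0]

# Example: smooth exactly linear data at one evaluation point.
x = [i * 0.5 for i in range(21)]
y = [2.0 + 3.0 * xi for xi in x]
fit_at_5 = local_poly(x, y, x0=5.0, bandwidth=2.0, degree=1)
```

In this sketch the cost of one fit depends on how many points fall inside the bandwidth, and the total cost grows with the number of points x0 at which local_poly is called.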
I have been trying all three commands with the same regression specification. -lpoly- has proven to be much faster with my data, and I do not understand why, given that it seems to be the most general estimator (see item (2) above). The specification with -npregress- has been taking forever: the command has been running for 30 hours and has still not finished (I have 14 million observations). I ran the same specification with -lpoly- and got the result in less than 2 hours. I have also been running the same specification with -lowess- for the past 5 hours and still have no result.
Is there any way to speed up the estimation with -npregress- and -lowess-? I am only interested in the predictions, not in the standard errors.
Many thanks
Paula