I have a methodological question concerning lasso regressions and the lasso linear command in Stata.
I have a dataset on daily investment flows of firms and a huge collection of dummy variables which constitute daily signals upon which the firms potentially invest.
There are more than one million observations and more than 2000 dummy variables (D_*) and a set of a few further controls (C_*).
I want to find out which of the dummy variables are most relevant for explaining the dependent variable (FLOW).
To do so, I estimate a lasso linear regression of FLOW on D_*, with the C_* variables always included. Because estimation on the full sample takes very long, I first ran the command on a random subsample of 10,000 observations.
Code:
lasso linear FLOW (C_*) D_* if random_sample == 1
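For reference, an indicator like random_sample can be built roughly as follows. This is only a minimal sketch and not necessarily how it was done here: the seed value and the helper variable u are placeholders chosen for illustration.

```stata
* Sketch: flag a simple random draw of 10,000 observations.
* The seed (12345) and the helper variable u are illustrative assumptions.
set seed 12345
generate double u = runiform()

* Sorting on the random number puts the observations in random order
* (note that this changes the sort order of the data in memory).
sort u

* The first 10,000 observations in that random order form the subsample.
generate byte random_sample = (_n <= 10000)
drop u
```

Setting the seed first makes the draw reproducible, so the same 10,000 observations are flagged on every run.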
Code:
Lasso linear model                          No. of obs        =     10,000
                                            No. of covariates =      2,179
Selection: Cross-validation                 No. of CV folds   =         10

--------------------------------------------------------------------------
         |                                No. of      Out-of-      CV mean
         |                               nonzero       sample   prediction
      ID |     Description      lambda     coef.    R-squared        error
---------+----------------------------------------------------------------
       1 |    first lambda     612.519        32       0.1412     1.13e+08
       6 |   lambda before    384.6798        34       0.1427     1.13e+08
     * 7 | selected lambda    350.5059        35       0.1427     1.13e+08
       8 |    lambda after    319.3679        35       0.1426     1.13e+08
      12 |     last lambda    220.1279        57       0.1407     1.14e+08
--------------------------------------------------------------------------
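To see which variables stand behind the 35 nonzero coefficients at the selected lambda (ID 7), the postestimation commands lassocoef and lassoknots can be run directly after the estimation. The following is a minimal sketch; the choice of display and sort options is just one reasonable way to look at the result.

```stata
* Sketch: inspect what the lasso kept at the CV-selected lambda.
* Run directly after the lasso linear command shown earlier.

* List the selected variables with their penalized coefficients,
* ordered by coefficient.
lassocoef, display(coef, penalized) sort(coef, penalized)

* Show the knots, i.e. the lambdas at which variables enter or leave the model.
lassoknots
```

The always-included C_* controls appear in the lassocoef listing as well, so they need to be read separately from the selected D_* dummies.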
Therefore, my question: is it possible to run lasso such that it also sets to zero those coefficients that are close to zero or negative?
Thanks