Hi all, this is a statistical question rather than a specific code question. I am trying to build a linear model that predicts the cost of a particular hip surgery. To set the scene, my n is ~19,000 and I am starting with ~460 variables (440 of which are dummies). I have so many variables because many different medications or procedures can be given during a single surgery, and across 19,000 patients each medication or procedure ends up as its own dummy variable.
Given that, I will first use Lasso model selection with 5-fold cross-validation as a guide to weed out variables that don't contribute much to the cost of the procedure. Since Lasso does not specify a model based on p-values, it does not report p-values in its output. My concern is that submitting this model for publication will not go over well, given reviewers' heavy reliance on p-values.
I then plan to take the variables that Lasso selected and use them as the independent variables in an OLS model. With this approach I can present p-values, and I would use those p-values to evaluate each independent variable for significance and determine the final model.
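For concreteness, the two-step procedure described above can be sketched in Python on simulated data. This is only an illustration under assumptions, not the actual analysis: the data, the three "true" effects, the penalty grid, and the helper names `lasso_cd`, `cv_lasso`, and `ols_with_pvalues` are all made up for the example, the Lasso is a hand-written coordinate descent rather than any package's implementation, and the p-values use a normal approximation to the t distribution.

```python
import math
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent, no intercept (assumes centered data).
    Minimizes (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()              # running residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]             # take variable j out of the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            r -= X[:, j] * b[j]
    return b

def cv_lasso(X, y, lams, k=5, seed=0):
    """Return the penalty from `lams` with the lowest k-fold CV mean squared error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mse = np.zeros(len(lams))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        for i, lam in enumerate(lams):
            b = lasso_cd(X[train], y[train], lam)
            mse[i] += np.mean((y[test] - X[test] @ b) ** 2) / k
    return lams[int(np.argmin(mse))]

def ols_with_pvalues(X, y):
    """OLS with intercept; two-sided p-values from a normal approximation."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - Z.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in beta / se])
    return beta, se, pvals

# Simulated stand-in for the cost data: 20 dummies, only the first 3 affect cost.
rng = np.random.default_rng(42)
n, p = 500, 20
X = rng.binomial(1, 0.3, size=(n, p)).astype(float)
y = 1000 + 500 * X[:, 0] - 300 * X[:, 1] + 200 * X[:, 2] + rng.normal(0, 100, n)

# Step 1: Lasso with 5-fold CV on standardized predictors and centered outcome.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()
best_lam = cv_lasso(Xs, yc, np.geomspace(1.0, 100.0, 12), k=5)
selected = np.flatnonzero(lasso_cd(Xs, yc, best_lam) != 0)

# Step 2: plain OLS on the original (unpenalized) selected dummies.
beta, se, pvals = ols_with_pvalues(X[:, selected], y)
for name, b_j, s_j, p_j in zip(["const"] + [f"x{j}" for j in selected], beta, se, pvals):
    print(f"{name:>6}  coef={b_j:9.2f}  se={s_j:7.2f}  p={p_j:.4f}")
```

Note that the p-values printed in step 2 are the ordinary OLS ones: they are computed as if the set of variables had been fixed in advance and take no account of the selection done in step 1, which is exactly what question 1 below is asking about.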
1. Is this sequence of model specification something that is reasonable to do/correct statistical methodology? Or will my OLS results be biased in some way?
2. Alternatively, I've seen Elastic Net used and have read a paper showing that its results can be better than Lasso's. I was therefore considering switching from Lasso to Elastic Net, but I am not sure how, or whether, that would affect my interpretation of the results after I run the OLS in the second step.
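As a rough sketch of what the switch in question 2 would change mechanically: only the penalty in the selection step differs, and the OLS refit is computed the same way on whatever variables come out. The example below is an assumption-laden illustration on simulated data; the function name `elastic_net_cd`, the fixed values `lam=0.1` and `alpha=0.5`, and the correlated pair of predictors are all invented here, and in practice both tuning parameters would be chosen by cross-validation.

```python
import numpy as np

def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Elastic Net by cyclic coordinate descent, no intercept (assumes centered data).
    Minimizes (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    Setting alpha=1.0 reduces the penalty to the Lasso."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()              # running residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]
            rho = X[:, j] @ r / n
            shrunk = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)  # L1 part
            b[j] = shrunk / (col_sq[j] + lam * (1 - alpha))           # L2 part
            r -= X[:, j] * b[j]
    return b

# Simulated data with two highly correlated predictors (x0 and x1).
rng = np.random.default_rng(0)
n, p = 400, 10
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
y = 3 * X[:, 0] + 3 * X[:, 1] + 2 * X[:, 2] + rng.normal(size=n)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

# Step 1: Elastic Net selection (tuning values fixed by hand for the illustration).
b_enet = elastic_net_cd(Xs, yc, lam=0.1, alpha=0.5)
sel_enet = np.flatnonzero(b_enet != 0)

# Step 2: the same OLS refit as before, on the selected columns only.
Z = np.column_stack([np.ones(n), X[:, sel_enet]])
ols_coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
print("selected:", sel_enet.tolist())
print("OLS coefficients (const first):", np.round(ols_coef, 2))
```

The second step never sees the penalty, so any difference in the final OLS output comes only from which variables were selected.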
Thanks in advance for the input!