Hi all, this is a statistical question rather than a specific code question. I am trying to build a linear model that predicts the cost of a particular hip surgery. To set the scene, my n is about 19,000 and I am starting with about 460 variables (440 of which are dummies). I have so many variables because many different medications or procedures can be given during a given surgery, and across 19,000 patients this results in a dummy variable for each medication or procedure.
Given that, I will first use Lasso model selection with 5-fold cross-validation as a guide to weed out variables that don't contribute much to the cost of the procedure. Since Lasso does not select a model based on p-values, it does not report p-values in its output. My concern is that submitting this model for publication will not go over well, given reviewers' heavy reliance on p-values.
I therefore plan to take the variables that Lasso selected and use them as the independent variables in an OLS model. That way I can present p-values and evaluate each independent variable for significance, using the p-values to determine the final model.
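For concreteness, the two-step procedure described above can be sketched as follows. This is only a minimal illustration on simulated data, not the actual surgery data: the sample size, coefficients, and the helper names (`lasso_cd`, `ols`, `solve`) are all made up for the example, and the penalty `lam` is fixed by hand rather than chosen by 5-fold cross-validation to keep the sketch short. The p-values are the naive ones, computed as if the selected variables had been fixed in advance.

```python
import math
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent on centred data.
    Minimises (1/2n) * sum((y - Xb)^2) + lam * sum(|b_j|)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xb, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            # covariance of column j with the partial residual (b_j removed)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [A[r][:] + [rhs[r]] for r in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for cc in range(c, k + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][cc] * x[cc] for cc in range(r + 1, k))) / M[r][r]
    return x

def ols(X, y):
    """OLS on centred data. Returns coefficients, standard errors, naive p-values."""
    n, k = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((y[i] - sum(X[i][a] * beta[a] for a in range(k))) ** 2 for i in range(n))
    sigma2 = rss / (n - k - 1)  # one extra df for the intercept removed by centring
    se = []
    for a in range(k):
        unit = [1.0 if c == a else 0.0 for c in range(k)]
        se.append(math.sqrt(sigma2 * solve(XtX, unit)[a]))  # diagonal of (X'X)^-1
    # two-sided p-values, normal approximation to the t distribution
    pvals = [math.erfc(abs(beta[a] / se[a]) / math.sqrt(2)) for a in range(k)]
    return beta, se, pvals

# Simulated data (hypothetical): only the first two of ten predictors matter.
random.seed(0)
n, p = 300, 10
true_beta = [3.0, -2.0] + [0.0] * (p - 2)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + random.gauss(0, 1) for i in range(n)]

# Centre the columns and the outcome so no intercept is needed.
means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
X = [[X[i][j] - means[j] for j in range(p)] for i in range(n)]
ybar = sum(y) / n
y = [v - ybar for v in y]

# Step 1: lasso selection (penalty fixed by hand, an assumption of this sketch).
lam = 0.3
b_lasso = lasso_cd(X, y, lam)
selected = [j for j in range(p) if b_lasso[j] != 0.0]

# Step 2: OLS refit on the selected columns only.
X_sel = [[X[i][j] for j in selected] for i in range(n)]
beta, se, pvals = ols(X_sel, y)
for j, bj, sj, pj in zip(selected, beta, se, pvals):
    print(f"x{j}: coef={bj:.3f} se={sj:.3f} naive p={pj:.3g}")
```

The refit removes the shrinkage that the lasso applies to the retained coefficients, but the standard errors and p-values in the second step take the selected set as given.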
1. Is this sequence of model specification reasonable and statistically correct? Or will my OLS results be biased in some way?
2. Alternatively, I've seen Elastic Net used and read a paper showing that its results can be better than Lasso's. I was therefore considering switching from Lasso to Elastic Net, but I am not sure how, or whether, that would affect my interpretation of the results after I run the OLS in the second step.
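To make the second question concrete: in a coordinate-descent formulation, Elastic Net differs from Lasso only by an added ridge term in the penalty, so from the point of view of the second-step OLS the only thing that can change is which variables get passed on. The sketch below is a hypothetical illustration on simulated data; the function name `elastic_net_cd`, the mixing parameter `alpha`, and the penalty `lam` are assumptions of the example (with `alpha = 1` the update reduces to the Lasso).

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net_cd(X, y, lam, alpha, n_iter=100):
    """Elastic net by cyclic coordinate descent on centred data.
    Minimises (1/2n)*sum((y - Xb)^2)
              + lam * (alpha * sum(|b_j|) + (1 - alpha)/2 * sum(b_j^2)).
    alpha = 1 gives the lasso; alpha = 0 gives ridge regression."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * b[j]
            # L1 part sets the threshold, L2 part inflates the denominator
            new_bj = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

# Simulated, centred data (hypothetical): two of six predictors matter.
random.seed(1)
n, p = 200, 6
true_beta = [3.0, -2.0, 0.0, 0.0, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + random.gauss(0, 1) for i in range(n)]
means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
X = [[X[i][j] - means[j] for j in range(p)] for i in range(n)]
ybar = sum(y) / n
y = [v - ybar for v in y]

lam = 0.3
sel_lasso = [j for j, bj in enumerate(elastic_net_cd(X, y, lam, alpha=1.0)) if bj != 0.0]
sel_enet = [j for j, bj in enumerate(elastic_net_cd(X, y, lam, alpha=0.5)) if bj != 0.0]
print("lasso selects:      ", sel_lasso)
print("elastic net selects:", sel_enet)
```

Whichever list is used, the second-step OLS is run in the same way; the two approaches differ only in the set of columns that reaches it.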
Thanks in advance for the input!