Hi All, no specific code question here but rather a statistical one. I am trying to create a linear model that predicts the cost of a particular hip surgery. To set the scene, my n = ~19,000 and I am starting with ~460 variables (440 of which are dummies). I have so many variables because during a given surgery, many different medications or procedures can be given and across 19,000 patients, this results in many dummy variables for each medication or procedure.
Having said that, I will first use Lasso model selection with 5-fold cross-validation as a guide to weed out variables that don't contribute much to the cost of the procedure. Since Lasso does not select variables based on p values, it does not report p values in the output. My concern is that submitting this model for publication will not go over well given reviewers' heavy reliance on p values.
I then plan to take the variables that Lasso selected and use them as the independent variables in an OLS model. This way I can present p values and evaluate each independent variable for significance, using the p values to determine the final model.
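To make the two-step procedure concrete, here is a minimal sketch in Python using only NumPy, run on simulated data rather than my surgery data. The function names (`lasso_cd`, `cv_lambda`, `ols`), the penalty grid, and the simulated coefficients are all hypothetical choices for illustration, and the p values use a normal approximation. Note that the OLS p values computed this way take the selected variables as if they had been chosen in advance, so they do not account for the selection step.

```python
import math
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    No intercept is fit, so X and y are assumed to be (roughly) centered."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                      # residual y - X @ beta, kept up to date
    col_ss = (X ** 2).sum(axis=0) / n     # per-column mean of squares
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def cv_lambda(X, y, lams, k=5, seed=0):
    """Pick the penalty with the lowest k-fold mean squared prediction error."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k
    errs = []
    for lam in lams:
        mse = 0.0
        for f in range(k):
            tr, te = folds != f, folds == f
            b = lasso_cd(X[tr], y[tr], lam)
            mse += np.mean((y[te] - X[te] @ b) ** 2)
        errs.append(mse / k)
    return lams[int(np.argmin(errs))]

def ols(X, y):
    """OLS with intercept; returns coefficients, standard errors and
    two-sided p values based on a normal approximation (large n)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ coef
    sigma2 = resid @ resid / (n - Xd.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in coef / se])
    return coef, se, pvals

# Simulated stand-in data (assumption): 400 rows, 20 predictors, 3 true effects.
rng = np.random.default_rng(42)
n, p = 400, 20
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize columns
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = 10.0 + X @ beta_true + rng.normal(size=n)

# Step 1: Lasso with a 5-fold cross-validated penalty, used only for selection.
lams = [0.02, 0.05, 0.1, 0.2, 0.5, 1.0]
best_lam = cv_lambda(X, y - y.mean(), lams, k=5)
selected = np.flatnonzero(lasso_cd(X, y - y.mean(), best_lam) != 0)

# Step 2: unpenalized OLS on the selected columns only.
coef, se, p_values = ols(X[:, selected], y)
print("chosen lambda:", best_lam)
print("selected columns:", selected.tolist())
for name, c, s, pv in zip(["const"] + selected.tolist(), coef, se, p_values):
    print(name, round(float(c), 3), round(float(s), 3), round(float(pv), 4))
```

In this sketch the Lasso coefficients are discarded after step 1 and only the set of nonzero columns is carried into the OLS fit, which is the sequence I am asking about.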
1. Is this sequence of model specification something that is reasonable to do/correct statistical methodology? Or will my OLS results be biased in some way?
2. Alternatively, I've seen Elastic Net used and read a paper showing that its results can be better than Lasso's. I was therefore considering switching from Lasso to Elastic Net, but I am not sure how, or whether, that would affect my interpretation of the results after I run the OLS in the second step.
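For reference, the difference between the two selection methods in question 2 can be written down in one common parameterization of the Elastic Net objective. The symbols here are my own notation, not taken from any particular paper: $\lambda$ is the overall penalty strength and $\alpha$ is the mixing weight between the two penalties.

```latex
\hat{\beta}
  = \arg\min_{\beta}\;
    \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
    + \lambda \left( \alpha \lVert \beta \rVert_1
    + \frac{1-\alpha}{2}\, \lVert \beta \rVert_2^2 \right),
\qquad 0 \le \alpha \le 1 .
```

With $\alpha = 1$ this reduces to the Lasso, and with $\alpha = 0$ it is ridge regression, which sets no coefficients exactly to zero. For any $\alpha > 0$ some coefficients can still be exactly zero, so the second-step OLS would be run in the same way on whichever variables remain.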
Thanks in advance for the input!