Hi All, no specific code question here but rather a statistical one. I am trying to create a linear model that predicts the cost of a particular hip surgery. To set the scene, my n is ~19,000 and I am starting with ~460 variables (440 of which are dummies). I have so many variables because many different medications or procedures can be given during a surgery, and across 19,000 patients each distinct medication or procedure becomes its own dummy variable.
Having said that, I will first use Lasso model selection with 5-fold cross-validation as a guide to weed out variables that don't contribute much to the cost of the procedure. Because Lasso does not select variables based on p values, it does not report p values in its output. My concern is that submitting this model for publication will not go over well given reviewers' heavy reliance on p values.
My plan is then to take the variables that Lasso selected and use them as the independent variables in an OLS model. With this approach I can present p values and use them to evaluate each independent variable for significance when determining the final model.
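To make the two steps concrete, here is a minimal sketch in plain Python on simulated data (not my actual surgery data). Everything in it is an assumption for illustration: the data-generating process, the fixed penalty `lam` (which cross-validation would choose in practice), the omission of an intercept, and the helper names `lasso_cd`, `invert` and `ols`. The p values it prints are the naive ones, computed as if the selected variables had been chosen in advance, which is exactly the part I am unsure about.

```python
import math
import random

random.seed(0)

# Simulated stand-in data (an assumption, not the surgery data):
# n observations, p candidate predictors, only the first two matter.
n, p = 200, 10
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0, 1) for row in X]


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return 0 if |value| <= lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Lasso by cyclic coordinate descent (no intercept), minimizing
    (1/2n) * ||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - Xb for the current b (b = 0 at start)
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            # Covariance of column j with the residual, adding back j's own fit
            rho = sum(col[i] * resid[i] for i in range(n)) / n + z * beta[j]
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new_bj
    return beta


def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(A)
    M = [list(A[r]) + [1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k:] for row in M]


def ols(X, y, cols):
    """OLS of y on the chosen columns (no intercept).
    Returns coefficients and naive two-sided p values."""
    n, k = len(X), len(cols)
    Z = [[row[j] for j in cols] for row in X]
    XtX = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    Xty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(XtX)
    coef = [sum(inv[a][b] * Xty[b] for b in range(k)) for a in range(k)]
    rss = sum((y[i] - sum(Z[i][a] * coef[a] for a in range(k))) ** 2 for i in range(n))
    sigma2 = rss / (n - k)
    # Normal approximation to the t distribution (reasonable for large n)
    pvals = [math.erfc(abs(coef[a]) / math.sqrt(sigma2 * inv[a][a]) / math.sqrt(2))
             for a in range(k)]
    return coef, pvals


lam = 0.2  # hypothetical penalty; in practice chosen by cross-validation
beta_lasso = lasso_cd(X, y, lam)

# Step 1: keep the variables with a nonzero Lasso coefficient
selected = [j for j in range(p) if beta_lasso[j] != 0.0]

# Step 2: refit plain OLS on the selected variables only
coef_ols, pvals = ols(X, y, selected)
for j, b, pv in zip(selected, coef_ols, pvals):
    print(f"x{j}: lasso={beta_lasso[j]:.3f}  ols={b:.3f}  naive p={pv:.3g}")
```

In this sketch the OLS refit removes the shrinkage that the Lasso penalty applies to the retained coefficients, but the standard errors and p values take no account of the fact that the same data were used to pick the variables.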
1. Is this sequence of model specification steps reasonable and statistically sound? Or will my OLS results be biased in some way?
2. Alternatively, I've seen Elastic Net used and read a paper showing that its results can be better than Lasso's. I was therefore considering switching from Lasso to Elastic Net, but I am not sure how, or if, that would affect my interpretation of the results after I run the OLS in the second step.
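For reference, this is how I understand the two penalties to differ, written in one common parametrization (the one used by glmnet, which I am assuming here; other software may scale the terms differently). The symbols $\lambda$ (overall penalty strength) and $\alpha$ (mixing weight) are notation I am introducing for this note.

$$
\hat{\beta} = \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \left[ \alpha \lVert \beta \rVert_1 + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right]
$$

Setting $\alpha = 1$ gives the Lasso and $\alpha = 0$ gives ridge regression; values in between give Elastic Net. As long as $\alpha > 0$ the $\ell_1$ term can still set coefficients exactly to zero, so the "select, then refit with OLS" step would work mechanically the same way in either case.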
Thanks in advance for the input!