Hello,
I was wondering at what point it is best to clean the data (i.e. delete observations with missing or negative values) when running several types of regression analysis.
Suppose I want to run a simple linear regression first and a multiple regression afterwards. Should I drop the observations with missing or negative values (i.e. keep only VAR>=0) in any of the variables I plan to use once, at the beginning, before running both regressions? Or should I drop, for each regression, only the observations with missing or negative values in the variables used in that particular regression?
I would think that the first option is better, since both regressions would then be based on the same number of observations.
Otherwise, the simple linear regression could be based on, for instance, 20,000 observations and the multiple regression on only 14,000.
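To make the two options concrete, here is a minimal sketch of how the observation counts can differ. It is written in Python purely for illustration; the variable names `y`, `x1`, `x2` and all values are made up and are not from my actual data.

```python
# Toy rows with made-up values; None marks a missing value.
rows = [
    {"y": 5.0, "x1": 1.0, "x2": 2.0},
    {"y": 3.0, "x1": 2.0, "x2": None},   # x2 missing
    {"y": 4.0, "x1": -1.0, "x2": 1.0},   # x1 negative
    {"y": 6.0, "x1": 3.0, "x2": -2.0},   # x2 negative
    {"y": None, "x1": 4.0, "x2": 3.0},   # y missing
    {"y": 7.0, "x1": 5.0, "x2": 4.0},
]


def usable(row, variables):
    """Return True if every listed variable is present and non-negative."""
    return all(row[v] is not None and row[v] >= 0 for v in variables)


simple_vars = ["y", "x1"]          # hypothetical simple regression: y on x1
multiple_vars = ["y", "x1", "x2"]  # hypothetical multiple regression: y on x1, x2

# Cleaning per regression: each model keeps every row that is usable
# for its own variables, so the sample sizes can differ.
n_simple_per_model = sum(usable(r, simple_vars) for r in rows)
n_multiple_per_model = sum(usable(r, multiple_vars) for r in rows)

# Cleaning once up front: keep only rows usable for ALL variables,
# then run both models on that common sample.
common = [r for r in rows if usable(r, multiple_vars)]
n_simple_common = sum(usable(r, simple_vars) for r in common)
n_multiple_common = len(common)

print("per regression:", n_simple_per_model, n_multiple_per_model)
print("common sample: ", n_simple_common, n_multiple_common)
```

In this toy case, cleaning per regression leaves 4 rows for the simple model and 2 for the multiple model, while cleaning once up front leaves the same 2 rows for both.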
Can someone confirm this?
Thanks in advance!