Hello,

I am wondering at what point it is best to clean the data (i.e. delete missing or negative values) when performing different types of regression analysis.
If I want to first perform a simple linear regression and afterwards a multiple regression, should I delete the missing and negative values (i.e. keep only VAR>=0) for all the variables I want to use at the beginning, before running both regressions, or should I delete them only for the variables used in each particular regression?

I would think that the first option is better, since the same number of observations will remain for each type of regression.
Otherwise, the linear regression could be based on, for instance, 20,000 observations and the multiple regression on only 14,000.
Can someone confirm this?
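
To make the two options concrete, here is a minimal sketch in plain Python on made-up data. The variable names (y, x1, x2), the toy values, and the helper functions are all hypothetical and only illustrate how the sample sizes can differ; they are not my real dataset.

```python
# Hypothetical toy data: y is the outcome, x1 and x2 are predictors.
# None marks a missing value; negative values are treated as invalid.
rows = [
    {"y": 5.0, "x1": 1.0, "x2": 2.0},
    {"y": 3.0, "x1": 2.0, "x2": None},   # x2 missing
    {"y": 4.0, "x1": 0.5, "x2": -1.0},   # x2 negative
    {"y": 6.0, "x1": None, "x2": 3.0},   # x1 missing
    {"y": 7.0, "x1": 3.0, "x2": 1.5},
]

def is_valid(row, variables):
    """Return True if every listed variable is present and non-negative."""
    return all(row[v] is not None and row[v] >= 0 for v in variables)

def clean(data, variables):
    """Keep only the rows that are valid for the listed variables."""
    return [r for r in data if is_valid(r, variables)]

simple_vars = ["y", "x1"]          # variables in the simple linear regression
multiple_vars = ["y", "x1", "x2"]  # variables in the multiple regression

# Option 1: clean once on all variables, then use the same sample for both.
common_sample = clean(rows, multiple_vars)
n_option1_simple = len(common_sample)
n_option1_multiple = len(common_sample)

# Option 2: clean separately for each regression.
n_option2_simple = len(clean(rows, simple_vars))
n_option2_multiple = len(clean(rows, multiple_vars))

print("Option 1:", n_option1_simple, n_option1_multiple)
print("Option 2:", n_option2_simple, n_option2_multiple)
```

With option 1 both regressions use the same rows, while with option 2 the simple regression keeps rows that the multiple regression drops.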

Thanks in advance!