Issue with an independent variable consisting of non-mutually-exclusive categories, where one category perfectly predicts the outcome

Hi Statalist,

I'm having a number of problems with my probit model. I am attempting to model predictors of successful smoking cessation attempts, so my dependent variable takes a value of one for ex-smokers (who have quit) and a value of zero for current smokers with a previous quit attempt.

One particular explanatory variable I want to examine is the reason given for a quit attempt. I want to see if the motive behind a quit attempt is associated with the probability of success. The survey allows respondents to select multiple reasons for a quit attempt, so I could not use a single categorical variable. Instead I have a separate binary variable (Yes/No) for each quit reason, and I use a separate regression for each one.

The independent 'Reason' variables therefore take a value of 1 if stated, and a value of 0 if not stated as a reason for trying to quit smoking.
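To make the coding concrete, here is a minimal Python sketch of how a multi-select answer can be turned into one 0/1 indicator per reason. The reason labels and the three example responses are hypothetical and chosen only for illustration; they are not taken from the survey data.

```python
# Hypothetical reason labels, for illustration only.
REASONS = ["financial", "health", "family_pressure", "effect_on_others"]

def to_indicators(selected, reasons=REASONS):
    """Return {reason: 1 if stated, else 0} for one respondent."""
    return {reason: int(reason in selected) for reason in reasons}

# Made-up multi-select answers: each respondent may tick several reasons.
responses = [
    {"financial", "health"},  # two reasons stated
    {"health"},               # one reason stated
    set(),                    # none of the listed reasons stated
]

rows = [to_indicators(r) for r in responses]
for row in rows:
    print(row)
```

Note that a respondent who ticked none of the listed reasons gets a zero on every indicator, which matters for the reference group in each regression.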

However, I am having trouble due to the design of the survey. The Current Smokers and Ex-Smokers were asked separate questions regarding their reasons for quitting, and Ex-Smokers were given more options to choose from (Pregnancy, "Own Motivation" and "Cannot Remember"). The problem I'm having is due to this "Own Motivation" variable. It supposedly refers to individuals who quit simply because they felt like it, and for no specific reason. However, it was an option available only to ex-smokers, so it perfectly predicts a dependent variable value of one and has no natural interpretation.
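The perfect-prediction problem can be seen in a simple 2x2 cross-tabulation: if nobody with the indicator equal to 1 has an outcome of 0, the indicator perfectly predicts success. The sketch below checks this in plain Python. The variable names and the eight toy observations are made up for illustration and do not come from the survey.

```python
from collections import Counter

def crosstab(indicator, outcome):
    """Count respondents in each (indicator, outcome) cell."""
    return Counter(zip(indicator, outcome))

def perfectly_predicts_success(indicator, outcome):
    """True if at least one respondent has indicator == 1 and none of them has outcome == 0."""
    cells = crosstab(indicator, outcome)
    return cells[(1, 1)] > 0 and cells[(1, 0)] == 0

# Toy data (made up): quit_success = 1 for ex-smokers, 0 for current smokers.
quit_success   = [1, 1, 1, 1, 0, 0, 0, 0]
own_motivation = [1, 1, 0, 0, 0, 0, 0, 0]  # option offered to ex-smokers only
health         = [0, 1, 1, 0, 1, 0, 1, 0]  # option offered to both groups

print(perfectly_predicts_success(own_motivation, quit_success))  # True
print(perfectly_predicts_success(health, quit_success))          # False
```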

For some reason, over 30% of Ex-Smokers selected "Own Motivation". When individuals who selected it are included in my analysis, the coefficients for all other reasons (Financial, Health, Family Pressure, Effect on Others) become extremely negative. When I exclude individuals who selected only "Own Motivation", the coefficients become far more reasonable.

I assume this is because the effect of "Own Motivation" is contained within the zero values for each independent variable. But it makes the interpretation of my results unclear. Essentially, the results tell me that people who quit for, e.g., financial reasons are far more likely to relapse than those who do not.
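A small Python sketch of the mechanism I suspect, using invented counts rather than the survey data: respondents who selected only "Own Motivation" are all successes and all score zero on the Financial indicator, so they raise the success rate of the comparison group. The sketch compares raw quit rates rather than probit coefficients, so it shows only the direction of the distortion, not its size.

```python
# Each row is (quit_success, financial, own_motivation_only). All counts are made up.
rows = (
    [(1, 1, 0)] * 4    # ex-smokers who stated a financial reason
    + [(1, 0, 0)] * 4  # ex-smokers who stated other reasons
    + [(1, 0, 1)] * 4  # ex-smokers who stated only "Own Motivation"
    + [(0, 1, 0)] * 4  # current smokers who stated a financial reason
    + [(0, 0, 0)] * 4  # current smokers who stated other reasons
)

def quit_rate(data, financial):
    """Share of successful quitters among rows with the given financial value."""
    group = [q for q, f, _ in data if f == financial]
    return sum(group) / len(group)

def financial_gap(data):
    """Quit rate with a financial reason minus quit rate without one."""
    return quit_rate(data, 1) - quit_rate(data, 0)

gap_all = financial_gap(rows)
gap_excluding = financial_gap([r for r in rows if r[2] == 0])

print("gap, full sample:", round(gap_all, 3))
print("gap, excluding own-motivation-only:", round(gap_excluding, 3))
```

In this toy sample the financial group looks worse than the comparison group only while the own-motivation-only respondents sit among the zeros; once they are removed, the gap disappears.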

Clearly something is very wrong with this approach, but I do not know how else to go about it. The only thing I can think of is to drop individuals who stated "Own Motivation" and nothing else, but this risks severe selection bias.

Does anybody have any suggestions about how to go about this?

Thanks in advance.
