Hello all,

I have what I believe should be a pretty simple problem but I'm getting tripped up, mostly because I'm not sure if there is a convention for coding this type of variable.

I am working with survey data. One of the questions is: "Are you registered in X country?" The options were: "Yes", "No" and "I am registered in another country". If they selected the "other" option, respondents were prompted to write in the other country.

Now in my dataset, I have two variables. The first is the registered variable, with values 1 for "yes", 2 for "no", and blank if they chose "other" or if they just skipped the question. The second is the "other" variable, whose values are either the written response the respondent submitted, or blank. It is blank either if they answered Yes or No to the question (meaning the registered variable is not blank), or if they skipped the question entirely. Obviously, I want to be able to differentiate between blanks that come from skipping the question and blanks that come from answering a different part of it.

So my first question is: is there a convention for coding an "other" value? I would ideally like the registered variable to be a simple 0/1 binary variable, but this "other" option is puzzling. It sort of makes sense to me to code it as 0, because I assume that a respondent who chose that option is indeed not registered in Country X (I acknowledge this was probably a badly written question to begin with). But then how will I differentiate between the 0s from "no" and the 0s from "other"? Should I code it as a missing value for that first variable? Should I just code that variable as 1/2/3?

The second part of my question is how then to code the "Other" variable. I will leave the written responses as string values, but how do I separate the blanks due to skipping from the blanks due to a Yes or No response?
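
To make the question concrete, here is a rough sketch of one recoding I am considering. It is in Python only for illustration, and the labels (`YES`, `NO`, `OTHER`, `SKIPPED`, `NOT_APPLICABLE`) and function names are ones I made up, not an established convention. It assumes that a blank in both variables means the question was skipped.

```python
# Hypothetical labels for a single categorical status variable.
YES, NO, OTHER, SKIPPED = "yes", "no", "other", "skipped"

# Hypothetical sentinel for "write-in did not apply", kept distinct from a true skip.
NOT_APPLICABLE = "n/a"


def recode(registered, other_text):
    """Map the raw pair (registered, other_text) to (status, other_country).

    registered: 1 for "yes", 2 for "no", None for blank.
    other_text: the written-in country, or "" / None for blank.
    """
    text = (other_text or "").strip()
    if registered == 1:
        return YES, NOT_APPLICABLE
    if registered == 2:
        return NO, NOT_APPLICABLE
    if text:
        return OTHER, text
    # Both blank: treated as a skip. A respondent who picked "other" but wrote
    # nothing would also land here; the raw data cannot tell the two apart.
    return SKIPPED, None


def registered_in_x(status):
    """Binary view: 1 for "yes", 0 for "no" or "other", None for a skip."""
    if status == SKIPPED:
        return None
    return 1 if status == YES else 0


# Four example rows: yes, no, other with a write-in, and a skip.
rows = [(1, ""), (2, None), (None, "Canada"), (None, "")]
recoded = [recode(r, o) for r, o in rows]
binary = [registered_in_x(status) for status, _ in recoded]
```

The idea would be to keep the full status variable for distinguishing "no" from "other", and derive the 0/1 variable from it only when a binary is needed.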

I hope this question makes sense. Thanks in advance for any help!