Hi,

I am trying to create a new variable, adv_outcome. I have 5 pre-existing variables I am interested in: sad, angry, grief, disappoint and hope. These categories are not mutually exclusive (i.e. some patients will be sad AND angry). Each variable is binary (0=no, 1=yes).

I tried using:
gen adv_outcome= 0 if sad==0 & angry==0 & grief==0 & disappoint==0 & hope==0
replace adv_outcome=1 if sad==1
replace adv_outcome=2 if angry==1
replace adv_outcome=3 if grief==1
replace adv_outcome=4 if disappoint==1
replace adv_outcome=5 if hope==1

The issue is that each replace line "takes" observations from the previous category instead of counting them in both: a patient who is both sad and angry ends up coded only as angry. The reason I want this adv_outcome variable is so that I can cross-tabulate it against patient factors such as age, gender and rurality and see what % of each group (e.g. men) experienced each adverse outcome, rather than creating 6 separate tables for each patient-factor variable. I do have an any_outcome variable that flags whether any of the 5 (sad, angry, grief, disappoint, hope) occurred, and that seems to be working fine, but I'd like to see which outcomes were most common.
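To show what I mean, here is a minimal sketch with made-up data: three hypothetical patients and only two of the five indicators, so it is not my real dataset. It only reproduces the overwriting behaviour described above.

```stata
* Toy data (made up): patient 2 is both sad and angry
clear
input sad angry
1 0
1 1
0 0
end

* Same approach as my real code, cut down to two indicators
gen adv_outcome = 0 if sad==0 & angry==0
replace adv_outcome = 1 if sad==1
replace adv_outcome = 2 if angry==1

* Patient 2 ends up coded 2 (angry) only, so sad is undercounted
list
tabulate adv_outcome
```

In this toy example two patients are sad, but the tabulation shows only one observation with adv_outcome==1, because the last replace overwrites the earlier one for patient 2.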

This is what I would like the crosstabs to look like. The percentages are made up for the example. I only want the numbers in each cell, not the words; I have included the words just to make clear what each cell should mean.

             Metro                            Regional                         Rural
none         12% experienced none             24% experienced none             5% experienced none
sad          36% experienced sad              47% experienced sad              60% experienced sad
angry        40% experienced angry            10% experienced angry            40% experienced angry
grief        65% experienced grief            60% experienced grief            30% experienced grief
disappoint   35% experienced disappointment   5% experienced disappointment    90% experienced disappointment
hope         80% experienced hope             70% experienced hope             67% experienced hope

This then brings me to my next issue. The dataset is 2,000 people (1,000 men, 1,000 not-men). Because patients can fall into multiple categories, a single column could add up to far more than the number of people in it: if all 1,000 men felt all 5 emotions, the men column would total 5,000. Would the percentages then be calculated out of 5,000 instead of out of the 1,000 men in that column?

Is it easier to just run each table separately and manually type the numbers into a normal table? Am I missing something super obvious? I feel like I've been staring at it all day making zilch progress.

Thanks, J