Hi all,

To provide some context, here's a small sample of my data:
input float(haze age ac_behavior purifier_behavior) long housing float female
0 21 1 0 1 1
1 21 1 1 1 1
0 58 1 0 3 1
1 58 1 0 3 1
0 47 1 0 3 1
1 47 1 0 3 1
0 43 0 0 2 0
1 43 1 1 2 0
0 35 1 0 3 0
1 35 0 1 3 0
0 52 1 0 3 1
1 52 0 0 3 1
0 54 1 1 5 0
1 54 1 1 5 0
0 61 1 0 5 0
1 61 1 1 5 0
0 47 1 1 2 0
1 47 1 0 2 0
end

A brief explanation of the variables:
age is continuous.
female = 1 if female, = 0 if male.
ac_behavior = 1 if air-con is used, = 0 if unused.
purifier_behavior = 1 if a purifier is used, = 0 if unused.
housing is a categorical variable (coded 1 to 6 according to housing type). Housing type is meant to be a proxy for household income level, since income data are not available.
haze = 1 if hazy, = 0 if normal.

Each participant answered the behavior questions twice: first for how they would use their appliances under normal weather conditions, and then for how they would use them under hazy conditions. This means that while I have 622 observations in my dataset, there are only 311 participants in the study.
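
Because each participant contributes two rows, the observations are not independent, and a participant identifier would probably be needed to account for that. The posted sample does not contain one. The following is a minimal sketch that assumes the rows are stored in consecutive pairs (normal row first, hazy row second), as they appear in the sample above. The variable name id is hypothetical and is not in my dataset.

```stata
* Assumption: rows come in consecutive pairs per participant,
* normal-weather row first and hazy row second, as in the sample.
* "id" is a hypothetical variable name, not part of the posted data.
generate long id = ceil(_n/2)

* Each participant should have exactly two rows, one per haze value
bysort id (haze): assert _N == 2
bysort id (haze): assert haze[1] == 0 & haze[2] == 1

* Time-invariant characteristics should agree within a participant
bysort id (haze): assert age[1] == age[2] & female[1] == female[2] & housing[1] == housing[2]
```

If any of the assert lines fails, the pairing assumption does not hold and the identifier would have to be built some other way.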

My first aim is to investigate whether households' use of appliances is related to haze. To start, I ran a logistic regression with ac_behavior as the dependent variable and haze as the only regressor. Here are the results. As I read them, the odds of using air-con are 1.77 times as high under hazy conditions as under normal conditions, and the relationship is statistically significant. Is this the right line of thought?

[Image: logistic regression output, not shown]
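
For reference, this is roughly how the first model and its interpretation could be checked. It is only a sketch: it uses the variable names from the sample above, and the exact command I ran may have differed in its options.

```stata
* Odds ratio for haze (-logistic- reports odds ratios by default)
logistic ac_behavior i.haze

* Predicted probability of air-con use under normal and hazy conditions,
* which is easier to read than an odds ratio
margins haze

* The underlying 2x2 table, with column percentages by haze status
tabulate ac_behavior haze, column
```

The odds ratio compares odds, not probabilities, so the margins output is the place to look for how much more likely air-con use is under haze.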
Next, I wish to investigate whether households in different kinds of dwellings (and by proxy different income groups) respond to haze differently by comparing their appliance use with and without haze. I am not sure how I can go about doing that.
I am thinking about running another logit regression, for instance with code like this:

Code:
logistic ac_behavior i.housing haze, r
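
The command above only lets the level of air-con use differ by housing type. It does not let the response to haze differ. One variant that might address the question is an interaction between housing and haze, sketched here under some assumptions: id is a hypothetical participant identifier that is not in the posted data, and including age and female as controls is a choice, not a requirement.

```stata
* Let the haze effect differ by housing type; cluster on participant
* ("id" is a hypothetical participant identifier, not in the posted data)
logistic ac_behavior i.housing##i.haze age i.female, vce(cluster id)

* Joint test of whether the haze effect differs across housing types
testparm i.housing#i.haze

* Predicted probability of air-con use by housing type and haze status
margins housing#haze

* Change in predicted probability due to haze within each housing type
margins housing, dydx(haze)
```

With six housing categories and 311 participants, some housing-by-haze cells may be small, so the interaction terms could be imprecisely estimated.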
Any advice on how to proceed would be much appreciated.