Dear all, I need your technical support on categorizing a variable for the case explained below.
  • HHID: ID number of the household
  • year: survey year
  • fuel_id: list of fuel sources used by the household, 1=firewood, 2=dung, 3=crop residue, 4=kerosene, 5=LPG, 6=charcoal, 7=solar, 8=electricity.
  • In the table below, households that appear on two lines for the same year (the shaded lines in the original) combine both clean and dirty sources for cooking in that year.
  • cfuel: Category of the cooking fuel sources in each household in each year: clean=1 (it comprises fuel_id 4,5,7,8), dirty=2 (it consists of fuel_id 1,2,3,6), and mixed=3 (which combines both clean and dirty sources).
I ran the following commands to get my desired result.
gen cfuel = .
replace cfuel = 1 if fuelid==4 | fuelid==5 | fuelid==7 | fuelid==8
replace cfuel = 2 if fuelid==1 | fuelid==2 | fuelid==3 | fuelid==6

replace cfuel = 3 if fuelid==4 & fuelid==1
replace cfuel = 3 if fuelid==4 & fuelid==2
replace cfuel = 3 if fuelid==4 & fuelid==3
replace cfuel = 3 if fuelid==4 & fuelid==6

replace cfuel = 3 if fuelid==5 & fuelid==1
replace cfuel = 3 if fuelid==5 & fuelid==2
replace cfuel = 3 if fuelid==5 & fuelid==3
replace cfuel = 3 if fuelid==5 & fuelid==6

replace cfuel = 3 if fuelid==7 & fuelid==1
replace cfuel = 3 if fuelid==7 & fuelid==2
replace cfuel = 3 if fuelid==7 & fuelid==3
replace cfuel = 3 if fuelid==7 & fuelid==6

replace cfuel = 3 if fuelid==8 & fuelid==1
replace cfuel = 3 if fuelid==8 & fuelid==2
replace cfuel = 3 if fuelid==8 & fuelid==3
replace cfuel = 3 if fuelid==8 & fuelid==6

label var cfuel "category of fuel used for cooking purposes"
label define cfuel 1 "clean" 2 "dirty" 3 "mixed"
label values cfuel cfuel

The first two replace lines correctly code the "CLEAN" and "DIRTY" sources, but the coding does not work for the "MIXED" case. For instance, HHID 1021000102 combined both charcoal and LPG in 2009. I want Stata to label this household as a "MIXED" energy user. Instead, it categorized the charcoal line as "DIRTY" and the LPG line as "CLEAN". This is wrong because the household combines both CLEAN and DIRTY sources for cooking and should therefore be categorized as "MIXED". In the table below, the "cfuel" column is what I currently get from Stata, and "cfuel (expected)" is the result I expect from proper coding.



HHID        Year  fuel_id   cfuel  cfuel (expected)
1013000206  2009  Kerosene  1      1
1021000102  2009  Charcoal  2      3
1021000102  2009  LPG       1
1021000108  2009  Charcoal  2      2
1021000109  2009  Charcoal  2      2
1021000110  2009  Charcoal  2      3
1021000110  2009  Kerosene  1
1021000113  2009  Charcoal  2      2
1021000201  2009  Firewood  2      3
1021000201  2009  Kerosene  1
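
A likely reason the "MIXED" lines never fire: Stata evaluates an if condition one observation at a time, and each line of the data holds a single fuel, so a condition such as fuelid==5 & fuelid==6 can never be true. The sketch below is one possible way to get the expected column, not a definitive solution. It flags clean and dirty fuels on each line, takes the maximum of each flag within household and year, and then combines the two. It assumes the variable is named fuelid as in the commands above (the description lists it as fuel_id), that it is numeric with value labels, and that HHID and year identify a household-year. The helper variables clean, dirty, anyclean and anydirty are hypothetical names introduced only for this illustration. The example rows are typed in from the table above using the numeric codes.

```stata
* Sketch only: rows copied from the table above, fuelid entered as numeric codes
* (4 = kerosene, 5 = LPG, 6 = charcoal, 1 = firewood)
clear
input long HHID int year byte fuelid
1013000206 2009 4
1021000102 2009 6
1021000102 2009 5
1021000108 2009 6
1021000109 2009 6
1021000110 2009 6
1021000110 2009 4
1021000113 2009 6
1021000201 2009 1
1021000201 2009 4
end

* Line-level flags: each line holds one fuel, so it is clean or dirty, never both
gen byte clean = inlist(fuelid, 4, 5, 7, 8)
gen byte dirty = inlist(fuelid, 1, 2, 3, 6)

* Household-year flags: 1 if any line in the group is clean (or dirty)
bysort HHID year: egen byte anyclean = max(clean)
bysort HHID year: egen byte anydirty = max(dirty)

* Combine the household-year flags into the three categories
gen byte cfuel = .
replace cfuel = 1 if anyclean == 1 & anydirty == 0
replace cfuel = 2 if anyclean == 0 & anydirty == 1
replace cfuel = 3 if anyclean == 1 & anydirty == 1

label var cfuel "category of fuel used for cooking purposes"
label define cfuel 1 "clean" 2 "dirty" 3 "mixed"
label values cfuel cfuel

list HHID year fuelid cfuel, sepby(HHID)
```

With this approach every line of a household-year carries the same cfuel value, so both the charcoal line and the LPG line of HHID 1021000102 are coded 3. Lines with a missing fuelid set neither flag, and a household-year with no valid fuel at all keeps cfuel missing.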