- HHID: household ID number
- year: survey year
- fuel_id: fuel source used by the household, one row per source: 1=firewood, 2=dung, 3=crop residue, 4=kerosene, 5=LPG, 6=charcoal, 7=solar, 8=electricity.
- The shaded rows in the table below show households that combine both clean and dirty sources for cooking in the same year.
- cfuel: category of the cooking fuel sources used by each household in each year: clean=1 (fuel_id 4, 5, 7, 8), dirty=2 (fuel_id 1, 2, 3, 6), and mixed=3 (a combination of clean and dirty sources).
gen cfuel=.
replace cfuel=1 if fuelid==4|fuelid==5|fuelid==7|fuelid==8
replace cfuel=2 if fuelid==1|fuelid==2|fuelid==3|fuelid==6
replace cfuel=3 if fuelid==4&fuelid==1
replace cfuel=3 if fuelid==4&fuelid==2
replace cfuel=3 if fuelid==4&fuelid==3
replace cfuel=3 if fuelid==4&fuelid==6
replace cfuel=3 if fuelid==5&fuelid==1
replace cfuel=3 if fuelid==5&fuelid==2
replace cfuel=3 if fuelid==5&fuelid==3
replace cfuel=3 if fuelid==5&fuelid==6
replace cfuel=3 if fuelid==7&fuelid==1
replace cfuel=3 if fuelid==7&fuelid==2
replace cfuel=3 if fuelid==7&fuelid==3
replace cfuel=3 if fuelid==7&fuelid==6
replace cfuel=3 if fuelid==8&fuelid==1
replace cfuel=3 if fuelid==8&fuelid==2
replace cfuel=3 if fuelid==8&fuelid==3
replace cfuel=3 if fuelid==8&fuelid==6
label var cfuel "category of fuel used for cooking purposes"
label define cfuel 1 "clean" 2 "dirty" 3 "mixed"
label values cfuel cfuel
The first two replace lines correctly code the "CLEAN" and "DIRTY" sources, but the remaining lines do not work for the "MIXED" case. For instance, HHID 1021000102 combined both charcoal and LPG in 2009, so I want Stata to label this household as a "MIXED" energy user. Instead, Stata coded the charcoal row as "DIRTY" and the LPG row as "CLEAN". This is wrong, because the household combines both CLEAN and DIRTY sources for cooking and should therefore be categorized as "MIXED". In the table below, the "cfuel" column is what I currently get from Stata, and "cfuel (expected)" is the result I expect from proper coding.
HHID       | Year | fuel_id  | cfuel | cfuel (expected) |
1013000206 | 2009 | Kerosene | 1     | 1                |
1021000102 | 2009 | Charcoal | 2     | 3                |
1021000102 | 2009 | LPG      | 1     |                  |
1021000108 | 2009 | Charcoal | 2     | 2                |
1021000109 | 2009 | Charcoal | 2     | 2                |
1021000110 | 2009 | Charcoal | 2     | 3                |
1021000110 | 2009 | Kerosene | 1     |                  |
1021000113 | 2009 | Charcoal | 2     | 2                |
1021000201 | 2009 | Firewood | 2     | 3                |
1021000201 | 2009 | Kerosene | 1     |                  |
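One likely reason the "MIXED" lines never fire is that each row holds a single fuel value, so a condition such as fuelid==4&fuelid==1 can never be true within one observation. A possible workaround, sketched here under assumptions, is to classify at the household-year level instead of the row level. The sketch assumes the data are in long format (one row per HHID, year and fuel), that the fuel variable is numeric and named fuelid as in the code above, and that the new names isclean, isdirty, anyclean, anydirty, cfuel_hh are free to use; they are illustrative, not part of the original data.

```stata
* Sketch only: assumes long format and a numeric variable fuelid coded 1-8.
* All variable names created here are hypothetical.

* Flag each row as a clean or a dirty source.
gen byte isclean = inlist(fuelid, 4, 5, 7, 8)
gen byte isdirty = inlist(fuelid, 1, 2, 3, 6)

* Within each household-year, record whether any row is clean / any row is dirty.
bysort HHID year: egen byte anyclean = max(isclean)
bysort HHID year: egen byte anydirty = max(isdirty)

* Classify the household-year; rows with missing fuelid only stay missing.
gen byte cfuel_hh = .
replace cfuel_hh = 1 if anyclean == 1 & anydirty == 0
replace cfuel_hh = 2 if anyclean == 0 & anydirty == 1
replace cfuel_hh = 3 if anyclean == 1 & anydirty == 1

label var cfuel_hh "category of fuel used for cooking purposes (household-year)"
label define cfuel_hh 1 "clean" 2 "dirty" 3 "mixed"
label values cfuel_hh cfuel_hh

* Remove the helper variables.
drop isclean isdirty anyclean anydirty
```

With this approach, both rows of HHID 1021000102 in 2009 (charcoal and LPG) would receive cfuel_hh = 3, whereas the "cfuel (expected)" column shows the value on the first row only. If one row per household-year is wanted, something like egen byte first = tag(HHID year) could be used to mark a single row per group.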