First, I'd like to thank all the helpful users here, from whom I have learned so much Stata from scratch. I still don't know much, and my first post is the best proof of that.
Back to the topic: I have a census file containing data on household members. I would like to run a logistic regression on the families in these households, with the occurrence of divorce as the outcome. I have variables for the household id (hhid), the number of the family within the household (hhfamilynumber: first family, second family, ...), the relation of each member to the head of the whole household (reltohhh), and the relation of each family member to the head of that particular family (reltohhfamily). I have many more variables than that, but these let me link and identify people within households and within families that live in the same household.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(hhid hhperid sex reltohhh hhfamilynumber reltohhfamily yearofstud)
12 1 1 1 1 1 11
12 2 2 2 1 2 11
25 1 1 1 1 1 7
25 2 2 2 1 2 7
33 1 1 1 1 1 7
33 2 2 2 1 2 7
33 4 2 5 2 2 12
33 3 1 4 2 1 12
56 2 2 2 1 2 12
56 1 1 1 1 1 12
58 4 2 2 1 2 15
62 2 1 2 1 1 7
62 1 2 1 1 2 11
95 2 2 2 1 2 12
95 1 1 1 1 1 11
101 2 2 2 1 2 17
103 2 2 2 1 2 7
103 1 1 1 1 1 11
132 2 1 4 1 1 13
132 3 2 5 1 2 13
138 1 1 1 1 1 14
138 2 2 2 1 2 14
138 3 2 4 2 2 17
168 1 1 1 1 1 8
168 2 2 2 1 2 8
172 1 1 1 1 1 8
172 2 2 2 1 2 10
173 1 1 1 1 1 11
173 2 2 2 1 2 11
189 2 2 2 1 2 12
189 1 1 1 1 1 11
193 3 1 1 1 1 11
193 4 2 2 1 2 11
197 2 2 2 1 2 11
201 2 2 1 1 1 11
208 2 2 2 1 2 11
208 1 1 1 1 1 11
210 2 2 2 1 2 11
210 1 1 1 1 1 13
223 1 1 1 1 1 13
223 2 2 2 1 2 10
224 2 2 2 1 2 11
224 1 1 1 1 1 11
226 2 2 2 1 2 12
226 1 1 1 1 1 16
228 2 1 2 1 1 13
228 1 2 1 1 2 11
254 1 1 7 1 1 11
254 2 2 7 1 2 11
265 1 1 1 1 1 11
265 2 2 2 1 2 13
270 1 1 1 1 1 8
270 2 2 2 1 2 13
271 2 2 2 1 2 12
271 1 1 1 1 1 15
278 1 1 1 1 1 17
278 4 2 2 1 2 17
278 2 2 7 2 1 -8
278 3 1 7 2 2 -8
282 1 1 1 1 1 11
282 2 2 2 1 2 11
287 2 2 2 1 2 7
287 1 1 1 1 1 7
289 2 2 2 1 2 8
289 1 1 1 1 1 11
290 1 1 1 1 1 13
290 2 2 2 1 2 11
300 2 2 2 1 2 7
300 1 1 1 1 1 7
303 1 1 1 1 1 8
303 2 2 2 1 2 8
306 1 1 1 1 1 13
306 2 2 2 1 2 13
312 1 1 1 1 1 12
312 2 2 2 1 2 12
328 1 1 1 1 1 11
328 2 2 2 1 2 8
344 2 1 2 1 2 14
344 1 2 1 1 1 12
345 2 2 2 1 2 8
345 1 1 1 1 1 11
361 1 1 1 1 1 18
361 2 2 2 1 2 16
362 4 2 1 1 2 16
384 1 1 1 1 1 7
384 2 2 2 1 2 7
387 2 2 2 1 2 12
387 1 1 1 1 1 11
399 1 1 1 1 1 10
399 2 2 2 1 2 8
401 2 2 7 1 2 8
401 1 1 7 1 1 7
409 1 1 1 1 1 12
409 2 2 2 1 2 18
424 1 1 1 1 1 11
424 2 2 2 1 2 11
429 3 2 4 1 4 13
429 2 2 2 1 2 10
429 1 1 1 1 1 7
433 2 1 2 1 1 14
end
label values sex PLEC_ALL
label def PLEC_ALL 1 "man", modify
label def PLEC_ALL 2 "woman", modify
label values reltohhh fC4
label def fC4 1 "head", modify
label def fC4 2 "husband/wife", modify
label def fC4 4 "son/daughter", modify
label def fC4 5 "son-in-law/daughter-in-law", modify
label def fC4 7 "father/mother/father-in-law/mother-in-law", modify
label values hhfamilynumber fC5
label values reltohhfamily fC6
label values yearofstud lata_nauki_2011
label def lata_nauki_2011 -8 "Missing data", modify
I came up with the idea that the easiest way to run the logistic regression would be to add the data on the other members (or at least the spouse) to the observation of the head of each family in the household. That way I could create several 0/1 variables for the husband and separate ones for the wife (for example, the education level of each and which of the two has the higher one) and run the regression. To achieve this, I wanted to merge the husband's and wife's data (2 observations) into one using reshape:
reshape wide [all dataset variables], i (hhid) j (hhperid)
but I get an error:
{ required
r (100);
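For comparison, here is a minimal sketch of the form that reshape wide expects: explicit stub names rather than a bracketed placeholder, and i() and j() given as options. It uses a few rows from the example above and rests on several assumptions that are not confirmed for the full file: that codes 1 and 2 of reltohhfamily mark the family head and the spouse, that each family has at most one such man and one such woman, and that -8 in yearofstud should be treated as missing. The names role and wife_more_educ are made up for illustration. The identifier is hhid plus hhfamilynumber so that the result has one row per family rather than one per household.

```stata
clear
input double(hhid hhperid sex reltohhh hhfamilynumber reltohhfamily yearofstud)
33 1 1 1 1 1 7
33 2 2 2 1 2 7
33 4 2 5 2 2 12
33 3 1 4 2 1 12
429 3 2 4 1 4 13
429 2 2 2 1 2 10
429 1 1 1 1 1 7
end

* Treat the "Missing data" code as a real missing value
replace yearofstud = . if yearofstud == -8

* Assumption: 1 = head of family, 2 = spouse; other members are dropped
keep if inlist(reltohhfamily, 1, 2)

* Hypothetical suffix variable: _h for the man, _w for the woman
generate str2 role = cond(sex == 1, "_h", "_w")
drop sex

* reshape needs j() to be unique within i(); stop early if it is not
isid hhid hhfamilynumber role

* One row per family, with yearofstud_h, yearofstud_w, and so on
reshape wide hhperid reltohhh reltohhfamily yearofstud, ///
    i(hhid hhfamilynumber) j(role) string

* Example of a 0/1 covariate built from the paired columns
generate byte wife_more_educ = yearofstud_w > yearofstud_h ///
    if !missing(yearofstud_h, yearofstud_w)

list hhid hhfamilynumber yearofstud_h yearofstud_w wife_more_educ
```

With the full file, every variable that differs between husband and wife has to appear in the stub list (or be dropped first), because reshape wide refuses variables that are not constant within i().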
I have read that this approach (using reshape) is not recommended here, but the alternative is to create 40-80+ new variables by hand to separate the husband's and wife's data, using egen and bysort, which I am not yet fully proficient with.
I hope you can give me some tips and solutions that I can use in this case, understand, and learn from for the future.
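To make the alternative concrete, the following is a rough sketch of what the egen route could look like for a single variable, using a few rows from the example above. It assumes that sex 1/2 together with reltohhfamily codes 1 and 2 pick out the husband and the wife, and that -8 means missing. The names educ_h, educ_w, famtag and wife_higher are invented for illustration.

```stata
clear
input double(hhid hhperid sex reltohhh hhfamilynumber reltohhfamily yearofstud)
62 2 1 2 1 1 7
62 1 2 1 1 2 11
278 1 1 1 1 1 17
278 4 2 2 1 2 17
278 2 2 7 2 1 -8
278 3 1 7 2 2 -8
end

* Treat the "Missing data" code as a real missing value
replace yearofstud = . if yearofstud == -8

* Copy the husband's and the wife's schooling to every row of their family;
* max() ignores the missing values that cond() returns for everyone else
bysort hhid hhfamilynumber: egen educ_h = ///
    max(cond(sex == 1 & inlist(reltohhfamily, 1, 2), yearofstud, .))
bysort hhid hhfamilynumber: egen educ_w = ///
    max(cond(sex == 2 & inlist(reltohhfamily, 1, 2), yearofstud, .))

* Keep one observation per family for the regression
egen byte famtag = tag(hhid hhfamilynumber)
keep if famtag

* Example 0/1 covariate: wife has more years of schooling than husband
generate byte wife_higher = educ_w > educ_h if !missing(educ_h, educ_w)

list hhid hhfamilynumber educ_h educ_w wife_higher
```

Each additional characteristic would need its own pair of egen lines, or a foreach loop over the variable names, which is where the count of new variables grows.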
Best Regards,
Karol