My data contain duplication across id and p_id that I need to resolve. Where a couple exists, the two partners are paired, so each couple-wave appears twice: once with one partner as id and once with the other. As an example of the problem, I have included data on one couple (ids 14 and 15). Some variables are included for both partners, for example age and p_age, or educ and p_educ.
Although it is not evident here, we sometimes observe an id that remains in the survey for many waves and has multiple partners (p_ids) over that time. A p_id may also drop out of the survey when their relationship with a given id ends. In such cases, the id has more observations than some of its p_ids.
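The situation described above (several partners per id, partners leaving the survey) can be detected by counting, for each id, the distinct partners and the waves observed. This is a minimal sketch on hypothetical toy data, not survey data: id 14 is partnered with 15 in waves 1-2 and with 16 in waves 3-5. The variable names pair_tag, n_partners and n_waves are also hypothetical.

```stata
* Hypothetical toy data: id 14 has two partners over time, and
* partner 15 leaves the survey when that relationship ends
clear
input long(id p_id) byte wave
14 15 1
14 15 2
14 16 3
14 16 4
14 16 5
15 14 1
15 14 2
16 14 3
16 14 4
16 14 5
end

* Tag one row per distinct id-partner pair (rows without a partner get 0)
egen byte pair_tag = tag(id p_id) if !missing(p_id)

* Number of distinct partners each id has had across all waves
bysort id: egen n_partners = total(pair_tag)

* Number of waves in which each id is observed
bysort id: generate n_waves = _N

list id p_id wave n_partners n_waves, sepby(id) noobs
```

Here id 14 ends up with two partners and five waves, while partner 15 has only two waves, which is the "id has more observations than some p_ids" case.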
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte wave int(age p_age) byte(marstat p_marstat) float length byte(educ p_educ) long(inc p_inc) byte lifesat double volunteer byte fired
14 15 1 47 44 1 1 16 5 8 79958 79958 4 . .
14 15 2 48 45 1 1 16 5 8 108000 108000 7 0 1
14 15 3 49 46 1 1 16 5 8 139000 139000 7 0 1
14 15 4 50 47 1 1 16 5 8 141718 141718 8 0 1
14 15 5 51 48 1 1 16 5 8 140723 140723 7 0 1
14 15 6 52 49 1 1 16 5 8 113008 113008 8 0 1
14 15 7 53 50 1 1 16 5 8 122832 122832 8 0 1
14 15 8 54 51 1 1 16 5 5 137991 137991 6 0 1
14 15 9 55 52 1 1 16 5 5 137140 137140 8 0 2
14 15 10 56 53 1 1 16 5 5 85330 85330 8 0 2
14 15 11 57 54 1 1 16 5 5 51148 51148 7 0 1
14 15 12 58 55 1 1 16 5 5 49378 49378 5 0 1
14 15 13 59 56 1 1 16 5 5 42822 42822 8 0 1
14 15 14 60 57 1 1 16 5 5 48164 48164 7 0 1
14 15 15 61 58 1 1 16 5 5 57011 57011 8 0 1
14 15 16 62 59 1 1 16 5 5 57930 57930 9 0 1
14 . 18 64 . 3 . 0 5 . 0 . 8 1 1
15 14 1 44 47 1 1 16 8 5 79958 79958 7 . .
15 14 2 45 48 1 1 16 8 5 108000 108000 8 0 1
15 14 3 46 49 1 1 16 8 5 139000 139000 7 0 2
15 14 4 47 50 1 1 16 8 5 141718 141718 8 0 1
15 14 5 48 51 1 1 16 8 5 140723 140723 7 0 1
15 14 6 49 52 1 1 16 8 5 113008 113008 8 0 1
15 14 7 50 53 1 1 16 8 5 122832 122832 6 0 1
15 14 8 51 54 1 1 16 5 5 137991 137991 7 0 1
15 14 9 52 55 1 1 16 5 5 137140 137140 7 0 1
15 14 10 53 56 1 1 16 5 5 85330 85330 7 0 1
15 14 11 54 57 1 1 16 5 5 51148 51148 7 0 1
15 14 12 55 58 1 1 16 5 5 49378 49378 8 0 1
15 14 13 56 59 1 1 16 5 5 42822 42822 7 0 1
15 14 14 57 60 1 1 16 5 5 48164 48164 9 0 1
15 14 15 58 61 1 1 16 5 5 57011 57011 7 0 1
15 14 16 59 62 1 1 16 5 5 57930 57930 8 0 1
15 . 17 60 . 3 . 0 5 . 51000 . 7 0 1
15 . 18 61 . 3 . 0 5 . 50000 . 8 0 1
end
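The duplication in the -dataex- example above can be made explicit with an order-independent couple key, so that (14, 15) and (15, 14) map to the same pair. This is a minimal sketch using only id, p_id and wave from a few rows of that example; couple_lo, couple_hi and n_rows are hypothetical names. Note that Stata's min() and max() ignore missing arguments, so the key is restricted to rows that have a partner.

```stata
* A few rows of the -dataex- example (id, p_id and wave only)
clear
input long(id p_id) byte wave
14 15 1
14 15 2
14  . 18
15 14 1
15 14 2
15  . 17
15  . 18
end

* Order-independent couple key; left missing for rows without a partner
generate long couple_lo = min(id, p_id) if !missing(p_id)
generate long couple_hi = max(id, p_id) if !missing(p_id)

* Rows per couple-wave: 2 means the couple is recorded once per partner
bysort couple_lo couple_hi wave: generate byte n_rows = _N if !missing(p_id)

list id p_id wave couple_lo couple_hi n_rows, noobs
```

Every partnered row gets n_rows = 2, which is exactly the duplication to be resolved; rows without a partner are left with a missing n_rows.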
Code:
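* Assumes the locals savingdir and origdatadir hold valid directory paths;
* -isvar- is community-contributed (ssc install isvar)
local variables id p_id age marstat length educ inc lifesat volunteer fired
local filename allwaves

* Start from an empty file and append one wave at a time
clear
save "`savingdir'/`filename'", replace emptyok
forvalues wave = 1/18 {
    local waveprefix = word(c(alpha), `wave')
    quietly use "`origdatadir'/Combined_`waveprefix'180c.dta", clear
    * Strip the wave prefix (a, b, c, ...) from the variable names
    rename `waveprefix'* *
    * Keep only those listed variables that exist in this wave
    isvar `variables'
    local kept `r(varlist)'
    keep `kept'
    generate byte wave = `wave'
    display "Wave `wave' (`waveprefix') - kept: `kept'"
    append using "`savingdir'/`filename'"
    save "`savingdir'/`filename'", replace
}

// partner data: each person's variables, keyed by their partner's id
tempfile partners
drop if p_id == ""
drop id
rename * p_*
* After the prefixing step, the partner identifier is p_p_id
rename (p_p_id p_wave) (id wave)
save `partners'

// merge partner data onto each person-wave
use "`savingdir'/`filename'", clear
merge 1:1 id wave using `partners', nolabel
assert (p_id != "" & _merge == 3) | (p_id == "" & _merge == 1)
drop _merge

// make panel
destring id p_id, replace
sort id wave
xtset id wave
order id wave
save "`savingdir'/`filename'", replace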
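If the goal is one record per couple-wave rather than one per person-wave, one option is to keep only the row where id is the smaller of the two identifiers, plus every row without a partner. This is a minimal sketch, assuming numeric id and p_id (as after the destring step in the code above) and assuming every variable you need from the second partner has a p_ counterpart, as the merge step creates. Any own-row variable without a p_ copy is lost for the dropped partner. Which partner's row to keep is arbitrary; id < p_id is simply a deterministic rule. The toy data reuse a few rows of the example.

```stata
* A few rows of the -dataex- example (numeric ids, as after destring)
clear
input long(id p_id) byte wave int(age p_age)
14 15 1 47 44
14 15 2 48 45
14  . 18 64  .
15 14 1 44 47
15 14 2 45 48
15  . 17 60  .
15  . 18 61  .
end

* Keep the id < p_id row for couples and every row without a partner.
* In Stata a missing p_id sorts above any id, so id < p_id alone would
* also keep singles; the explicit missing() test just states the intent.
keep if id < p_id | missing(p_id)

list, noobs sepby(id)
```

Partner 15's age in waves 1-2 survives as p_age on id 14's rows, while 15's unpartnered waves 17 and 18 are kept as they are.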