I have a household panel dataset with some observations being fully captured by other observations. Households can split off over time and are then reassigned their original household ID. Thus, a household ID can be assigned to several observations (where household ID is only the wave specific ID). I reshaped the panel and kept only the relevant variables which I hope illustrates the case.
For instance, obs 3, 6 or 10 are redundant. Ideally I would like to keep observations that are included in others but only as split-offs, though. For instance, obs 53 is captured by obs 51 and 52, but both of them are split-offs in the third period, for which obs 53 does not have information. In such cases it might make more sense to keep it as an "original" household.
The "rule" derives from the examples: Observations should be dropped if 1) their sequence of hhids appears in another observation, 2) (ideally but I do not know how feasible it is) this other observation is an original household in the period following the one for which the to be dropped observation does not contain a value anymore.
I tried to use collapse, to create duplicates or to count missing values, but none really works. I would appreciate any suggestion. And I already contacted the data provider about the panel composition, but did not receive any answer.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input double obs str16(hhid1 hhid2) double split_off2 str16 hhid3 double split_off3 str16 hhid4 double split_off4 1 "01010140020171" "0101014002017101" 1 "0001-001" 1 "0001-001" 1 2 "01010140020171" "0101014002017101" 1 "0001-001" 1 "0001-004" 2 3 "01010140020171" "0101014002017101" 1 "" . "" . 4 "01010140020284" "0101014002028401" 1 "0002-001" 1 "0002-001" 1 5 "01010140020297" "0101014002029701" 1 "0003-001" 1 "0003-001" 1 6 "01010140020297" "0101014002029701" 1 "" . "" . 7 "01010140020297" "0101014002029704" 2 "" . "" . 8 "01010140020409" "0101014002040901" 1 "0005-001" 1 "0005-001" 1 9 "01010140020471" "0101014002047101" 1 "0006-001" 1 "" . 10 "01010140020471" "" . "" . "" . 11 "01010140020551" "0101014002055101" 1 "0007-001" 1 "0007-001" 1 12 "01010140020761" "0101014002076101" 1 "0008-001" 1 "0008-001" 1 13 "01010140020762" "0101014002076201" 1 "0009-001" 1 "0009-001" 1 14 "01020030030004" "0102003003000401" 1 "0010-001" 1 "0010-001" 1 15 "01020030030022" "0102003003002201" 1 "0011-001" 1 "0012-001" 1 16 "01020030030022" "0102003003002201" 1 "0011-001" 1 "0012-003" 2 17 "01020030030022" "0102003003002201" 1 "0011-004" 2 "" . 18 "01020030030140" "0102003003014001" 1 "0012-001" 1 "0013-001" 1 19 "01020030030161" "0102003003016101" 1 "0013-001" 1 "0014-001" 1 20 "01020030030174" "0102003003017401" 1 "0014-001" 1 "0015-001" 1 21 "01020030030174" "0102003003017407" 2 "0015-001" 1 "0017-001" 1 22 "01020030030200" "0102003003020001" 1 "0016-001" 1 "0018-001" 1 23 "01020030030430" "0102003003043001" 1 "0017-001" 1 "0019-001" 1 24 "01020030030430" "0102003003043001" 1 "" . "" . 25 "01020030030479" "0102003003047901" 1 "0018-001" 1 "0020-001" 1 26 "01020170030001" "0102017003000101" 1 "0019-001" 1 "" . 27 "01020170030001" "0102017003000101" 1 "0019-003" 2 "" . 28 "01020170030001" "0102017003000104" 2 "0020-001" 1 "" . 29 "01020170030017" "0102017003001701" 1 "0021-001" 1 "" . 30 "01020170030022" "0102017003002201" 1 "0022-001" 1 "" . 31 "01020170030022" "0102017003002201" 1 "" . "" . 32 "01020170030048" "0102017003004801" 1 "0023-001" 1 "" . 33 "01020170030100" "0102017003010001" 1 "0024-001" 1 "" . 34 "01020170030209" "0102017003020901" 2 "0025-001" 1 "" . 35 "01020170030209" "" . "0025-001" 1 "" . 36 "01020170030241" "0102017003024101" 1 "0026-001" 1 "" . 37 "01020170030241" "0102017003024101" 1 "" . "" . 38 "01020170030246" "0102017003024601" 1 "0027-001" 1 "" . 39 "01030130040161" "0103013004016101" 1 "0028-001" 1 "" . 40 "01030130040219" "0103013004021901" 1 "0029-001" 1 "" . 41 "01030130040259" "0103013004025901" 1 "0030-001" 1 "" . 42 "01030130040346" "0103013004034601" 1 "0031-001" 1 "" . 43 "01030130040468" "0103013004046801" 1 "0032-001" 1 "" . 44 "01030130040685" "0103013004068501" 1 "0033-001" 1 "" . 45 "01030130040739" "0103013004073901" 1 "0034-001" 1 "" . 46 "01030130040739" "0103013004073901" 1 "0034-003" 2 "" . 47 "01030130040739" "0103013004073901" 1 "" . "" . 48 "01030130040745" "0103013004074501" 1 "0035-001" 1 "" . 49 "01030133010068" "0103013301006801" 1 "0036-001" 1 "" . 50 "01030133010092" "0103013301009201" 1 "0037-001" 1 "" . 51 "01030133010175" "0103013301017501" 1 "0038-001" 2 "" . 52 "01030133010175" "0103013301017501" 1 "0038-002" 2 "" . 53 "01030133010175" "0103013301017501" 1 "" . "" . 54 "01030133010188" "0103013301018801" 1 "0039-001" 1 "" . 55 "01030133010188" "0103013301018801" 1 "0039-004" 2 "" . 56 "01030133010188" "0103013301018801" 1 "" . "" . 57 "01030133010188" "0103013301018803" 2 "0040-001" 1 "" . 58 "01030133010300" "0103013301030001" 1 "0041-002" 1 "" . 59 "01030133010300" "0103013301030001" 1 "0041-006" 2 "" . 60 "01030133010300" "0103013301030001" 1 "" . "" . 61 "01030133010322" "0103013301032201" 1 "0042-001" 1 "" . 62 "01030133010411" "0103013301041101" 1 "0043-001" 1 "" . 63 "01030133010411" "0103013301041101" 1 "0043-002" 2 "" . 64 "01030133010411" "0103013301041102" 2 "0044-001" 1 "" . 65 "01030133010652" "0103013301065201" 1 "0045-001" 1 "" . 66 "01040173040004" "0104017304000401" 1 "0046-001" 1 "" . 67 "01040173040004" "0104017304000401" 1 "0046-002" 2 "" . 68 "01040173040004" "0104017304000401" 1 "" . "" . 69 "01040173040017" "0104017304001701" 1 "0047-001" 1 "" . 70 "01040173040022" "0104017304002201" 1 "0048-001" 1 "" . 71 "01040173040022" "0104017304002201" 1 "0048-002" 2 "" . 72 "01040173040022" "" . "0048-001" 1 "" . 73 "01040173040034" "0104017304003401" 1 "0049-001" 1 "" . 74 "01040173040034" "0104017304003406" 2 "" . "" . 75 "01040173040034" "0104017304003407" 2 "0051-002" 2 "" . 76 "01040173040041" "0104017304004102" 2 "0052-001" 1 "" . 77 "01040173040041" "" . "" . "" . 78 "01040173040086" "0104017304008601" 1 "0053-001" 1 "" . 79 "01040173040086" "" . "0053-001" 1 "" . 80 "01040173040092" "0104017304009201" 1 "0054-001" 1 "" . 81 "01040173040094" "0104017304009401" 1 "0055-001" 1 "" . 82 "01040173040094" "0104017304009402" 2 "0056-001" 1 "" . 83 "01040310010030" "0104031001003001" 1 "0057-001" 1 "" . 84 "01040310010102" "0104031001010201" 1 "0058-001" 1 "" . 85 "01040310010174" "0104031001017402" 1 "0059-001" 1 "" . 86 "01040310010174" "0104031001017402" 1 "0059-002" 2 "" . 87 "01040310010174" "0104031001017403" 2 "0060-001" 1 "" . 88 "01040310010174" "" . "" . "" . 89 "01040310010180" "0104031001018001" 1 "0061-001" 1 "" . 90 "01040310010462" "0104031001046201" 1 "0062-001" 1 "" . 91 "01040310010482" "0104031001048201" 1 "0063-001" 1 "" . 92 "01040310010482" "" . "" . "" . 93 "01040310010745" "0104031001074501" 1 "0064-001" 1 "" . 94 "01040310010745" "0104031001074502" 2 "0065-001" 1 "" . 95 "01040310010745" "" . "0064-001" 1 "" . 96 "01040310011128" "0104031001112801" 1 "0066-001" 1 "" . 97 "01040310011128" "0104031001112801" 1 "" . "" . 98 "01040310011128" "0104031001112804" 2 "0067-001" 1 "" . 99 "01040380030347" "0104038003034701" 1 "0068-001" 1 "" . 100 "01040380030396" "0104038003039601" 1 "0069-001" 1 "" . end label values split_off2 ha_10 label values split_off3 ha_10 label values split_off4 ha_10 label def ha_10 1 "ORIGINAL HOUSEHOLD", modify label def ha_10 2 "SPLIT-OFF HOUSEHOLD", modify
0 Response to Drop embedded observations
Post a Comment