I am struggling with a problem that I wanted to share. I have an unbalanced panel dataset, declared as follows:
xtset xwaveid wave, year
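Declaring the panel with xtset requires each xwaveid to appear at most once per wave. That property is what turns "one person per household per wave" into a simple within-group selection. Below is a minimal check using four rows copied from the posted example. `isid` is a standard Stata command. The toy data are only an illustration.

```stata
clear
input long xwaveid float wave long hhid int hgage
 118285 2006 1561 36
 118286 2006 1561 37
 118285 2007 6471 37
 118286 2007 6471 38
end
format %ty wave

* isid exits with an error if any xwaveid appears twice in the same wave
isid xwaveid wave

* xtset enforces the same requirement (it fails on repeated time values within a panel)
xtset xwaveid wave
```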
For each wave (2006-2014), what I want to do is:
- Look at each hhid and count the number of distinct xwaveids under that hhid in that wave.
- If there is only one xwaveid for that hhid, there is no problem for that wave.
- If, however, there is more than one xwaveid for that hhid in that wave, I want to keep only the oldest one for that wave, using hgage to compare ages.
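Here is a minimal sketch of these steps in Stata. It rests on two assumptions. First, hgage holds age in years. Second, ties in age can be broken arbitrarily: here the larger xwaveid is kept, which is an assumed rule, not part of the question. The toy data are five rows copied from the posted example. They cover the three cases: a single-person household, two members of different ages, and two members of the same age. The variable name `n_persons` is hypothetical.

```stata
clear
input long xwaveid float wave long hhid int hgage
 108110 2009     81 36
 400830 2008   1061 24
 115868 2008   1061 26
 109809 2014 142551 40
 109808 2014 142551 40
end

* Step 1: number of persons in each household-wave. Because xwaveid and wave
* jointly identify observations (required by xtset), the group size _N
* equals the number of distinct xwaveids.
bysort hhid wave: generate n_persons = _N

* Steps 2-3: within each household-wave, sort by age (then xwaveid) and keep
* the last observation, i.e. the oldest person. One-person households are
* unaffected. Tied ages keep the larger xwaveid (assumed tie-break rule).
bysort hhid wave (hgage xwaveid): keep if _n == _N

list xwaveid wave hhid hgage n_persons, noobs
```

Stata sorts the missing value (.) above every number. A person whose hgage is missing would therefore be kept as the "oldest", so drop or recode such values before the keep step. On the full dataset, only the two `bysort` lines are needed, without the `clear`/`input` part.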
In other words, I want to remove duplicate households within each year, not over the entire dataset. I know what I want to do, but I could not translate it into code. Below is an example of my data. I would greatly appreciate any help.
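The difference between removing duplicates within each year and over the whole dataset can be illustrated with the three rows for hhid 170021 in the posted example. This household appears in both 2011 and 2012. `duplicates tag` records, for each row, how many other rows share the same values of the listed variables. The variable names `dup_all` and `dup_wave` are hypothetical. This is a sketch, not the only way to do it.

```stata
clear
input long xwaveid float wave long hhid int hgage
 111422 2011 170021 25
 600377 2011 170021 24
 110312 2012 170021 84
end

* Over the entire dataset: all three rows share hhid 170021
duplicates tag hhid, generate(dup_all)

* Within each year: only the two 2011 rows are flagged
duplicates tag hhid wave, generate(dup_wave)
list, noobs

* Keeping the oldest person per household-wave leaves one person in each year
bysort hhid wave (hgage xwaveid): keep if _n == _N
list xwaveid wave hhid hgage, noobs
```

Grouping by hhid alone would collapse the household to a single row across both years. Grouping by hhid and wave keeps one representative per year.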
Best,
Merve
P.S. While it would be possible to build a longitudinal dataset of individuals, I am instead trying to construct an individual-level dataset in which the individuals represent distinct households over time.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long xwaveid float wave long hhid int hgage
  108110 2009     81 36
  400830 2008   1061 24
  115868 2008   1061 26
  109993 2007   1431 25
  700012 2007   1431 30
  109850 2006   1521 26
  118285 2006   1561 36
  118286 2006   1561 37
  108023 2008   1631 38
  108023 2010   2381 40
  109850 2007   2441 27
  115525 2006   4071 22
  115525 2008   4171 24
  115525 2007   4541 23
  118285 2008   5991 38
  118286 2008   5991 39
  118286 2007   6471 38
  118285 2007   6471 37
  400075 2006   7801 26
  100619 2006   7801 24
  115131 2008   7971 28
  108023 2009   8241 39
  115131 2007   8641 27
  114769 2009   8891 18
  115131 2006   8901 26
  115131 2010   9352 30
  114322 2006   9881 35
  108110 2014 100111 41
  108110 2013 100121 40
  108110 2012 100181 39
  116083 2014 100631 26
  114322 2007  10131 36
  114322 2008  10331 37
 1000282 2014 110251 29
  115525 2014 110251 30
 1000282 2013 110281 28
  115525 2013 110281 29
  109013 2012 110511 33
  113069 2012 111501 50
  106444 2014 115121 21
  115131 2014 116851 34
  115131 2013 117292 33
  108902 2014 117391 29
  114263 2013 118321 29
  114263 2012 118751 28
  114570 2014 119452 21
  114570 2013 120381 20
  114570 2012 120881 19
  106945 2014 124391 35
  800232 2014 128451 32
  101763 2014 128451 34
  800232 2013 129981 31
  101763 2013 129981 33
  800232 2012 130691 30
  101763 2012 130691 32
  114263 2014 130991 30
  114569 2012 131211 21
 1000305 2012 131211 62
  115131 2012 133951 32
  113075 2014 134071 70
 1000276 2013 140491 28
  104369 2013 140491 36
 1000276 2012 140531 27
  104369 2012 140531 35
  119207 2014 140701 24
  117956 2012 140761 30
  119207 2013 140771 23
  116020 2014 140771 25
  114768 2013 140861 27
  104369 2014 141031 37
 1000276 2014 141031 29
  800232 2009  14191 27
  101763 2009  14191 29
  109809 2014 142551 40
  109808 2014 142551 40
  109809 2013 142681 39
  109808 2013 142681 39
  118272 2012 143251 25
  114768 2014 144221 28
  119381 2013 144861 31
  119380 2013 144861 40
  119380 2014 144881 41
  119381 2014 144881 32
  118272 2013 145321 26
  101411 2014 146651 30
  116593 2006  14691 51
  116594 2006  14691 48
  101411 2013 147041 29
  118272 2014 157451 27
  107223 2009  15921 20
  113341 2012 159371 23
  113341 2013 159831 24
  113341 2014 160171 25
  101979 2008  16131 18
  114769 2008  16891 17
  101979 2006  16961 16
  111422 2011 170021 25
  600377 2011 170021 24
  110312 2012 170021 84
  116423 2011 170031 49
end
format %ty wave
label values hgage FHGAGE