I am dealing with a panel dataset like this one, in which some id-year pairs appear more than once:
Code:
input id year income
 9 1 10
 9 1  5
 9 1  7
 9 1 14
 9 1 18
 9 2 11
 9 2  6
 9 2  8
 9 2 15
 9 2 19
10 1  3
10 2  4
11 1  1
11 1  2
11 2  4
11 2  4
12 1  2
12 1  3
12 2  3
12 2  4
end

. list

     +--------------------+
     | id   year   income |
     |--------------------|
  1. |  9      1       10 |
  2. |  9      1        5 |
  3. |  9      1        7 |
  4. |  9      1       14 |
  5. |  9      1       18 |
     |--------------------|
  6. |  9      2       11 |
  7. |  9      2        6 |
  8. |  9      2        8 |
  9. |  9      2       15 |
 10. |  9      2       19 |
     |--------------------|
 11. | 10      1        3 |
 12. | 10      2        4 |
 13. | 11      1        1 |
 14. | 11      1        2 |
 15. | 11      2        4 |
     |--------------------|
 16. | 11      2        4 |
 17. | 12      1        2 |
 18. | 12      1        3 |
 19. | 12      2        3 |
 20. | 12      2        4 |
     +--------------------+
The final dataset should have one observation per id and year, with income summed over the duplicates, like this:
Code:
     +--------------------+
     | id   year   income |
     |--------------------|
  1. |  9      1       54 |
  2. |  9      2       59 |
  3. | 10      1        3 |
  4. | 10      2        4 |
  5. | 11      1        3 |
     |--------------------|
  6. | 11      2        8 |
  7. | 12      1        5 |
  8. | 12      2        7 |
     +--------------------+
In my real data, tagging the duplicates on id and year gives:
Code:
. duplicates tag id year, gen(isdup)

Duplicates in terms of id year

. tab isdup

      isdup |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     93,230       95.93       95.93
          1 |      2,128        2.19       98.12
          2 |        930        0.96       99.08
          3 |        492        0.51       99.58
          4 |        205        0.21       99.79
          5 |        126        0.13       99.92
          7 |         40        0.04       99.97
          9 |         10        0.01       99.98
         10 |         11        0.01       99.99
         12 |         13        0.01      100.00
------------+-----------------------------------
      Total |     97,185      100.00
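For reference, one way to get from the first listing to the second might be `collapse`. This is only a sketch on the toy data above. It assumes the repeated id-year rows are separate income records that should be added together, not the same record entered twice. Note that `collapse` keeps only the variables named in the command, and `(sum)` returns 0 for a group whose income values are all missing.

```stata
* Toy data from the example above (with the real data, skip this step)
clear
input id year income
 9 1 10
 9 1  5
 9 1  7
 9 1 14
 9 1 18
 9 2 11
 9 2  6
 9 2  8
 9 2 15
 9 2 19
10 1  3
10 2  4
11 1  1
11 1  2
11 2  4
11 2  4
12 1  2
12 1  3
12 2  3
12 2  4
end

* Replace the data with one row per id-year pair, income summed within the pair.
* Assumption: summing is the right way to combine duplicates.
collapse (sum) income, by(id year)

* isid exits with an error if id year does not uniquely identify observations
isid id year

list, sepby(id)
```

If the duplicates turn out to be exact repeats of the same record, `duplicates drop` may be the more appropriate tool than summing.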
Thanks in advance
