I am dealing with a dataset like this one:
Code:
input id year income
9 1 10
9 1 5
9 1 7
9 1 14
9 1 18
9 2 11
9 2 6
9 2 8
9 2 15
9 2 19
10 1 3
10 2 4
11 1 1
11 1 2
11 2 4
11 2 4
12 1 2
12 1 3
12 2 3
12 2 4
end
list

     +--------------------+
     | id   year   income |
     |--------------------|
  1. |  9      1       10 |
  2. |  9      1        5 |
  3. |  9      1        7 |
  4. |  9      1       14 |
  5. |  9      1       18 |
     |--------------------|
  6. |  9      2       11 |
  7. |  9      2        6 |
  8. |  9      2        8 |
  9. |  9      2       15 |
 10. |  9      2       19 |
     |--------------------|
 11. | 10      1        3 |
 12. | 10      2        4 |
 13. | 11      1        1 |
 14. | 11      1        2 |
 15. | 11      2        4 |
     |--------------------|
 16. | 11      2        4 |
 17. | 12      1        2 |
 18. | 12      1        3 |
 19. | 12      2        3 |
 20. | 12      2        4 |
     +--------------------+

The final dataset should look like this:
Code:
     +--------------------+
     | id   year   income |
     |--------------------|
  1. |  9      1       54 |
  2. |  9      2       19 |
  3. | 10      1        3 |
  4. | 10      2        4 |
  5. | 11      1        3 |
     |--------------------|
  6. | 11      2        8 |
  7. | 12      1        5 |
  8. | 12      2        7 |
     +--------------------+

Code:
duplicates tag id year, gen(isdup)

Duplicates in terms of id year

tab isdup

      isdup |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     93,230       95.93       95.93
          1 |      2,128        2.19       98.12
          2 |        930        0.96       99.08
          3 |        492        0.51       99.58
          4 |        205        0.21       99.79
          5 |        126        0.13       99.92
          7 |         40        0.04       99.97
          9 |         10        0.01       99.98
         10 |         11        0.01       99.99
         12 |         13        0.01      100.00
------------+-----------------------------------
      Total |     97,185      100.00

Thanks in advance.
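
A minimal sketch of one way to get from the example data to the desired layout, assuming the goal is the total income within each id-year pair. That assumption matches every row of the desired listing except id 9 in year 2, where the five incomes add up to 59 rather than the 19 shown. Note that collapse replaces the data in memory, so run it on a copy or after preserve; isid is only a sanity check.

```stata
* Sketch only: assumes the target is the SUM of income per id-year pair
clear
input id year income
9 1 10
9 1 5
9 1 7
9 1 14
9 1 18
9 2 11
9 2 6
9 2 8
9 2 15
9 2 19
10 1 3
10 2 4
11 1 1
11 1 2
11 2 4
11 2 4
12 1 2
12 1 3
12 2 3
12 2 4
end

* Replace the data in memory with one row per id-year pair, income summed
collapse (sum) income, by(id year)

* Stop with an error unless id and year now uniquely identify the rows
isid id year

* Show the result, with a separator line between ids
list, sepby(id)
```

If the real dataset has other variables to keep, each needs its own statistic in the collapse call, for example (first) or (mean); which one is appropriate depends on the data. Also note that collapse (sum) returns 0, not missing, for a group in which every income is missing.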
