The sampling rotation is such that in any 2-year period, the same individual answers the basic survey in 2 consecutive months, then again in the same 2 consecutive months the following year.
When answering the final basic survey, they are also administered the special survey.
Their PIDs match across the surveys, so the maximum number of times a PID should appear in any year/month combination is 2 (basic plus special in the final month), and the maximum overall is 5.
However, some individuals have duplicate records: xtdescribe shows the maximum number of observations per pid is 8, not 5.
Observations from the special survey have special = 1, and observations from the basic survey have special = 0.
The following code shows me how many duplicates are in each yearmonth:
Code:
duplicates tag pid yearmonth, generate(temp)
tab temp
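And the following code would drop all duplicates, but it would also drop the legitimate special survey record in the month where it shares a pid and yearmonth with the final basic survey: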
Code:
duplicates drop pid yearmonth, force
Instead, I need to:
- List the number of duplicates for special and for basic, separately, in each yearmonth
- Delete any duplicates within special and within basic, keeping one record of each
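
One possible approach to the two goals above is to include special in the duplicate key, so that a basic and a special record in the same yearmonth are not treated as duplicates of each other. This is only a sketch: it assumes pid, yearmonth, and special exist as described above, and dup is a hypothetical helper variable name. It also assumes that duplicate records are interchangeable, since duplicates drop with force keeps the first record in each group regardless of differences in other variables.

```stata
* dup = number of surplus copies within each pid-yearmonth-special group
* (0 means the record is unique within its survey type)
duplicates tag pid yearmonth special, generate(dup)

* Records that belong to a duplicated group, by yearmonth,
* with basic (special == 0) and special (special == 1) in separate columns
tabulate yearmonth special if dup > 0

* Keep one record per pid-yearmonth-special group; the legitimate
* basic + special pair in the final month is left intact
duplicates drop pid yearmonth special, force
drop dup
```

Note that the table counts every record in a duplicated group, including the one that will be kept. Running xtdescribe again afterwards would show whether the maximum per pid is back to 5.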